audio.datasets
- https://towardsdatascience.com/a-data-lakes-worth-of-audio-datasets-b45b88cd4ad
- https://www.sciencedirect.com/science/article/pii/S2352340922001421
- https://towardsdatascience.com/40-open-source-audio-datasets-for-ml-59dc39d48f06
Body / medical datasets
Articles
digestive
https://research.google.com/audioset/ontology/digestive_1.html
cardio
An Open Access Database for the Evaluation of Heart Sound Algorithms
The Michigan Heart Sound and Murmur database (MHSDB) was provided by the University of Michigan Health System. It includes only 23 heart sound recordings with a total length of 1496.8 s and is available from http://www.med.umich.edu/lrc/psb/heartsounds/index.htm
The PASCAL database comprises 176 recordings for heart sound segmentation and 656 recordings for heart sound classification. Although the number of recordings is relatively large, each recording is short, from 1 s to 30 s. The recordings also have a limited frequency range below 195 Hz because of the applied low-pass filter, which removes many heart sound components that are useful for clinical diagnosis. It is available from http://www.peterjbentley.com/heartchallenge
The Cardiac Auscultation of Heart Murmurs database, provided by eGeneral Medical Inc., includes 64 recordings. It is not open and requires payment for access: http://www.egeneralmedical.com/listohearmur.html
unsorted cardio
- https://paperswithcode.com/dataset/physionet-challenge-2016
- https://www.kaggle.com/datasets/kinguistics/heartbeat-sounds
- https://physionet.org/content/fetalheartsounddata/
- https://physionet.org/content/sufhsdb/
- https://physionet.org/content/circor-heart-sound
Respiratory
A Progressively Expanded Database for Automated Lung Sound Analysis: An Update
ICBHI Respiratory Sound Database (The Respiratory Sound database - ICBHI 2017 Challenge)
- https://paperswithcode.com/dataset/icbhi-respiratory-sound-database
- https://bhichallenge.med.auth.gr/sites/default/files/ICBHI_final_database/ICBHI_final_database.zip (audio)
The database consists of a total of 5.5 hours of recordings containing 6898 respiratory cycles, of which 1864 contain crackles, 886 contain wheezes, and 506 contain both crackles and wheezes, in 920 annotated audio samples from 126 subjects.
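The counts quoted above imply how many cycles contain neither crackles nor wheezes. A minimal sketch of that arithmetic, assuming the three listed categories do not overlap (the variable and function names are illustrative and not part of the dataset):

```python
# Cycle counts as quoted in the ICBHI description above.
TOTAL_CYCLES = 6898
counts = {"crackles": 1864, "wheezes": 886, "both": 506}

# Assumption: the three categories are mutually exclusive, so the
# remainder is the number of cycles with neither crackles nor wheezes.
counts["normal"] = TOTAL_CYCLES - sum(counts.values())


def class_shares(cycle_counts):
    """Return each class's share of all cycles as a fraction of 1."""
    total = sum(cycle_counts.values())
    return {name: n / total for name, n in cycle_counts.items()}


shares = class_shares(counts)
for name, n in counts.items():
    print(f"{name:9s} {n:5d}  {shares[name]:.1%}")
```

Under that assumption the remainder is 3642 cycles, so the classes are clearly imbalanced and any train/test split should be checked for class balance.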
unsorted lungs / respiratory
- https://paperswithcode.com/dataset/respiratory-and-drug-actuation-dataset
- https://paperswithcode.com/dataset/dicova
- https://paperswithcode.com/dataset/coughvid
- https://www.kaggle.com/datasets/vbookshelf/respiratory-sound-database
fat tissue
- https://physionet.org/content/maternal-ultrasound-nutrition/
- #knowledge-wall
speech / mouth / articulation
KSoF (The Kassel State of Fluency Dataset – A Therapy Centered Dataset of Stuttering)
- https://paperswithcode.com/dataset/ksof
- #knowledge-wall
RWCP-SSD-Onomatopoeia
- https://paperswithcode.com/dataset/rwcp-ssd-onomatopoeia
- no audio included; registration is required: http://research.nii.ac.jp/src/en/register.html #knowledge-wall
RWCP-SSD-Onomatopoeia is a dataset consisting of 155,568 onomatopoeic words paired with audio samples for environmental sound synthesis
- Words that imitate the sound they describe
others
- https://research.google.com/audioset/ontology/hubbub_speech_noise_speech_babble_2.html #methodology
- Vocal Imitation Set v1.1.3: thousands of vocal imitations of hundreds of sounds from the AudioSet ontology https://zenodo.org/record/1340763#.Xlj1By2ZN24
Stomach
Environment
FSDnoisy18k
The FSDnoisy18k dataset is an open dataset containing 42.5 hours of audio across 20 sound event classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data. The audio content is taken from Freesound, and the dataset was curated using the Freesound Annotator. The noisy set of FSDnoisy18k consists of 15,813 audio clips (38.8h), and the test set consists of 947 audio clips (1.4h) with correct labels. The dataset features two main types of label noise: in-vocabulary (IV) and out-of-vocabulary (OOV). IV applies when, given an observed label that is incorrect or incomplete, the true or missing label is part of the target class set. Analogously, OOV means that the true or missing label is not covered by those 20 classes.
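The IV/OOV distinction described above can be sketched as a small decision rule. This is only an illustration of the definition, not code from the dataset authors: the function name, the toy vocabulary, and the choice to report "IV" when a clip has both kinds of problem are assumptions.

```python
def label_noise_type(observed, true_labels, vocabulary):
    """Classify the label noise of one clip.

    observed:    the single label attached to the clip
    true_labels: labels that actually describe the clip
    vocabulary:  the target class set (20 classes in FSDnoisy18k)
    """
    true = set(true_labels)
    if true == {observed}:
        return "clean"  # observed label is correct and complete
    # Labels that are true for the clip but not given by the observed label.
    missing = true - {observed}
    if any(label in vocabulary for label in missing):
        return "IV"   # a true or missing label is one of the target classes
    return "OOV"      # the true or missing labels lie outside the target classes


# Hypothetical three-class vocabulary, for illustration only.
VOCAB = {"rain", "wind", "fire"}

print(label_noise_type("rain", {"rain"}, VOCAB))
print(label_noise_type("rain", {"rain", "wind"}, VOCAB))
print(label_noise_type("rain", {"shower"}, VOCAB))
```

The second call shows an incomplete label (a missing in-vocabulary class), and the third an incorrect label whose true class is not in the vocabulary.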
STARSS22 (Sony-TAu Realistic Spatial Soundscapes 2022)
The Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset consists of recordings of real scenes captured with a high channel-count spherical microphone array (SMA). The recordings were made by two different teams at two different sites: Tampere University in Tampere, Finland, and Sony facilities in Tokyo, Japan. Recordings at both sites share the same capturing and annotation process, and a similar organization. They are organized in sessions, corresponding to distinct rooms, human participants, and sound-making props, with a few exceptions.
ARCA23K
ARCA23K is a dataset of labelled sound events created to investigate real-world label noise. It contains 23,727 audio clips originating from Freesound, and each clip belongs to one of 70 classes taken from the AudioSet ontology. The dataset was created using an entirely automated process with no manual verification of the data. For this reason, many clips are expected to be labelled incorrectly.
ADVANCE (AuDio Visual Aerial sceNe reCognition datasEt)
ESC50 (ESC: Dataset for Environmental Sound Classification)
The dataset consists of 5-second-long recordings organized into 50 semantical classes (with 40 examples per class) loosely arranged into 5 major categories.
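The figures above fix the overall size of the dataset. A quick sketch of the arithmetic, using only the numbers quoted in the description (the constant names are illustrative):

```python
# Figures quoted in the ESC-50 description above.
CLASSES = 50
CLIPS_PER_CLASS = 40
CLIP_SECONDS = 5

total_clips = CLASSES * CLIPS_PER_CLASS      # clips in the whole dataset
total_seconds = total_clips * CLIP_SECONDS   # total audio duration in seconds
total_hours = total_seconds / 3600           # the same duration in hours

print(f"{total_clips} clips, {total_seconds} s, {total_hours:.2f} h")
```

That is 2000 clips and a little under 3 hours of audio, small enough to load into memory on most machines.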
SoundingEarth
SoundingEarth consists of co-located aerial imagery and audio samples from all around the world.
City
UrbanSound8k
UrbanSound8K is an audio dataset that contains 8732 labeled sound excerpts (<= 4 s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. All excerpts are taken from field recordings uploaded to www.freesound.org.
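A minimal sketch for recovering the class of an excerpt from its file name. It assumes the naming convention `[fsID]-[classID]-[occurrenceID]-[sliceID].wav` and class IDs 0 to 9 in the order listed above, as I recall them from the dataset's README; verify both against your copy. `SliceInfo` and `parse_slice_name` are hypothetical helper names.

```python
import os
from typing import NamedTuple

# Assumed class ID order (0..9), matching the class list above.
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark",
    "drilling", "engine_idling", "gun_shot", "jackhammer",
    "siren", "street_music",
]


class SliceInfo(NamedTuple):
    fs_id: int          # Freesound ID of the source recording
    class_id: int       # numeric class label
    occurrence_id: int  # which occurrence of the sound in the recording
    slice_id: int       # which slice of that occurrence

    @property
    def class_name(self):
        return CLASS_NAMES[self.class_id]


def parse_slice_name(path):
    """Parse '<fsID>-<classID>-<occurrenceID>-<sliceID>.wav' into SliceInfo."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("-")
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {path!r}")
    return SliceInfo(*(int(p) for p in parts))


# Illustrative file name following the assumed convention.
info = parse_slice_name("fold5/100032-3-0-0.wav")
print(info.class_name)
```

Parsing the name avoids a dependency on the metadata CSV when only the label is needed.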
URBAN-SED
URBAN-SED is a dataset of 10,000 soundscapes with sound event annotations, generated using the scaper library. It totals almost 30 hours of audio and includes close to 50,000 annotated sound events.
Room / home
#unsorted
- https://paperswithcode.com/dataset/meshrir
- https://www.academia.edu/89769760/A_sound_database_for_health_smart_home
- https://archive.org/details/chime-home
- https://www.semanticscholar.org/paper/The-CHiME-corpus%3A-a-resource-and-a-challenge-for-in-Christensen-Barker/7e6acdbbe3b5512cb3bb220c7083a222c97ef136
Nature
Datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae)
Warblr
Warblr is a dataset for the acoustic detection of birds. The dataset comes from a UK bird-sound crowdsourcing research spinout called Warblr. From this initiative the authors collected over 10,000 ten-second smartphone audio recordings from around the UK. The audio totals around 28 hours duration.
Other
Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection (MIMII)
https://paperswithcode.com/dataset/mimii
MIMII is a dataset of industrial machine sounds.
ToyADMOS2
ToyADMOS2 is a dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions.
FSD50K (Freesound Database 50K)
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra. It consists mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more.
USM-SED
USM-SED is a dataset for polyphonic sound event detection in urban sound monitoring use cases. Based on isolated sounds taken from the FSD50K dataset, 20,000 polyphonic soundscapes are synthesized, with sounds randomly positioned in the stereo panorama at different loudness levels.
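Placing a mono sound at a random stereo position and loudness can be sketched as below. This is not the USM-SED synthesis code: the constant-power pan law, the dB range, and all function names are assumptions chosen for illustration.

```python
import math
import random


def pan_gains(position):
    """Constant-power (left, right) gains for position in [-1 (left), +1 (right)]."""
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    return math.cos(angle), math.sin(angle)


def db_to_linear(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def place_event(mono, rng, min_db=-12.0, max_db=0.0):
    """Return (left, right) channels for a mono signal at a random pan and level."""
    position = rng.uniform(-1.0, 1.0)
    gain = db_to_linear(rng.uniform(min_db, max_db))
    left_gain, right_gain = pan_gains(position)
    left = [s * gain * left_gain for s in mono]
    right = [s * gain * right_gain for s in mono]
    return left, right


rng = random.Random(0)  # fixed seed so the placement is reproducible
left, right = place_event([0.0, 0.5, -0.5, 1.0], rng)
```

With a constant-power law the summed power of the two channels stays the same for every pan position, so random panning does not change the perceived loudness on its own.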
Libraries
Adjacent
- incubation.audio.sample.managment
- incubation.concepts.DatabaseArt
- incubation.ai.audio
- incubation.audio.synthesis.concatenative
- concepts.archives.art
- concepts.Digital Asset Managment
- incubation.tools.gis
Methodologies
- schaeffer
- google - magenta
- ...