Dataset |
Type |
Size |
Metadata |
Access |
GigaMIDI |
MIDI |
1,437,304 files |
The GigaMIDI dataset is a comprehensive collection of over 1.43 million MIDI files, encompassing 5.3 million tracks and 1.8 billion notes, designed to advance research in artificial intelligence and music computing. Unlike traditional audio datasets, GigaMIDI focuses on symbolic music data, offering unique insights into music generation and transcription. This dataset includes annotations for loops and metadata for detecting expressive performances, allowing for a nuanced understanding of human-like interpretation in music. To identify expressive tracks, GigaMIDI introduces a novel heuristic called the note onset median metric level (NOMML), which has demonstrated 99.5% accuracy in distinguishing expressive performances, with 31% of the tracks identified as expressive. Additionally, GigaMIDI addresses the challenge of detecting musical loops, especially when expressive timing variations are present, by marking loops in non-expressive tracks, resulting in the identification of 7 million loops. The dataset is available for research purposes on the Hugging Face hub, providing a valuable resource for advancing AI-driven music research. |
Link |
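Since GigaMIDI is distributed through the Hugging Face hub, a minimal loading sketch with the `datasets` and `pretty_midi` libraries is given below; the repository id and the column names are assumptions and should be checked against the dataset card.
```python
# Minimal sketch: stream a few GigaMIDI records from the Hugging Face hub.
# The repository id and the column holding raw MIDI bytes are assumptions;
# consult the dataset card for the actual values.
import io
from itertools import islice

import pretty_midi
from datasets import load_dataset

ds = load_dataset("Metacreation/GigaMIDI", split="train", streaming=True)  # hypothetical repo id

for example in islice(ds, 5):
    midi_bytes = example["music"]  # hypothetical column with raw MIDI bytes
    pm = pretty_midi.PrettyMIDI(io.BytesIO(midi_bytes))
    n_notes = sum(len(inst.notes) for inst in pm.instruments)
    print(f"{len(pm.instruments)} tracks, {n_notes} notes")
```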
MetaMIDI |
MIDI |
436,631 files |
Scraped artist and title metadata for 221,504 MIDIs; scraped genre metadata for 143,868 MIDIs; and an audio-MIDI matching procedure that produced 10,796,557 audio-MIDI matches linking 237,236 MIDIs, including 168,032 MIDIs matched to MusicBrainz IDs via the Spotify/MusicBrainz linking procedure. |
Link
|
Drum Space |
MIDI |
33,000 files |
Unique drum tracks and non-expressive synthetic data generated using neural networks, together with a 2-D latent-space representation of the tracks computed using the t-SNE algorithm. |
Link
|
Emo-Soundscapes |
Audio |
1,213 audio clips |
Ground-truth annotations of perceived emotion in 1,213 soundscape recordings, collected through a crowdsourced listening experiment in which 1,182 annotators from 74 different countries ranked the audio clips according to perceived valence/arousal. |
Link
|
Groove MIDI Dataset (GMD) |
MIDI & Audio |
1,150 MIDI files and over 22,000 measures
|
Drummer IDs, drumming style, distinction between drum beats and fills, expressively performed MIDI tracks. |
Link
|
MAESTRO |
MIDI & Audio |
1,276 MIDI files and corresponding audio waveforms
|
200 hours of virtuosic, expressively performed piano music captured with fine alignment (~3 ms) between note labels and audio waveforms; composer IDs are also included. |
Link
|
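MAESTRO ships with a metadata index that maps each performance to its composer, title, split, and MIDI/audio file names; the sketch below filters that index with pandas, assuming the column names of the v3.0.0 CSV.
```python
# Sketch: browsing the MAESTRO metadata index with pandas.
# File name and column names follow the v3.0.0 release; treat them as
# assumptions if a different version is used.
import pandas as pd

meta = pd.read_csv("maestro-v3.0.0.csv")

# Restrict to the official test split and list pieces by one composer.
test = meta[meta["split"] == "test"]
chopin = test[test["canonical_composer"].str.contains("Chopin")]
print(chopin[["canonical_title", "year", "midi_filename", "audio_filename"]].head())
```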
AbaSynthphony MIDI Pack V001 |
MIDI |
50,000 files
|
Melodic tracks with fixed velocity levels.
|
Link
|
AbaSynthphony MIDI Pack V002 Drum |
MIDI |
50,000 files
|
Drum tracks with fixed velocity levels.
|
Link
|
AbaSynthphony MIDI Pack V003 Dance Music Drum |
MIDI |
50,000 files |
Dance music drum tracks with fixed velocity levels.
|
Link
|
ASAP (Aligned Scores and Performances) |
MIDI |
1,290 files |
A dataset of 222 digital musical scores aligned with 1,068 Western Classical piano music performances (more than 92 hours). This data is originally based on the MAESTRO dataset.
|
Link
|
IsoVAT |
MIDI & Audio |
90 clips in total |
Affective music composition with different levels (low, mid, and high) of Valence, Arousal and Tension.
|
Link
|
ADC2004 |
Audio |
20 excerpts |
This dataset provides predominant-pitch (melody) annotations for evaluating polyphonic melody extraction techniques. |
|
Link
|
AMG1608 |
Audio |
1,608 files |
AMG1608 is a dataset for music emotion analysis. It contains frame-level acoustic features extracted from 1608 30-second music clips and corresponding valence-arousal (VA) annotations provided by 665 subjects.
|
Link
|
APL |
Audio |
620 segments |
One month's worth of practice by one pianist. Tracks are generated automatically using a four-second silence gate to stop recording and a simple threshold to begin recording. Filenames are generated using the date. These tracks were annotated by the performer in ten-second intervals.
|
Link
|
artist20 |
Audio |
1,413 songs |
This is a database of six albums by each of 20 artists, making a total of 1,413 tracks. It grew out of our work in artist identification, where we identified 18 artists with five or more albums in our uspop2002 dataset. That data was used to train artist identification systems, with albums kept disjoint between training and testing. There were, however, a number of issues with that data, including repeated tracks, live recordings, and others. |
|
Link
|
DadaGP |
Symbolic (GuitarPro & MIDI) |
26,181 songs |
This is a dataset dedicated to guitar music, consisting mostly of metal and rock. DadaGP comprises 26,181 GuitarPro songs across 739 genres, converted to a token-sequence format suitable for generative language models such as GPT-2 and Transformer-XL. It includes an encoder/decoder (v1.1) that converts gp3, gp4, and gp5 files to/from this token format. |
|
Link
|
GiantMIDI-Piano |
MIDI |
10,855 files |
GiantMIDI-Piano is a classical piano MIDI dataset containing 10,855 MIDI files of 2,786 composers. A curated subset, obtained by constraining composer surnames, contains 7,236 MIDI files of 1,787 composers. GiantMIDI-Piano is transcribed from live recordings with a high-resolution piano transcription system. |
|
Link
|
ATEPP |
MIDI |
11,674 files |
ATEPP is a dataset of expressive piano performances by virtuoso pianists. The dataset contains 11,674 performances (~1,000 hours) by 49 pianists and covers 1,595 movements by 25 composers. All of the MIDI files in the dataset come from piano transcriptions of existing audio recordings of piano performances. Scores in MusicXML format are also available for around half of the tracks. The dataset is organized and aligned by compositions and movements for comparative studies. |
|
Link
|
bach10 |
Audio & MIDI |
10 chorales |
This dataset consists of Bach's chorales in audio and corresponding MIDI. This is useful for music analysis, specifically for multi-pitch estimation & tracking and audio-score alignment & source separation research.
|
Link
|
ballroom |
Audio |
698 excerpts |
This data contains eight traditional dance music style labels (Cha Cha, Jive, Quickstep, Rumba, Samba, Tango, Viennese Waltz and Slow Waltz), including tempo annotations.
|
Link
|
beatboxset1 |
Audio |
14 clips |
This dataset contains beatboxing (vocal percussion) recordings from various contributors who recorded the clips themselves under various conditions. The clips were provided by users of the website http://www.humanbeatbox.com/ and are identified by the names they use on that forum. Each clip is from a different contributor.
|
Link
|
C224a |
Audio |
224 artists |
This is a collection of 224 artists categorized into 14 genres with a uniform genre distribution, useful for artist classification tasks.
|
Link
|
C3ka |
Audio |
3,000 artists |
This is a collection of 3,000 artists, corresponding to the top-ranked last.fm artists (filtered by occurrence in allmusic.com). The music style assignment originates from allmusic.com (18 distinct genres) and is useful for artist classification tasks.
|
Link
|
C49ka/C111ka |
Audio |
48800/110588 artists |
These are two artist collections used for microblog indexing experiments. C111ka contains a list of 110,588 artists (without genre information). C49ka comprises 48,800 artists, for which genre information is available as well. This metadata is useful for artist classification tasks.
|
Link
|
CAL10k-CAL500 |
Audio |
11,365 songs |
This dataset contains 11,365 full-length songs from over 4,500 artists and over 18 musical genres. It also contains semantic tags harvested from the Pandora website, including 475 acoustic tags and 153 genre (and sub-genre) tags. Human listeners and musical experts assigned these tag annotations to the songs.
|
Link
|
Center for Computer Assisted Research in the Humanities (CCARH) |
Symbolic (Musedata, Themefinder, Humdrum and Kern) |
Academic Resources |
This is an academic resource for teaching symbolic music encoding and its analysis. It contains a small amount of symbolic music data to provide examples. |
|
Link
|
CCMixter |
Audio |
50 mixes |
This dataset is used for audio source separation; it includes the vocal and background music tracks. |
|
Link
|
Chopin22 |
Audio & MIDI |
44 files |
In 1999, 22 highly skilled pianists from the Vienna area performed these excerpts from pieces by Chopin on the same Bösendorfer computer-controlled grand piano. It contains both audio recordings and aligned MIDI files for the recordings.
|
Link
|
CMMSD |
Audio |
36 excerpts |
A musical data set for note-level segmentation of monophonic music is presented. It contains 36 excerpts from commercial recordings of monophonic classical western music and features the instrument groups strings, woodwind and brass. The excerpts are self-contained phrases with a mean length of 17.97 seconds and an average of 20 notes.
|
Link
|
Codaich |
Audio |
26,420 songs |
Codaich dataset consists of 26,420 carefully labelled MP3 encodings of music, although the current working version is much larger. Efforts were made to achieve as stylistically diverse a collection as possible, and this collection includes music from 55 different music styles, which are distributed among the coarse categories of popular, world, classical and jazz.
|
Link
|
covers80 |
Audio |
80 song pairs |
This is a collection of 80 songs, each performed by two artists, so 160 recordings in total. The data supports research on the automatic detection of "cover songs," i.e., alternative performances of the same basic musical piece by different artists, typically with large stylistic and/or harmonic changes. |
|
Link
|
DAMP |
Audio |
34,000 monophonic recordings |
This dataset is a collection of karaoke performances from the Stanford Digital Archive of Mobile Performances (DAMP), a repository of geo-tagged mobile performances that facilitates research on amateur singing practices. |
|
Link
|
DEAM |
Audio |
1,802 excerpts |
DEAM dataset consists of 1802 excerpts and full songs annotated with Valence and Arousal values both continuously (per-second) and over the whole song. The detailed description of the dataset is available in the Manual. The metadata describing the audio excerpts (their duration, genre, and folksonomy tags) are included in the dataset.
|
Link
|
DEAP |
Audio |
120 music video excerpts |
DEAP dataset includes the ratings from an online self-assessment where 120 one-minute extracts of music videos were each rated by 14-16 volunteers based on arousal, valence and dominance. The dataset also contains the participant ratings, physiological recordings, and face video of an experiment in which 32 volunteers watched a subset of 40 of the above music videos. EEG and physiological signals were recorded, and each participant also rated the videos as shown above. For 22 participants, a frontal face video was also recorded.
|
Link
|
DREANSS |
Audio |
18 excerpts |
The full name of this dataset is DRum Event ANnotations for Source Separation (DREANSS). The purpose of the annotations is to support the development of source separation methods for polyphonic audio music mixtures containing drums. The dataset contains annotations for 22 excerpts of songs taken from different multi-track audio datasets that are publicly available for research purposes. These multi-track excerpts span several genres, including rock, reggae, electronic, indie, and metal, and have an average duration of 10 seconds. The annotations are divided into four folders, each of which contains the annotations for a given audio source separation dataset. |
|
Link
|
DrumPt |
Audio |
30 annotated tracks |
The dataset is structured as the original ENST dataset. There are three folders (namely, drummer_1, drummer_2 and drummer_3), each folder contains the annotations for tracks that have accompaniments (songs with prefix of MIN or MID). The technique names and their corresponding indices are:
drag: 1
roll: 2
flam: 3
There are 30 annotated tracks containing 182 individual events (109 rolls, 26 flams, and 47 drags). Each event is roughly 250-450 ms in duration. Currently, the dataset only contains annotations on the snare drum channel; in the future, annotations of more techniques and drums could be added.
|
Link
|
emoMusic |
Audio |
744 songs |
1,000 songs were selected from the Free Music Archive (FMA). The annotated excerpts are available in the same package, with song IDs 1 to 1000. Some redundancies were identified, which reduced the dataset to 744 songs. The dataset is split between a development set (619 songs) and an evaluation set (125 songs). The extracted 45-second excerpts are all re-encoded to the same sampling frequency, i.e., 44,100 Hz; full songs are also provided in the same package. The 45-second excerpts are extracted from random (uniformly distributed) starting points in a given song. The continuous annotations were collected at a sampling rate that varied according to browsers and computer capabilities, so they were resampled to produce averaged annotations at a 2 Hz sampling rate. In addition to the average, the standard deviation of the annotations is provided to give an idea of the margin of error. The continuous annotations lie between -1 and +1 and exclude the first 15 seconds due to the instability of the annotations at the start of the clips. For the annotations collected for the whole song on a nine-point scale, the average and the standard deviation of the ratings (ranging from one to nine) are reported. A detailed explanation of the data collection methods and baseline results is provided in the authors' CrowdMM paper. |
Link
|
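Given the 2 Hz continuous annotations described above, a small pandas sketch for recovering per-song averages is shown below; the file name and layout (one row per song, one column per 500 ms frame) are assumptions about how the annotations are packaged.
```python
# Sketch: per-song statistics from the 2 Hz continuous valence annotations.
# The file name "valence.csv" and the one-column-per-frame layout are
# assumptions; annotation values are expected to lie in [-1, +1].
import pandas as pd

valence = pd.read_csv("valence.csv", index_col=0)  # hypothetical file name

frame_cols = [c for c in valence.columns if c.startswith("sample_")]  # hypothetical column prefix
per_song_mean = valence[frame_cols].mean(axis=1)
per_song_std = valence[frame_cols].std(axis=1)
print(per_song_mean.describe(), per_song_std.describe(), sep="\n")
```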
Emotify |
Audio |
400 excerpts |
The dataset consists of 400 song excerpts (1 minute long) in 4 genres (rock, classical, pop, electronic). The annotations were collected using the GEMS scale (Geneva Emotional Music Scales). Each participant could select three items from the scale (the emotions they felt most strongly while listening to the song); the emotional categories are those used in the annotation game. The Emotify dataset has no arousal/valence values, but it provides the audio and is annotated with the GEMS. The discrete emotion tags include amazement, solemnity, tenderness, nostalgia, calmness, power, joyful activation, tension, and sadness. |
Link
|
ENST-Drums |
Audio |
318 segments |
The ENST-Drums database is a large and varied research database for automatic drum transcription and processing:
Three professional drummers specialized in different music genres were recorded.
The total duration of audio material recorded per drummer is around 75 minutes.
Each drummer played a drum kit.
Each sequence used either sticks, rods, brushes or mallets to increase the diversity of drum sounds.
The drum kits themselves are varied, ranging from a small, portable kit with two toms and two cymbals, suitable for jazz and Latin music, to a larger rock drum set with four toms and five cymbals.
Each sequence is recorded on 8 audio channels, filmed from two angles, and fully annotated. |
Link
|
Extended Ballroom |
Audio |
4000 excerpts (30s) |
Metadata in the extended ballroom dataset can be summarized as follows:
The root node of the XML contains one genre node for each genre class.
Each genre node contains one song node for each song belonging to that genre class.
Each song node contains several pieces of information:
- album: album name
- artist: artist name
- title: song title
- bpm: tempo in bpm. From v1.1, some wrong/missing tempo annotations were manually corrected. The average was taken in the case of varying tempo over the track.
- hash: MD5 hash, used to verify that the correct audio file corresponds to the entry
- version: indicates that this track is a version repetition of the song with the ID provided
- exact: indicates that this track is an exact duplicate of the song with the ID provided
- time: indicates that this track is a time duplicate of the song with the ID provided
- karaoke: indicates that this track is a karaoke duplicate of the song with the ID provided. (one of the two tracks has a 'voiced' attribute, meaning that the track is the version with voice) |
Link
|
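Since the metadata above is organized as genre and song nodes in a single XML file, a parsing sketch with `xml.etree.ElementTree` follows; the file name is hypothetical and the sketch assumes the listed fields are stored as attributes of each song node, which may differ in the actual release.
```python
# Sketch: per-genre tempo statistics from the Extended Ballroom metadata XML.
# The file name is hypothetical and the fields are assumed to be attributes
# of each <song> element; adjust if they are child elements instead.
import xml.etree.ElementTree as ET
from collections import defaultdict

root = ET.parse("extendedballroom_v1.1.xml").getroot()  # hypothetical file name

bpm_by_genre = defaultdict(list)
for genre in root:              # one node per genre class
    for song in genre:          # one node per song in that genre
        bpm = song.get("bpm")
        if bpm is not None:
            bpm_by_genre[genre.tag].append(float(bpm))

for name, bpms in sorted(bpm_by_genre.items()):
    print(f"{name}: {len(bpms)} songs, mean tempo {sum(bpms) / len(bpms):.1f} bpm")
```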
FlaBase |
Metadata |
13,311 tracks |
FlaBase (Flamenco Knowledge Base) is a new knowledge base of flamenco music. Its ultimate aim is to gather all available online editorial, biographical, and musicological information related to flamenco music. Its content is the result of curation and extraction processes. FlaBase is stored in JSON format and is freely available for download. This first release of FlaBase contains information about 1,102 artists, 74 palos (flamenco genres), 2,860 albums, 13,311 tracks, and 771 Andalusian locations. |
Link
|
FMA (Free Music Archive) |
Audio |
FMA-small: 8,000 tracks of 30s, 8 balanced genres
FMA-medium: 25,000 tracks of 30s, 16 unbalanced genres
FMA-large: 106,574 tracks of 30s, 161 unbalanced genres
FMA-full: 106,574 untrimmed tracks, 161 unbalanced genres |
The Free Music Archive (FMA) dataset is designed especially for music information retrieval researchers, and they provide different versions of the dataset based on the size. FMA is an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. However, the community's growing interest in feature and end-to-end learning is restrained by the limited availability of large audio datasets. The FMA aims to overcome this hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums arranged in a hierarchical taxonomy of 161 genres. It provides full-length, high-quality audio and pre-computed features, track-level and user-level metadata, tags, and free-form text such as biographies. |
Link
|
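FMA distributes its track-level metadata and pre-computed features as CSV files; the sketch below loads the track metadata with pandas, assuming the two-row column header and column names used by the published metadata archive.
```python
# Sketch: loading FMA track metadata and selecting the "small" subset.
# The file name and the ("set", "subset") / ("track", "genre_top") columns
# follow the published fma_metadata archive and are assumptions here.
import pandas as pd

tracks = pd.read_csv("tracks.csv", index_col=0, header=[0, 1])

small = tracks[tracks[("set", "subset")] == "small"]
print(small[("track", "genre_top")].value_counts())  # 8 balanced genres expected
```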
Fugues |
Symbolic |
36 pieces |
This is a reference dataset for computational music analysis. It contains ground-truth structures for fugues: 24 Bach fugues (WTC I, BWV 846-869) and 12 Shostakovich fugues (op. 87), including metadata on S/CS/CS2 patterns, cadences, and pedals (1,000+ labels). |
Link
|
GiantSteps Key |
Audio |
604 files |
This is the dataset for the automatic evaluation of key estimation algorithms. |
Link
|
GiantSteps Tempo |
Audio |
664 files |
This is the dataset for the automatic evaluation of tempo detection algorithms. |
Link
|
Good-sounds |
Audio |
8750 notes/scales |
This dataset was created in the context of the Pablo project, partially funded by KORG Inc. It contains monophonic recordings of two kinds of exercises: single notes and scales. The recordings were made in the Universitat Pompeu Fabra / Phonos recording studio by 15 different professional musicians, all of them holding a music degree and having some expertise in teaching. 12 different instruments were recorded using one or up to 4 different microphones (depending on the recording session). The whole set of playable semitones is recorded several times with different tonal characteristics for all the instruments. Each note is recorded into a separate mono .flac audio file of 48kHz and 32 bits. The tonal characteristics are explained in both the following section and the related publication. |
Link
|
GuitarSet |
Audio |
360 excerpts |
This is a dataset that provides high-quality guitar recordings alongside rich annotations and metadata.
In particular, by recording guitars using a hexaphonic pickup, we can provide recordings of the individual strings and largely automate the expensive annotation process, therefore providing rich annotation.
The dataset contains recordings of a variety of musical excerpts played on an acoustic guitar, along with time-aligned annotations, including pitch contours, string and fret positions, chords, beats, downbeats, and playing style. |
Link
|
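GuitarSet's annotations are published as JAMS files; a minimal reading sketch with the `jams` library is given below, where the file name is hypothetical and the per-string `note_midi` layout is an assumption about the annotation format.
```python
# Sketch: reading per-string note annotations from one GuitarSet JAMS file.
# The file name is hypothetical; the "note_midi" namespace and the
# one-annotation-per-string layout are assumptions about the release.
import jams

jam = jams.load("00_BN1-129-Eb_solo.jams")  # hypothetical file name

for string_idx, ann in enumerate(jam.search(namespace="note_midi")):
    intervals, pitches = ann.to_interval_values()
    print(f"string {string_idx}: {len(pitches)} notes")
```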
GTZAN |
Audio |
1000 excerpts |
GTZAN is a dataset for music genre classification of audio signals. The dataset consists of 1,000 audio tracks, each 30 seconds long. It contains 10 genres, each represented by 100 tracks. The tracks are all 22,050Hz Mono 16-bit audio files in WAV format. The genres are blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, and rock. |
Link
|
HJDB |
Audio |
235 excerpts |
HJDB dataset is for the evaluation of tempo estimation. |
Link
|
Music Audio Benchmark Dataset |
Audio |
1886 songs |
The dataset contains 1,886 songs, all encoded in MP3 format. The sampling frequency and bitrate of these files are 44,100 Hz and 128 kbit/s, respectively. It contains music from 9 genre labels: 145 songs for alternative, 120 songs for blues, 113 songs for electronic, 222 songs for folk-country, etc. |
Link
|
IADS-E |
Audio |
935 snippets |
The IADS-E dataset is an expanded version of the second version of the International Affective Digitized Sounds system (IADS-2; Bradley & Lang, 2007a). It consists of 935 digitally recorded natural sounds (including those on the IADS-2) common in daily life, such as babies crying, typing, footsteps, background music, and sound effects. The new sounds in the expanded version were assembled from internet sources and sampled using computer music software, or composed by a composer. We divided the sounds in IADS-E (including the IADS-2) into ten semantic categories: animals (N = 54), people (N = 74), nature (N = 70), sounds of daily routines (N = 187), transport (N = 59), electronic sounds (N = 64), sound effects (N = 171), breaking sounds (N = 56), music (N = 170), and scenarios (N = 30). Each sound lasts six seconds and was rated by Japanese participants on the affective dimensions of valence, arousal, and dominance/control using the Self-Assessment Manikin (SAM; Bradley & Lang, 1994), as well as on three basic emotion-rating scales (happiness, sadness, and fear). |
Link
|
IDMT-Traffic |
Audio |
17,506 excerpts |
The IDMT-TRAFFIC dataset includes 17,506 2-second-long stereo audio excerpts of recorded vehicle passings and different background sounds alongside streets. The dataset includes recordings from 4 different recording locations, 4 different vehicle types (bus, car, motorcycle, and truck), three different speed-limit zones, and dry and wet weather/road conditions. The direction of movement is annotated as well. Recordings with both high-quality sE8 microphones and medium-quality MEMS microphones are included. |
Link
|
IDMT-SMT-Audio-Effects |
Audio |
55,044 excerpts |
The IDMT-SMT-Audio-Effects database is a large database for automatic detection of audio effects in recordings of electric guitar and bass and related signal processing.
The overall duration of the audio material is approx. 30 hours.
The dataset consists of 55044 WAV files (44.1 kHz, 16bit, mono) with single recorded notes:
20592 monophonic bass notes,
20592 monophonic guitar notes, and
13860 polyphonic guitar sounds.
Overall, 11 different audio effects are incorporated:
feedback delay,
slapback delay,
reverb,
chorus,
flanger,
phaser,
tremolo,
vibrato,
distortion,
overdrive, and
no effect (unprocessed notes/sounds). |
Link
|
IDMT-SMT-Bass |
Audio |
4,300 excerpts |
The IDMT-SMT-Bass database is a large database for automatic bass transcription and signal processing.
The overall duration of the audio material is approx. 3.6 hours.
The dataset consists of approx. 4,300 WAV files (44.1 kHz, 24 bit) with single recorded notes.
Overall, 10 different bass-related playing techniques, namely five plucking styles
fingerstyle (FS),
picked (PK),
muted (MU),
slap-thumb (ST),
slap-pluck (SP),
and five expression styles
normal (NO),
vibrato (VI),
bending (BE),
harmonics (HA), and
dead-note (DN)
are incorporated. |
Link
|
IDMT-SMT-Bass-Single-Track |
Audio |
17 bass lines excerpts |
The IDMT-SMT-BASS-SINGLE-TRACK dataset comprises 17 bass lines from different music styles.
It is intended as a public evaluation dataset for retrieval of repetitive bass patterns. The pattern length (in seconds) and the beginning of the first pattern appearance (in seconds) are annotated for each bass line.
The patterns are, in general, not exact repetitions but instead contain occasional pitch and rhythm variations.
Bass transcription includes each note annotated with the score-related parameters onset, offset and pitch.
Spatial transcription/estimation of the fretboard position includes each note annotated with the instrument-related parameters string number and fret number.
Estimation of bass guitar plucking styles includes each note played and annotated with one of the five plucking style classes. |
Link
|
IDMT-SMT-Drums |
Audio |
518 files |
The IDMT-SMT-Drums database is a medium-sized database for automatic drum transcription and source separation.
The dataset consists of 608 WAV files (44.1 kHz, mono, 16 bit), and its approximate duration is 2:10 hours.
There are 104 polyphonic drum set recordings (drum loops) containing only the drum instruments kick drum, snare drum and hi-hat. For each drum loop, there are 3 training files for the involved instruments, yielding 312 training files for drum transcription purposes. The recordings are from three different sources:
Real-world acoustic drum sets (RealDrum),
Drum sample libraries (WaveDrum), and
Drum synthesizers (TechnoDrum).
The onsets of kick drum, snare drum and hi-hat have been manually annotated for each drum loop. They are provided as XML and SVL files that can be assigned to the corresponding audio recording by their filename. Appropriate annotation file parsers are provided as MATLAB functions together with an example script showing how to import the complete dataset.
The subsets TechnoDrum02 and WaveDrum02 contain 64 drum loops that are delivered together with perfectly isolated single tracks of kick drum, snare drum, and hi-hat in addition to the above-mentioned training files. Mixing the single tracks together yields the mixture drum loops, thus providing 192 reference signals for source separation experiments. |
Link
|
IDMT-SMT-Guitar |
Audio |
4700 note events + 400 monophonic and polyphonic note events |
The IDMT-SMT-GUITAR database is a large database for automatic guitar transcription. Seven different guitars in standard tuning were used with varying pick-up settings and different string measures to ensure a sufficient diversification in the field of electric and acoustic guitars. The recording setup consisted of appropriate audio interfaces, which were directly connected to the guitar output or in one case to a condenser microphone. The recordings are provided in one channel RIFF WAVE format with 44100 Hz sample rate.
The dataset consists of four subsets. The first contains all introduced playing techniques (plucking styles: finger-style, muted, picked; expression styles: normal, bending, slide, vibrato, harmonics, dead-notes) and is provided with a bit depth of 24 Bit. It has been recorded using three different guitars and consists of about 4700 note events with monophonic and polyphonic structure. As a particularity the recorded files contain realistic guitar licks ranging from monophonic to polyphonic instrument tracks.
The second subset of data consists of 400 monophonic and polyphonic note events each played with two different guitars. No expression styles were applied here and each note event was recorded and stored in a separate file with a bit depth of 16 Bit. The parameter annotations for the first and second subset are stored in XML format.
The third subset is made up of five short monophonic and polyphonic guitar recordings. All five pieces have been recorded with the same instrument and no special expression styles were applied. The files are stored with a bit depth of 16 Bit and each file is accompanied by a parameter annotation in XML format.
Additionally, a fourth subset is included, which was created for evaluation purposes in the context of chord recognition and rhythm style estimation tasks. This set contains recordings of 64 short musical pieces grouped by genre. Each piece has been recorded at two different tempi with three different guitars and is provided with a bit depth of 16 Bit. Annotations regarding onset positions, chords, rhythmic pattern length, and texture (monophony/polyphony) are included in various file formats. |
Link
|
iKala |
Audio |
252 excerpts |
The iKala dataset comprises 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for MIREX). The music accompaniment and the singing voice are recorded in the left and right channels, respectively, and can be found under the Wavfile directory. In addition, the human-labeled pitch contours and timestamped lyrics can be found under PitchLabel and Lyrics, respectively. |
Link
|
Eurovision Song Contest |
Audio |
1735 songs |
The Eurovision Song Contest is a freely-available dataset containing metadata, contest ranking and voting data of 1735 songs that have competed in the Eurovision Song Contests. The upcoming release will also contain audio features.
Every year, the dataset is updated with the contest's results. This release contains the contestant metadata, contest ranking and voting data of 1735 entries that participated in the Eurovision Song Contest from its first occurrence in 1956 until now. The corresponding audio for every song can be streamed through YouTube. |
Link
|
IRMAS |
Audio |
2874 excerpts |
IRMAS is intended to be used for training and testing methods for the automatic recognition of predominant instruments in musical audio. The instruments considered are: cello, clarinet, flute, acoustic guitar, electric guitar, organ, piano, saxophone, trumpet, violin, and human singing voice. This dataset is derived from the one compiled by Ferdinand Fuhrmann in his PhD thesis, with the differences that the audio data is provided in stereo format, the annotations in the testing dataset are limited to specific pitched instruments, and the number and length of excerpts differ. |
Link
|
ISMIR2004Genre |
Audio |
465 excerpts |
This is a collection of audio used for the Genre Identification task of the ISMIR 2004 audio description contest organized by the Music Technology Group (Universitat Pompeu Fabra). The audio for the task was collected from Magnatune, which contains a large amount of music licensed under Creative Commons licenses. The task of the contest was to classify a set of songs into genres, using the genre labels that Magnatune provided in their database. |
Link
|
ISMIR2004Tempo |
Audio |
465 excerpts |
A professional musician placed beat marks on several song excerpts. The ground-truth tempo was computed as the median of the inter-beat intervals. The total number of instances is 465, each with a duration of 20 seconds. |
Link
|
MTG-Jamendo |
Audio |
55,525 files |
MTG-Jamendo Dataset is a new open dataset for music auto-tagging. It is built using music available at Jamendo under Creative Commons licenses and tags provided by content uploaders. The dataset contains over 55,000 full audio tracks with 195 tags from genre, instrument, and mood/theme categories. We provide elaborated data splits for researchers and report the performance of a simple baseline approach on five different sets of tags: genre, instrument, mood/theme, top-50, and overall. |
Link
|
Source alignment and separation test database |
Audio & MIDI |
16,256 mixes with 128 MIDI instruments |
The data consists of randomly generated MIDI files to avoid copyright issues and maintain flexibility. Each file, representing a different MIDI instrument (excluding drums), is 10 seconds long with about 20 notes, varying in duration, pitch, and loudness. Two renderings simulate differences between synthesized scores and real recordings: one with Timidity++ on Linux and one with DirectMusic on Windows XP. Tempo variations were added to mimic real performance timing. The database has two versions: one with all MIDI instruments and one subset of 20 common orchestral and pop instruments, reducing possible mixes for focused testing. The results for the ISMIR 2010 paper, derived from this subset, required extensive computation to ensure accuracy. |
Link |
LabROSA:APT |
Audio & MIDI |
29 excerpts |
This is a small-scale dataset created for automatic piano transcription research. |
Link |
LabROSA:MIDI |
Audio & MIDI |
8 songs |
This is a small-scale dataset created for a research project: ground-truth transcriptions of real music from force-aligned MIDI syntheses. |
Link |
Lakh MIDI Dataset |
MIDI |
176,581 files |
The Lakh MIDI dataset (LMD) is a collection of 176,581 unique MIDI files, 45,129 of which have been matched and aligned to the Million Song Dataset entries. Its goal is to facilitate large-scale music information retrieval, both symbolic (using the MIDI files alone) and audio content-based (using information extracted from the MIDI files as annotations for the matched audio files). |
Link |
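A minimal sketch of extracting symbolic statistics from one Lakh MIDI file with `pretty_midi` is shown below; the file path is hypothetical.
```python
# Sketch: simple symbolic statistics for one Lakh MIDI file with pretty_midi.
# The path is hypothetical; LMD files are organised in nested directories
# named after each file's MD5 hash.
import pretty_midi

pm = pretty_midi.PrettyMIDI("lmd_full/0/0a0ce238fb8c672549f77f3b692ebf32.mid")  # hypothetical path

print("estimated tempo:", round(pm.estimate_tempo(), 1), "bpm")
for inst in pm.instruments:
    name = "drums" if inst.is_drum else pretty_midi.program_to_instrument_name(inst.program)
    print(f"{name}: {len(inst.notes)} notes")
```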
Last.FM 1K User Dataset |
Audio |
992 users |
This dataset contains the user, timestamp, artist, and song as metadata. This data is particularly useful for designing a music recommendation system, as it is useful for analyzing the user's behaviours in the context of music listening. |
Link |
MagnaTagATune Dataset |
Audio |
25,863 clips |
The dataset includes human annotations gathered through Edith Law’s TagATune game. It also contains corresponding sound clips from magnatune.com, encoded in 16 kHz, 32kbps, mono mp3 format, generously contributed by John Buckman, the founder of Magnatune, a favourite label among MIR researchers. Additionally, it features a detailed analysis from The Echo Nest, covering the track's structure and musical content, including rhythm, pitch, and timbre. Finally, all the source code necessary for generating the dataset distribution is provided. |
Link |
MAPS Database |
Audio |
238 pieces |
MAPS (MIDI Aligned Piano Sounds) is a comprehensive piano sound database designed for research on multi-F0 estimation and automatic transcription. It consists of approximately 31 GB of CD-quality recordings in .wav format, obtained using Virtual Piano software and a Yamaha Disklavier. The database includes recordings from nine different pianos and recording conditions, featuring four sound categories: isolated notes and monophonic sounds, random chords, usual chords, and pieces of music. For all sounds, ground truth is provided in both MIDI and text formats, and the audio was generated from this ground truth to ensure annotation accuracy. MAPS is freely available under a Creative Commons license. |
Link |
DALI |
Audio |
5,358 tracks |
We present the DALI dataset, a comprehensive resource for the singing voice community that serves as a reference for synchronized audio, lyrics, and notes. The dataset includes 5,358 full-duration songs, each with time-aligned vocal melody notes and lyrics, organized into four levels of granularity: notes, words, lines, and paragraphs. Additionally, the DALI dataset provides multimodal information for each song, including genre, language, musician details, album covers, and links to video clips. |
Link |
MARD |
Audio |
263,525 reviews |
MARD contains texts and accompanying metadata originally obtained from a much larger dataset of Amazon customer reviews, which have been enriched with music metadata from MusicBrainz, and audio descriptors from AcousticBrainz. MARD amounts to a total of 65,566 albums and 263,525 customer reviews. |
Link |
McGill Billboard Project |
Audio |
1,000 entries |
This release includes annotations and audio features for the first 1,000 entries from a random sample of Billboard chart slots presented at ISMIR 2011, along with an additional 300 entries used for evaluating audio chord estimation in MIREX 2012. The dataset contains annotations and features for 890 slots, covering 740 distinct songs due to sampling overlap. Although training algorithms that assume independent and identically distributed data should account for these duplicates, we will release annotations for the remaining 700 entries over the next couple of years to ensure the availability of unseen data for future evaluations at MIREX and similar events. |
Link |
MedleyDB |
Audio |
196 songs |
MedleyDB is a meticulously curated dataset of annotated, royalty-free multitrack recordings. It was developed primarily to facilitate research on melody extraction, addressing significant limitations present in existing collections. For each composition, the dataset provides melody fundamental frequency (f0) annotations, along with instrument activation labels, to support the evaluation of automatic instrument recognition systems. Additionally, MedleyDB is a valuable resource for research in various domains that necessitate access to individual tracks within a song, including source separation and automatic mixing tasks. |
Link |
Meertens Tunes Collections |
Audio |
7,178 recordings |
With the Meertens Tune Collections (MTC), the Meertens Instituut provides a rich set of collections of musical data for research purposes, such as musicological investigations or music information retrieval tasks. Over the past decades, this data has been collected in the database of Dutch songs. The online interface of the Database of Dutch Songs provides access at the level of individual records through extensive search and browsing functionality. With the MTC, several collections are provided as a whole. |
Link |
MidiDB |
MIDI |
Massive collection of top 40, pop, rock, classic hits, country, and TV themes |
Explore an extensive collection of complimentary MIDI files at MIDIdb.com, a premier MIDI database. This resource offers MIDI files across a wide array of genres, including Top 40, pop, rock, classic hits, country, and television themes. Users can search for MIDI files by title, artist, or genre and download free demo files at any time. Additionally, links to full-length professional MIDI files are available through Hit Trax MIDI Files. These high-quality MIDI files, ideal for singers and bands, provide exceptional backing tracks. Users can enjoy unlimited downloads. |
Link |
Million Musical Tweets Dataset |
Audio |
1,086,808 tweets referring to 133,968 unique tracks by 25,060 different artists |
The data set contains listening histories inferred from microblogs. Each listening event identified via twitter-id and user-id is annotated with temporal (date, time, weekday, timezone), spatial (longitude, latitude, continent, country, county, state, city), and contextual (information on the country) information. In addition, pointers to artists and tracks are provided as a matter of course. Moreover, the data includes references to other music-related platforms (musicbrainz, 7digital, amazon). |
Link |
Million Song Dataset |
Audio |
1,000,000 songs (280 GB), 10,000 songs (1%, 1.8 GB compressed) for a quick test |
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
Its purposes are the following:
1) To encourage research on algorithms that scale to commercial sizes
2) To provide a reference dataset for evaluating research
3) As a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
4) To help new researchers get started in the MIR field.
The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide. |
Link |
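Each Million Song Dataset entry is a single HDF5 file; a reading sketch with `h5py` is shown below, where the file name is hypothetical and the group/field names follow the published layout but should be verified against the official `hdf5_getters` module.
```python
# Sketch: reading one Million Song Dataset HDF5 file with h5py.
# The file name is hypothetical; group and field names follow the published
# layout but the official hdf5_getters module is the authoritative reference.
import h5py

with h5py.File("TRAXLZU12903D05F94.h5", "r") as h5:  # hypothetical file name
    meta = h5["metadata"]["songs"][0]
    analysis = h5["analysis"]["songs"][0]
    print(meta["artist_name"].decode(), "-", meta["title"].decode())
    print("tempo:", analysis["tempo"], "bpm | duration:", analysis["duration"], "s")
```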
MIR Lab |
Audio |
Over 1,000 files |
This dataset website is managed by the MIR lab. It contains data for audio melody extraction, singing voice separation, and vocal detection, and was used in the audio melody extraction task of MIREX 2009. It also includes data for query by singing/humming, monophonic pitch tracking, silence/unvoiced/voiced detection, query by tapping, and singing transcription. |
Link |
MODAL |
Audio |
71 snippets |
MODAL is a Musical Onset Database And Library dataset. Modal is a cross-platform library for musical onset detection written in C++ and Python. It is provided here under the terms of the GNU General Public License. It consists of code for several types of Onset Detection Function (ODF), code for real-time and non-real-time onset detection, and a means for comparing the performance of different ODFs. |
Link |
4Q audio emotion dataset |
Audio |
900 clips |
This is a 4-quadrant audio emotion dataset. It contains 900 audio clips, annotated into 4 quadrants, according to Russell's model. |
Link |
Bi-modal (audio and lyrics) emotion dataset |
Audio & lyrics |
133 audio clips and lyrics |
This is a bi-modal (audio and lyrics) emotion dataset. It contains 133 audio clips and lyrics, manually annotated into 4 quadrants, according to Russell's model. |
Link |
Lyrics emotion sentences dataset |
Text |
368 sentences |
This is a sentence-based lyrics emotion dataset for Lyrics Emotion Variation Detection research. It contains 368 sentences manually annotated into 4 quadrants (based on Russell's model). The dataset was split into a 129-sentence training set and a 239-sentence testing set. It also contains an emotion dictionary with 1,246 words. |
Link |
Lyrics emotion dataset |
Text |
771 lyrics |
This is the lyrics emotion dataset. It contains two parts: i) a 180-lyrics dataset manually annotated with arousal and valence values (based on Russell's model); ii) a 771-lyrics dataset annotated in 4 quadrants (Russell's model), based on AllMusic tags. |
Link |
Multi-modal MIREX-like emotion dataset |
Audio & Lyrics & MIDI |
903 audio clips, 764 lyrics and 193 MIDI files |
This is a multi-modal MIREX-like emotion dataset. It contains 903 audio clips (30-sec), 764 lyrics and 193 MIDI files. To the best of our knowledge, this is the first emotion dataset containing those 3 sources (audio, lyrics and MIDI). |
Link |
MTG-QBH: Query By Humming dataset |
Audio |
118 queries/481 songs |
The recordings were made by 17 different subjects, 9 female and 8 male, whose musical experience ranged from none at all to amateur musicians. Subjects were presented with a list of songs, out of which they were asked to select the ones they knew and sing part of the melody. The subjects were aware that the recordings would be used as queries in an experiment on QBH. There was no restriction as to how much of the melody should be sung or which part of the melody should be sung, and the subjects were allowed to sing the melody with or without lyrics. The subjects did not listen to the original songs before recording the queries, and the recordings were all sung a cappella without any accompaniment or reference tone. To simulate a realistic QBH scenario, all recordings were made using a basic laptop microphone, and no post-processing was applied. The duration of the recordings ranges from 11 to 98 seconds, with an average recording length of 26.8 seconds. |
Link |
Musedata |
Symbolic (Midi, MuseData, Humdrum) |
881 files |
This data contains non-MIDI symbolic formats such as MuseData and Humdrum for music encodings. It is an electronic library of Classical Music scores. |
Link |
Music Mood Rating Dataverse |
Audio |
600 files |
This data contains average ratings of discrete emotion tags with annotations, including valence, arousal, atmosphere, happy, dark, sad, angry, sensual, sentimental. |
Link |
KGRec: Sound and Music Recommendation with Knowledge Graphs |
User data |
Number of items-users interactions: 751,531 (KGRec-music) & Number of items-users interactions: 2,117,698 (KGRec-sound) |
Two different datasets with users, items, implicit feedback interactions between users and items, item tags, and item text descriptions are provided, one for Music Recommendation (KGRec-music), and the other for Sound Recommendation (KGRec-sound). |
Link |
MusiClef 2012 |
Audio |
1355 songs |
The MusiClef 2012 – Multimodal Music Data Set provides editorial metadata, various audio features, user tags, web pages, and expert labels on a set of 1355 popular songs. It was used in the MusiClef 2012 Evaluation Campaign. |
Link |
MusicMicro |
User data |
136,866 users |
The data set contains listening histories inferred from microblogs. Each listening event identified via twitter-id and user-id is annotated with temporal (month and weekday) and spatial (longitude, latitude, country, and city) information. In addition, pointers to artist and track are provided as a matter of course. |
Link |
MusicNet |
Audio |
330 recordings |
MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note's position in the metrical structure of the composition. The labels are acquired from musical scores aligned to recordings by dynamic time warping. The labels are verified by trained musicians; a labeling error rate of 4% has been estimated. The MusicNet labels are offered to the machine learning and music communities as a resource for training models and a common benchmark for comparing results. |
Link |
musiXmatch dataset |
Lyrics |
lyrics for 237,662 tracks |
The MXM dataset provides lyrics for many MSD tracks. The lyrics come in bag-of-words format: each track is described as the word-counts for a dictionary of the top 5,000 words across the set. Although copyright issues prevent us from distributing the full, original lyrics, we hope and believe that this format is for many purposes just as useful, and may be easier to use.
The dataset comes in two text files, describing training and test sets. The split was done according to the split for tagging, see tagging test artists. There are 210,519 training bag-of-words, 27,143 testing ones. We also provide the full list of words with total counts across all tracks so you can measure the relative importance of the top 5,000. |
Link |
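Based on the format described above, a small parser sketch for the bag-of-words files is given below; it assumes '#' comment lines, a single '%'-prefixed vocabulary line, and data lines of the form `msd_id,mxm_id,idx:count,...` with 1-based word indices.
```python
# Sketch: parsing the musiXmatch bag-of-words text format.
# Assumed layout: '#' comments, one '%' line with the 5,000-word vocabulary,
# then one line per track: "msd_id,mxm_id,idx:count,..." (1-based indices).
def read_mxm(path, limit=3):
    vocab, tracks = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            if line.startswith("%"):
                vocab = line[1:].strip().split(",")
                continue
            msd_id, _mxm_id, *pairs = line.strip().split(",")
            counts = {vocab[int(i) - 1]: int(c)
                      for i, c in (p.split(":") for p in pairs)}
            tracks.append((msd_id, counts))
            if len(tracks) >= limit:
                break
    return tracks

for msd_id, counts in read_mxm("mxm_dataset_train.txt"):  # hypothetical file name
    top5 = sorted(counts.items(), key=lambda kv: -kv[1])[:5]
    print(msd_id, top5)
```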
The NSynth dataset |
Audio |
305,979 musical notes |
NSynth is an audio dataset containing 305,979 musical notes, each with a unique pitch, timbre, and envelope. For 1,006 instruments from commercial sample libraries, four-second, monophonic 16 kHz audio snippets, referred to as notes, were generated by ranging over every pitch of a standard MIDI piano (21-108) as well as five different velocities (25, 50, 75, 100, 127). The note was held for the first three seconds and allowed to decay for the final second. |
Link |
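NSynth's note-level metadata is distributed alongside the audio as a JSON index; the sketch below filters notes by pitch and velocity, where the path and key names are assumptions based on the published release.
```python
# Sketch: filtering NSynth notes via the per-split examples.json index.
# The path and the "pitch"/"velocity" keys are assumptions; audio files are
# expected under audio/<note_id>.wav.
import json

with open("nsynth-train/examples.json") as f:  # hypothetical path
    notes = json.load(f)

middle_c_loud = [nid for nid, info in notes.items()
                 if info["pitch"] == 60 and info["velocity"] == 100]
print(len(middle_c_loud), "notes at MIDI pitch 60, velocity 100")
if middle_c_loud:
    print("first audio file:", f"nsynth-train/audio/{middle_c_loud[0]}.wav")
```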
Petrucci Music Library |
N-grams |
Up to 10-grams |
This dataset contains chord progressions of up to four chords in length and their counts. The chords represent all simultaneously active notes over all voices of a score. This means that notes need not have the same onset time in order to appear in the same chord. |
Link |
Phonation Modes Dataset |
Audio |
900 samples of sustained sung vowels |
This is a collection of datasets for training computational models for automated detection of the following phonation modes: breathy, neutral, flow and pressed (see Sundberg, J. (1987). The science of the singing voice. Illinois University Press).
The collection includes four sets of recordings, each containing about 900 samples of sustained sung vowels. These samples are about 750ms long. Nine different vowels are represented in all phonation modes and on all pitches between A3 and G5. All recordings were produced by one female singer under controlled conditions.
Along with the four phonation modes breathy, neutral, flow and pressed, a different kind of pressed sound, called pressedta in the metadata, is included: while pressed vocalization was achieved by raising the larynx, pressedta was an attempt to raise the subglottal pressure directly, without raising the larynx. |
Link |
Playlist Dataset |
Audio |
75,262 songs/2,840,553 transitions |
The datasets used in this study were collected from Yes.com and Last.fm, which provide radio playlists and tag data, respectively. The data collection occurred from December 2010 to May 2011, resulting in a dataset of 75,262 songs and 2,840,553 transitions. The raw data were pruned to include only songs with appearances above a certain threshold, and then divided into training and testing sets, ensuring each song appeared at least once in the training set. The resulting datasets are named yes_small, yes_big, and yes_complete. For commercial use, please contact Yes.com and Last.fm directly. |
Link |
QBT-Extended |
Symbolic |
3,365 queries / 51 songs |
This dataset provides onset-related information for experimenting with query-by-tapping algorithms. It includes the ground-truth tapping information in CSV files and MIDI files as ground-truth renderings. |
Link |
QMUL-Beatles |
Audio |
181 songs |
This dataset is for analyzing the Beatles' music, and it provides structure, key, chord, and beat annotations. |
Link |
QMUL-Carole King |
Audio |
14 songs |
This dataset is for analyzing Carole King's music and it provides structure, key, and chords. |
Link |
QMUL-Michael Jackson |
Audio |
38 songs |
This dataset is for analyzing Michael Jackson's music, and it provides the structure of each song. |
Link |
QMUL-Queen |
Audio |
68 songs |
This dataset is for analyzing Queen's music and it provides structure/key & chords of each song. |
Link |
QMUL-Zweieck |
Audio |
18 songs |
This dataset is for analyzing Zweieck's music, and it provides structure, key, chord, and beat annotations of each song. |
Link |
QUASI |
Audio |
11 songs |
This dataset is for MIR research. QUASI dataset is composed of 11 multitrack songs. The different songs are listed as follows (Artist - Song Name) :
Another Dreamer – One We Love,
Carl Leth – The world is under attack,
Alexq – Carol of the bells,
Emily Hurst – Parting friends,
Fort Minor – Remember the name,
Glen Philips - The spirit of shackleton,
Jims Big Ego – Mix tape,
Nine Inch Nails – Good Soldier,
Shannon Hurley - Sunrise,
Ultimate nz tour, and
Vieux Farka Touré - Ana |
Link |
RECOLA Database |
Audio |
9.5 hours of audio, visual, and physiological recordings |
The database encompasses 9.5 hours of recordings, including audio, visual, and physiological data (electrocardiogram and electrodermal activity), captured during online dyadic interactions involving 46 French-speaking participants engaged in collaborative task-solving. Affective and social behaviors, naturally exhibited by the participants, were self-reported at various stages of the study and additionally documented by six French-speaking annotators using the ANNEMO web-based annotation tool. The annotation process, which employed continuous time and value metrics, covered the initial five minutes of each interaction, resulting in 3.8 hours of annotated audiovisual data and 2.9 hours of annotated multimodal data. |
Link |
Repovizz |
Framework |
A framework for remote storage, visual browsing, annotation, and exchange of multi-modal data. |
Multimodal online database and visualization tool. |
Link |
Rock Corpus |
Audio |
200 songs |
The corpus is based on Rolling Stone's "500 Greatest Songs of All Time" list (RS 500), originally published on December 9, 2004. While the list is no longer available online, an archived version can be accessed via web.archive.org. We created a tab-delimited text file of the RS 500 for download, including song rank, title, artist, and year.
Initially, our corpus included 100 songs from the RS 500, selected to ensure chronological balance, referred to as the RS 5x20. One song was excluded, resulting in 99 songs. A tab-delimited file of this subset is available.
Since then, we expanded the corpus to 200 songs by adding the next 101 highest-ranked songs not included in the RS 5x20, now called the RS 200. This list also includes a tab-delimited file. Annotations for the corpus are updated with version numbers, and minor revisions may occur. |
Link |
RWC Music Database |
Audio |
115 songs / 50 classical / 100 songs |
The RWC Music Database, developed by Japan's RWCP and managed by AIST, is a copyright-cleared resource for music research. Available at a nominal cost, it includes six collections of musical pieces and instrument sounds, providing original audio, MIDI files, and lyrics. As the first large-scale music database for research, it offers a benchmark for evaluating music processing systems and supports statistical and learning-based research. Researchers can use the database for publications without copyright restrictions, with the aim of advancing the field of music information processing. |
Link |
Saarland Music Data (SMD) |
MIDI/Audio |
51 songs |
Computers are vital for modern music analysis and generation. Music Information Retrieval (MIR) is a growing field that involves experts from multiple disciplines. In collaboration with MPII and HFM, we aim to foster exploration of computer-based music methods. A key goal is to provide royalty-free music data for research, including standard recordings and MIDI-audio pairs, which are essential for evaluating MIR techniques. |
Link |
Sargon |
Audio |
30 minutes of heavy metal music (4 songs) |
A small dataset of 30 minutes of heavy metal music. It includes audio. |
Link |
SAS: Semantic Artist Similarity Dataset |
Audio |
a corpus of 268 artists and a slightly larger one of 2,336 artists |
The Semantic Artist Similarity dataset consists of two datasets of artist entities with their corresponding biography texts, and the list of the top-10 most similar artists within the datasets used as ground truth. The dataset is composed of a corpus of 268 artists and a slightly larger one of 2,336 artists, both gathered from Last.fm in March 2015. The former is mapped to the MIREX Audio and Music Similarity evaluation dataset, so that its similarity judgments can be used as ground truth. For the latter corpus, we use the similarity between artists as provided by the Last.fm API. For every artist there is a list with the top-10 most related artists. In the MIREX dataset there are 188 artists with at least 10 similar artists; the other 80 artists have fewer than 10 similar artists. In the Last.fm API dataset all artists have a list of 10 similar artists. |
Link |
Schenkerian analyses in computer-readable format |
Symbolic (music XML) |
41 pieces |
Each piece of music has a MusicXML file that contains the notes, and an analysis file with the Schenkerian analysis of the excerpt. The analyses mainly list the prolongations present in the music. Each prolongation is in the form X (Y) Z where X and Z are lists of notes that are prolonged by the notes in Y. One of X and Z may be absent. The notes in X, Y, and Z are given so that they may be easily located in the MusicXML file. Each note is specified with a measure number, pitch, octave, and occurrence. For instance, 4f#5-2 specifies the second occurrence of the F# in the fifth octave (using scientific pitch notation) of the fourth measure. |
Link |
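The note specifiers described above (e.g. `4f#5-2` for the second occurrence of F#5 in measure 4) are simple enough to parse mechanically; a sketch is given below, assuming the occurrence suffix is optional and defaults to 1 when only one occurrence exists.
```python
# Sketch: parsing the note specifiers used in the analysis files, e.g.
# "4f#5-2" = measure 4, pitch F#, octave 5, second occurrence. The optional
# occurrence suffix (defaulting to 1) is an assumption about the format.
import re

NOTE_SPEC = re.compile(r"^(\d+)([a-gA-G][#b]?)(\d)(?:-(\d+))?$")

def parse_note_spec(spec: str) -> dict:
    m = NOTE_SPEC.match(spec)
    if m is None:
        raise ValueError(f"unrecognised note specifier: {spec!r}")
    measure, pitch, octave, occurrence = m.groups()
    return {
        "measure": int(measure),
        "pitch": pitch[0].upper() + pitch[1:],   # keep '#'/'b' accidentals
        "octave": int(octave),
        "occurrence": int(occurrence or 1),
    }

print(parse_note_spec("4f#5-2"))
# -> {'measure': 4, 'pitch': 'F#', 'octave': 5, 'occurrence': 2}
```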
Seyerlehner:1517-Artists |
Audio |
3180 full songs |
This dataset is useful for analyzing artists and their songs; it has been analyzed with music similarity algorithms, which provide the accompanying distance matrices. |
Link |
Seyerlehner:Annotated |
Audio |
190 songs |
It contains 190 songs from popular and unknown artists, annotated with 19 genres based on the votes of a listening experiment. |
Link |
Seyerlehner:Pop |
Audio |
1,105 songs |
The Pop dataset is a tempo classification dataset used in "From Rhythm Patterns to Perceived Tempo". |
Link |
Seyerlehner:Unique |
Audio |
3,115 excerpts |
It contains 3,115 song excerpts from popular artists for MIR research. |
Link |
SiSEC (Signal Separation Evaluation Campaign) |
Audio |
5 excerpts |
The findings from the Signal Separation Evaluation Campaign, as detailed in several key papers, highlight both significant accomplishments and ongoing challenges in the field. Vincent et al. (2012) provide a comprehensive overview of the achievements and persistent issues encountered during the 2007-2010 campaign, focusing on advancements in signal separation techniques and areas requiring further research. The 2008 Signal Separation Evaluation Campaign, discussed by Vincent, Araki, and Bofill (2009), emphasized a community-based approach to large-scale evaluation, fostering collaborative progress in the field. For those interested in the practical aspects of signal separation, datasets, evaluation procedures, and results are available for various scenarios, including under-determined and determined speech and music mixtures, head-geometry mixtures of speech sources from multiple directions, and professionally produced music recordings. These resources provide valuable insights and tools for continued exploration and development in signal separation technologies. |
Link |
SPAM Dataset |
Audio |
50 tracks with 5 annotations each |
SPAM (Structural Poly Annotations of Music) complements the annotated datasets commonly used for assessing structural segmentation in music: Isophonics features 298 annotated tracks primarily from popular music; SALAMI includes 769 pieces spanning western popular and world music, with two human references and three levels of annotation per track; and The Beatles TUT provides a refined version of 174 annotations of The Beatles' works, meticulously corrected and published by Tampere University of Technology. Less conventional datasets are also used in this context: Cerulean, 104 songs subjectively selected as challenging, ranging from classical to heavy metal; Epiphyte, an industrial set of 1,002 tracks, mostly pop; and Sargon, a small collection of 30 minutes of heavy metal tracks under a Creative Commons license. SPAM itself provides 50 tracks with five annotations each. All of these datasets have been converted to the JAMS format, the default format used by MSAF, which is JSON-based and accommodates multiple annotations within a single file (see the loading sketch after this entry's link). While most of the datasets are publicly available in the MSAF repository, Cerulean and Epiphyte remain privately owned. |
Link |
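Since the public SPAM annotations are shipped as JAMS files, they can be inspected with the open-source jams Python package. The sketch below is a minimal illustration, not MSAF's own loading code; the file name spam_example.jams is a placeholder, and the namespace-prefix check simply filters for segment-style annotations.

    import jams  # pip install jams

    # Load one annotation file (placeholder name; the dataset ships one JAMS file per track).
    jam = jams.load("spam_example.jams")

    # A single JAMS file may bundle several annotations; list the segment-style ones.
    for ann in jam.annotations:
        if not ann.namespace.startswith("segment"):
            continue
        print(f"-- {ann.namespace} --")
        for obs in ann.data:
            # Each observation carries a start time, a duration, and a section label.
            print(f"  {obs.time:7.2f}s  {obs.duration:6.2f}s  {obs.value}")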
Suomen Kansan eSävelmät |
Symbolic |
9,000 songs |
By the end of the 19th century, the Finnish Literary Society (SKS) had amassed a rich trove of Finnish folk tunes, collected in a period prior to the advent of recording technology, resulting in a comprehensive transcription-based archive. Driven by the surge of nationalism that birthed the Finnish national epic, the Kalevala, and the Kanteletar, the collection, edited by the Finnish scholar Ilmari Krohn among others, comprises around 9,000 tunes and was published under the title "Finnish Folk Tunes" (Suomen Kansan Sävelmiä) between 1898 and 1933. This extensive collection is divided into five main subcollections: Spiritual Folk Songs, Folk Songs, Folk Dances, Rune Songs from Ingria and Karelia, and Kantele and Jouhikko Tunes, each documenting distinct aspects of Finnish musical heritage. Digitized between 2002 and 2003, the archive, now accessible through the Digital Archive of Finnish Folk Tunes, offers a searchable database that includes detailed musical and geographical information, reflecting the collection's national and historical significance. |
Link |
SymbTr |
Symbolic |
2,000 songs |
The SymbTr Turkish makam music score collection is an archive of more than 2,000 pieces, released as part of the CompMusic research project. The archive can be downloaded and its pieces opened and edited with version 2.x and later of the Mus2 notation software. |
Link |
Kostka-Payne corpus |
Symbolic |
46 excerpts |
The "Kostka-Payne corpus" provides a statistical analysis of harmonic progressions in common-practice music, offering a valuable empirical perspective on the adherence to traditional harmonic principles of the 18th and 19th centuries. This corpus, derived from Stefan Kostka and Dorothy Payne’s "Tonal Harmony" textbook, encompasses 46 excerpts, each at least eight measures long, meticulously analyzed and encoded into a "chord-list" format. This format, featuring chromatic and diatonic relative roots as well as absolute roots, facilitates a rigorous examination of harmonic structures. By converting conventional Roman numeral analyses into a computationally accessible format, this dataset enables detailed exploration of how frequently dominant harmonies resolve to tonics, predominants to dominants, and other traditional harmonic movements are followed. The absence of quality and extension information in this dataset highlights a focused approach to understanding root progressions, providing a foundation for further research into the empirical validity of common-practice harmonic theories. |
Link |
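To illustrate the kind of conversion the chord-list encoding captures, the sketch below maps a diatonic Roman numeral in a major key to an absolute root. This is a minimal, hypothetical illustration of standard music-theory arithmetic, not the corpus's actual file format, and the pitch-class table assumes sharp-based key spellings.

    # Map a diatonic Roman numeral in a major key to its absolute root (pitch class).
    # Keys must use the sharp-based spellings in PITCH_CLASSES (e.g. "F#" rather than "Gb").
    PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    MAJOR_SCALE_STEPS = {"I": 0, "II": 2, "III": 4, "IV": 5, "V": 7, "VI": 9, "VII": 11}

    def absolute_root(key, numeral):
        """Return the absolute root of a diatonic Roman numeral in the given major key."""
        tonic = PITCH_CLASSES.index(key)
        offset = MAJOR_SCALE_STEPS[numeral.upper()]
        return PITCH_CLASSES[(tonic + offset) % 12]

    print(absolute_root("C", "V"))   # G  -> dominant of C major, typically resolving to the tonic C
    print(absolute_root("D", "ii"))  # E  -> predominant in D major, typically moving to the dominant A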
TONAS |
Audio |
72 songs |
The dataset comprises a diverse collection of 72 sung excerpts representing three distinct a cappella singing styles within flamenco, specifically Deblas and two variants of Martinete. The distribution includes 16 excerpts of Deblas, 36 of Martinete 1, and 20 of Martinete 2. Assembled by flamenco expert Dr. Joaquin Mora from the Universidad de Sevilla, this dataset supports a study on the similarity and classification of flamenco singing styles (Tonas). Each excerpt, averaging 30 seconds in length, is monophonic and showcases considerable variability, including different singers, recording conditions, and additional elements like percussion, clapping, background voices, and noise. For further insights into these styles and their musical traits, refer to Mora et al. (2010). Additionally, the dataset includes manual melodic transcriptions created by the COFLA team and Cristina López Gómez. |
Link |
TPD |
Audio |
23,385 tracks |
The Track Popularity Dataset (TPD) addresses a significant gap in music information research by providing a comprehensive resource for analyzing musical track popularity. Unlike existing datasets, the TPD spans from 2004 to 2014 and integrates various definitions of popularity across multiple sources, including Billboard, Last.fm, and Spotify. It features 23,385 tracks, with 9,193 identified as popular and 14,192 from albums containing popular tracks but not deemed popular themselves. The dataset includes 57,800 popularity ratings from Billboard, 43,300 from Last.fm, and 6,500 from Spotify, with a detailed breakdown of overlap among sources. Additionally, 78% of the popular tracks exhibit contextual similarity to other popular tracks within the dataset. The TPD offers three feature-sets for analysis: Feature-set A provides basic, high-level audio features for quick research, Feature-set B includes windowed and detailed spectral features, and Feature-set C focuses on target tempi and energy bands. The dataset is divided into two parts: one for metadata and popularity relations, and another for the feature-sets. |
Link |
TRIOS |
Audio with aligned MIDI |
5 excerpts with variations |
The TRIOS dataset is a collection of score-aligned multitrack recordings that can be used for various research problems, such as score-informed source separation and automatic music transcription. It consists of the separated tracks of five recordings of chamber-music trio pieces, together with their aligned MIDI scores. |
Link |
Tunebot |
Audio |
10,000 sung recordings |
The Tunebot Dataset is a valuable collection of 10,000 sung contributions gathered from the now-defunct Tunebot Query By Humming system. This dataset comprises recordings of users singing songs, which were originally used to generate a ranked list of music available on iTunes. Each entry in the dataset includes metadata such as the filepath, song title, album, and artist, though it does not include contributor information. Unlike smaller, less representative datasets traditionally used in query by humming research, the Tunebot Dataset offers real-world data from an active music search engine, providing a more accurate benchmark for evaluating the performance of humming-based search algorithms. This makes it an essential resource for researchers aiming to improve and test their systems with a substantial and practical dataset. |
Link |
Weimar Jazz Database (WJazzD) |
Symbolic |
456 jazz solo transcriptions |
The Jazzomat Research Project, hosted at the Hochschule für Musik Franz Liszt Weimar, aims to delve into the creative processes behind jazz solo improvisations through a blend of statistical and computational methods. At the heart of the project is the Weimar Jazz Database, a meticulously curated collection of jazz solo transcriptions, complemented by the open-source Python Library MeloSpyLib for analytical purposes. This interdisciplinary initiative straddles the realms of jazz research, cognitive psychology, and computational musicology, with goals that include describing and differentiating various improvisation styles, comparing these styles to others in music, and probing the cognitive foundations of improvisation. Additionally, the project seeks to evaluate pedagogical strategies for teaching jazz improvisation and to refine statistical methods for analyzing musical complexity and coherence. |
Link |
MoDa: Open-Source Democratic Access to Movement Knowledge |
Movement database |
Multiple social, cultural, and historical contexts |
MoDa is an open-source movement database designed to provide public access to shared movement data from diverse social, cultural, and historical contexts. It serves as a valuable resource for scholars and researchers across various disciplines, including the Arts, Humanities, Social Sciences, Health, and Natural and Engineering Sciences. Beyond offering access to movement data, MoDa emphasizes the ontological significance of this data by highlighting its multi-layered social and cultural contexts. The database supports both expert and folksonomy annotations, allowing for a rich diversity of perspectives. Movement data in MoDa is indexed using frameworks such as Laban Movement Analysis, Choreometrics, and social networking frameworks like folksonomies, ensuring a comprehensive approach to understanding movement. Through public and research workshops, MoDa engages with movement experts to explore ontological approaches that facilitate cross-cultural and cross-disciplinary interpretations, allowing for multiple descriptions and truths to coexist and offering a holistic view of movement data. |
Link |