Pipelines using ML to separate audio into stems, transcribe them, index them, and make them searchable through various experimental techniques.

At a high level, "Dissect" includes pipelines to analyze and index audio (mostly music) in experimental ways. The end goal is to programmatically create new, interesting, original, and experimental music from the components of other music.

I use demucs to source-separate a large number of files and Magenta's MT3 to transcribe everything to MIDI, and I store metadata for all of those files in a LevelDB instance.
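As a rough sketch of how the metadata side of that pipeline could look: the helper below builds a key/value pair for a LevelDB-style store, keyed by a stable hash of source path plus stem. The record fields (`source`, `stem`, `model`) and the `metadata_record` helper are my assumptions for illustration, not the project's actual schema; an in-memory dict stands in for the real LevelDB instance (with `plyvel` the `put` call is noted in a comment).

```python
import hashlib
import json

def metadata_record(src_path, stem, model="htdemucs"):
    """Build a (key, value) pair for a LevelDB-style metadata store.

    Key: stable hash of source path + stem name, so re-running the
    pipeline overwrites rather than duplicates entries.
    Value: JSON-encoded metadata as bytes (LevelDB stores raw bytes).
    NOTE: field names here are illustrative, not the real schema.
    """
    key = hashlib.sha1(f"{src_path}:{stem}".encode()).hexdigest().encode()
    value = json.dumps({"source": src_path, "stem": stem, "model": model}).encode()
    return key, value

# With plyvel this would be roughly:
#   db = plyvel.DB("metadata.ldb", create_if_missing=True)
#   db.put(key, value)
store = {}  # in-memory stand-in for the LevelDB instance
k, v = metadata_record("songs/track01.wav", "vocals")
store[k] = v
```

Hashing path + stem keeps keys fixed-length and avoids collisions between stems of the same source file.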

So far I've run these experiments on around 15k original + separated audio files.

  1. Created a LevelDB index of n-grams built from melodic material, along with a program that extracts random component melodies from an input sample MIDI file and replaces them with segments of corresponding audio from matching n-grams.
  2. Created an Annoy index of MFCCs and a retrieval algorithm that can quickly find tight clusters of similar audio. It works reasonably well for a simple chord; there is still a lot to tweak in the MFCC window size and the number of features derived.
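To make experiment 1 concrete, here is a minimal sketch of turning a melody into index keys. I'm assuming interval-based n-grams (so lookups are transposition-invariant); the real index may well key absolute pitches instead, and both helper names are hypothetical.

```python
def melodic_ngrams(pitches, n=4):
    """Slide a window of n intervals over a MIDI pitch sequence.

    Using intervals (pitch deltas) rather than absolute pitches is an
    assumption: it makes the same melodic shape match in any key.
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

def ngram_key(gram):
    """Serialize an n-gram to bytes for use as a LevelDB key."""
    return ",".join(str(i) for i in gram).encode()

# A five-note melody yields two 3-interval n-grams.
melody = [60, 62, 64, 62, 60]
grams = melodic_ngrams(melody, n=3)
```

Each key would then map to the audio segments whose transcriptions contained that n-gram, which is what the replacement program looks up.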
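For experiment 2, the retrieval step can be sketched as nearest-neighbor search over MFCC vectors under angular (cosine) distance, which is what Annoy's `'angular'` metric approximates. The brute-force `nearest` function below is a stand-in for illustration only; with Annoy the equivalent calls would be `AnnoyIndex(f, 'angular')`, `add_item`, `build`, and `get_nns_by_vector`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query, index, k=2):
    """Brute-force stand-in for Annoy's approximate get_nns_by_vector.

    `index` is a list of (name, mfcc_vector) pairs; returns the k names
    whose vectors are most similar to the query.
    """
    ranked = sorted(index, key=lambda item: -cosine(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Toy 2-D "MFCC" vectors; real ones would have num_features dimensions.
index = [("kick", [1.0, 0.0]), ("snare", [0.0, 1.0]), ("hat", [0.9, 0.1])]
result = nearest([1.0, 0.05], index, k=2)
```

The tweakable knobs mentioned above (window size, `num_features`) control the dimensionality and time-resolution of the vectors going into the index.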