Machine learning has been applied to making music in many fun and experimental ways, but there are still key steps in the music-making process that could benefit greatly from modern tools. We're just a few years away from computers making entire songs: great-sounding music that is indistinguishable from human-made music. Computers can already compose text and images that are indistinguishable from human work.
However, at the moment I'm really interested in near-term developments that will enhance the human experience of composing music (in particular, electronic genres like house and techno). Much of music production is now done "in the box" (on the computer), and the time is right for AI innovation.
Sound Design
Surprisingly, while there are hundreds of esoteric software synthesizers taking a myriad of approaches to audio synthesis, there hasn’t yet been a mainstream neural network approach.
Problem / While synthesizers can generate endlessly varied sounds, from analog waveforms to realistic samples of real-world instruments, they still require specialized knowledge to program.
A sample-based synthesizer that reproduces a realistic instrument requires hundreds of recordings of that instrument made under careful conditions, plus giant sets of sample files. And there's no straightforward mathematical way to interpolate between sounds.
Furthermore, given a song or genre, it’s hard to find a sound to match. Lastly, novel synthetic sounds have inspired huge leaps in music genres over the past century, and there’s more space to be explored.
Solution / Just as machine learning approaches are getting good at generating images, a neural-network-based generative synthesizer should be excellent at generating sounds. Using a training set of real-world and synthetic samples, the ideal neural synthesizer would let me select a “tech house bass” preset or generate a blend between a “Moog mini square bass” and an “electric bass guitar.” I’m not sure whether it could generate sounds in near real time, or whether you would make a choice and wait for a sample set to be created.
NSynth (Neural Synthesizer) from Google is an impressive proof of concept.
“Unlike a traditional synthesizer which generates audio from hand-designed components like oscillators and wavetables, NSynth uses deep neural networks to generate sounds at the level of individual samples. Learning directly from data, NSynth provides artists with intuitive control over timbre and dynamics and the ability to explore new sounds that would be difficult or impossible to produce with a hand-tuned synthesizer.” I can’t wait to have NSynth as an Ableton plugin!
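To make the idea concrete, here's a rough sketch of the kind of latent-space timbre interpolation NSynth demonstrates. Everything below is hypothetical: `encode` and `decode` are stand-ins for a trained neural audio autoencoder, not NSynth's actual API.

```python
import numpy as np

# Hypothetical pretrained audio autoencoder. These functions are placeholders,
# not NSynth's real interface. The idea: encode two source sounds into latent
# "timbre" vectors, blend the vectors, and decode the blend back into audio.

def encode(audio: np.ndarray) -> np.ndarray:
    """Map a mono audio buffer to a fixed-size latent timbre embedding."""
    raise NotImplementedError("stand-in for a trained encoder network")

def decode(z: np.ndarray, length: int) -> np.ndarray:
    """Synthesize `length` audio samples from a latent embedding."""
    raise NotImplementedError("stand-in for a trained decoder network")

def morph(audio_a: np.ndarray, audio_b: np.ndarray, mix: float, length: int) -> np.ndarray:
    """Generate a sound 'between' two sources, e.g. a Moog square bass and an
    electric bass guitar, by interpolating their latent embeddings."""
    z_a, z_b = encode(audio_a), encode(audio_b)
    z_mix = (1.0 - mix) * z_a + mix * z_b  # linear interpolation in latent space
    return decode(z_mix, length)
```

The interesting part is that the blend happens in the learned latent space rather than by crossfading audio, which is why the result can sound like a new instrument rather than two sounds layered on top of each other.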
Another opportunity here is parameter reduction: using dimensionality reduction to figure out which regions of the multi-dimensional space defined by all those synthesizer knobs actually sound interesting. In other words, reduce the 50 knobs on a synthesizer like Serum to 3-4 macro knobs that let the artist explore the space of possible sounds Serum can generate.
Here is an engineer’s proof of concept for a synth parameter reducer.
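As a minimal sketch of what this could look like: assuming you could export a large library of presets as normalized knob values, even plain PCA (here via scikit-learn) would give you a few macro knobs that sweep through the variation found in existing patches. The patch data below is made up for illustration, and a real tool would export it from the plugin.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: one row per preset, one column per knob
# (e.g. 50 normalized knob values for a Serum-style synth).
patches = np.random.rand(2000, 50)

# Fit a 3-component PCA: the three "macro knobs" become directions in knob
# space that capture most of the variation across existing presets.
pca = PCA(n_components=3)
pca.fit(patches)

def macro_to_knobs(macro: np.ndarray) -> np.ndarray:
    """Map a 3-value macro setting back to all 50 knob positions."""
    knobs = pca.inverse_transform(macro.reshape(1, -1))[0]
    return np.clip(knobs, 0.0, 1.0)  # keep values in the synth's valid range

# Example: sweep one macro knob and get full patches to audition.
for x in np.linspace(-1, 1, 5):
    print(macro_to_knobs(np.array([x, 0.0, 0.0]))[:5])  # first 5 knobs
```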
Another tool using math to aid creativity is Factorsynth, a “device that uses a data analysis algorithm called matrix factorization to decompose any audio clip into a set of temporal and spectral elements. By rearranging and modifying these components you can do powerful transformations to your clips, such as removing notes or motifs, creating new ones, randomizing melodies or timbres, changing rhythmic patterns, remixing loops in real time, applying effects selectively only to certain elements of the sound, creating complex sound textures.”
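The core trick there is factorizing a spectrogram into spectral templates and their activations over time. Here's a minimal sketch of that idea using non-negative matrix factorization with librosa and scikit-learn; it's not Factorsynth's actual implementation, and `loop.wav` is a placeholder filename.

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF

# "loop.wav" is a placeholder; any short audio clip works.
y, sr = librosa.load("loop.wav", sr=None, mono=True)

# Magnitude spectrogram: rows are frequency bins, columns are time frames.
S = np.abs(librosa.stft(y))

# Factorize S ≈ W @ H: W holds spectral templates (the timbre of each
# component), H holds their activations over time (when each one plays).
model = NMF(n_components=6, init="nndsvda", max_iter=400)
W = model.fit_transform(S)
H = model.components_

# Mute one component (e.g. drop a motif) and rebuild the spectrogram.
H_edit = H.copy()
H_edit[2, :] = 0.0
S_edit = W @ H_edit

# Resynthesize an approximation with Griffin-Lim phase estimation.
y_edit = librosa.griffinlim(S_edit)
```

Once you have W and H separated, edits like removing a note, swapping a timbre, or applying an effect to only one component become simple matrix operations before resynthesis.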
Cool. I agree – can’t wait to see the developments in generative AI for music.