This is part of a series on Opportunities for AI in Music Production.
Mixing and mastering are two skills that still require a hands-on human, even though a computer could probably do them better and faster.
Problem / Becoming an audio engineer requires a lot of training – it’s a completely separate discipline from making music. With today’s tools, it is within reach for an artist to mix and master their own songs, but I think AI can do more of the work. What I want is a button in Ableton that will detect which role each track plays (bass, background, foreground) and then automatically create a balanced mix. It would need to handle overlapping frequencies and transients, set panning, apply EQ, and maybe even add reverb. Actually, now that I think about it, I’m kind of shocked there’s no basic “Pink Noise Auto-mix” Max for Live device.
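To show how simple the core of that device could be, here’s a minimal sketch of pink-noise balancing in Python, assuming each track is already loaded as a mono numpy array at the same sample rate. The idea behind the classic technique: pick the reference level a pink noise calibration signal would sit at, then gain each track so its level matches that reference. The reference value and function names here are illustrative assumptions.

```python
import numpy as np

REFERENCE_RMS_DB = -18.0  # assumed pink-noise calibration level, in dBFS

def rms_db(signal: np.ndarray) -> float:
    """Root-mean-square level of a signal, in dBFS."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20.0 * np.log10(max(float(rms), 1e-9))

def pink_noise_gains(tracks: dict[str, np.ndarray]) -> dict[str, float]:
    """Compute a linear gain per track that brings each track's RMS
    to the pink-noise reference level, giving a rough static balance."""
    gains = {}
    for name, audio in tracks.items():
        delta_db = REFERENCE_RMS_DB - rms_db(audio)
        gains[name] = 10.0 ** (delta_db / 20.0)
    return gains

def apply_gains(tracks, gains):
    return {name: audio * gains[name] for name, audio in tracks.items()}
```

A real Max for Live device would run on live signals and would probably target perceptual loudness (LUFS) rather than raw RMS, but even this static version would give a usable starting balance.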
Mastering is a slightly simpler process that comes after mixing. Given a single mixed audio file, the job is to compress it, EQ it again, widen the stereo image, and maximize the volume.
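To make those steps concrete, here’s a minimal sketch of that chain as plain numpy/scipy operations on a stereo float array of shape (n_samples, 2). Every stage is deliberately crude – real mastering tools use envelope-following compressors, multiband processing, and lookahead limiting – and all the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

def compress(x, threshold_db=-12.0, ratio=4.0):
    """Static compression: attenuate every sample above the threshold.
    (Real compressors track an envelope with attack/release times.)"""
    level_db = 20.0 * np.log10(np.maximum(np.abs(x), 1e-9))
    over_db = np.maximum(level_db - threshold_db, 0.0)
    return x * 10.0 ** (-over_db * (1.0 - 1.0 / ratio) / 20.0)

def eq_rumble_filter(x, cutoff_hz=30.0, sr=44100):
    """A crude corrective EQ pass: high-pass away sub-bass rumble."""
    b, a = butter(2, cutoff_hz / (sr / 2), btype="highpass")
    return lfilter(b, a, x, axis=0)

def widen(x, width=1.2):
    """Mid/side widening: boost the side (stereo difference) signal."""
    mid = (x[:, 0] + x[:, 1]) / 2.0
    side = (x[:, 0] - x[:, 1]) / 2.0 * width
    return np.stack([mid + side, mid - side], axis=1)

def maximize(x, ceiling=0.98):
    """Volume maximization: scale the loudest peak to near full scale."""
    return x * (ceiling / max(float(np.max(np.abs(x))), 1e-9))

def master(mix):
    return maximize(widen(eq_rumble_filter(compress(mix))))
```

The point is that the pipeline is a pure function from one WAV file to another, which is exactly what makes it attractive for automation.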
Solution / Collect a training set of professionally mixed tracks, organized by genre. Train an ML model that recognizes the “role” (bass, drums, melody) of each track based on its frequency content and on metadata like the track name, and maps that role to an EQ profile. I’m guessing this would be implemented as an autoencoder mapping the input features to plugin parameters (EQ ranges, compression amounts) – but I’m not sure how that would work with the input signal changing over time.
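Since the post only guesses at an architecture, here is one concrete way to frame it in PyTorch – as a plain supervised network with two output heads rather than an autoencoder. The feature extraction (averaged mel-band energies per track), the dimensions, and the parameter layout are all assumptions for illustration.

```python
import torch
import torch.nn as nn

N_MEL_BANDS = 64       # assumed: averaged spectral features per track
N_ROLES = 3            # bass, drums, melody
N_PLUGIN_PARAMS = 9    # assumed: 8 EQ band gains + 1 compression amount

class MixAssistant(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(N_MEL_BANDS, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
        )
        self.role_head = nn.Linear(32, N_ROLES)          # which role is this track?
        self.param_head = nn.Linear(32, N_PLUGIN_PARAMS)  # how should it be processed?

    def forward(self, features):
        h = self.encoder(features)
        return self.role_head(h), self.param_head(h)

# Training would minimize a classification loss on the roles plus a
# regression loss against plugin settings taken from professional mixes.
model = MixAssistant()
features = torch.randn(16, N_MEL_BANDS)  # a batch of 16 tracks
role_logits, params = model(features)
```

Averaging the spectral features over time sidesteps the time-variation problem at the cost of producing one static setting per track; a sequence model over spectral frames could instead emit automation curves.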
iZotope’s Neutron product has auto-mixing. It simplifies the process and gets your mix to a point where you can further dial it in.
I think it’s time for a new DAW company to disrupt Ableton Live or Apple Logic by including a data pipeline (perhaps even cloud-based) to match the audio signal pipeline. The rich data in a project file gives the software ample hints about where the artist is going with the music; the software should finish the job.
Several online mastering services (like landr.com) have emerged – so far their quality has been rated as worse than the best human engineers, but good enough for professional publishing. Unlike other parts of the ecosystem, I expect rapid advancement here, because the process is a straightforward input/output of WAV files, and the algorithms can be revised quickly in the cloud.
Next: Using AI in Sound Design.