
Text to Singing: AI to Generate Vocals

This is part of a series on Opportunities for AI in Music Production.

Problem / I can kind of sing, but not that well. I want to sing on my own songs and then apply effects to make it sound “good” (in tune and on time, but also with different formants), or in the style of another singer.

Solution / An audio plugin that transforms an input vocal into a new voice of your own design. The Tacotron 2 algorithm is getting close, especially with rap…

I’m not quite sure how this works. Do you train a machine learning model on a famous singer’s vocal performances together with a transcription, then generate new audio tracks from a new transcript and a reference vocal? I’m also not sure where auto-tune fits into this. Initially it probably remains a separate, earlier step in the effects chain.
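To make the auto-tune step a bit more concrete, here is a minimal sketch of the pitch-snapping idea at its core: take a detected frequency and move it to the nearest semitone. It assumes equal temperament with A4 = 440 Hz and that a pitch detector has already given you the frequency. The function names are my own for illustration, not from any particular plugin, and a real effect would also have to resynthesize the audio at the corrected pitch.

```python
import math
from typing import Tuple

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(midi_note: float) -> float:
    """Convert a MIDI note number back to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


def snap_to_semitone(freq_hz: float) -> Tuple[float, float]:
    """Return (corrected frequency in Hz, correction applied in cents)."""
    midi = hz_to_midi(freq_hz)
    nearest = round(midi)              # nearest equal-tempered semitone
    cents = (nearest - midi) * 100.0   # negative means "pitch down"
    return midi_to_hz(nearest), cents


# A slightly sharp A4 gets pulled back down to 440 Hz.
target_hz, correction_cents = snap_to_semitone(450.0)
```

Snapping to every semitone is the crudest version. Restricting the allowed notes to the song's key would be the obvious next step.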

Until recently, Vocaloid was one of the few programs that let you draw in notes and write text to create melodies. Then I discovered Emvoice, which sounds and looks really nice. It looks a bit tedious, but I bet once you get the hang of it, it is satisfying to specify melodies so precisely. Great work by the Emvoice team on what looks like a great UI and very realistic sound. I can imagine someone combining this with a neural net that recognizes your singing as input, outputs a MIDI melody, then lets you choose the singer and edit it.
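The singing-to-MIDI part can be sketched roughly like this. The sketch assumes some pitch tracker (neural or otherwise) has already produced one frequency estimate per frame, with `None` or `0` for frames where nobody is singing. It only covers the last step of grouping those frames into notes. The function names and the frame format are hypothetical, not taken from Emvoice or any other product.

```python
import math
from typing import List, Optional, Tuple


def hz_to_midi_note(freq_hz: float) -> int:
    """Nearest MIDI note number, assuming A4 = 440 Hz = note 69."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def pitch_track_to_notes(
    f0_hz: List[Optional[float]], hop_s: float
) -> List[Tuple[int, float, float]]:
    """Group per-frame pitch estimates into (midi_note, start_s, duration_s)."""
    notes = []
    current = None   # MIDI note currently sounding, None while unvoiced
    start = 0        # frame index where the current note began
    # The trailing None is a sentinel that flushes the final note.
    for i, freq in enumerate(list(f0_hz) + [None]):
        note = hz_to_midi_note(freq) if freq else None
        if note != current:
            if current is not None:
                notes.append((current, start * hop_s, (i - start) * hop_s))
            current = note
            start = i
    return notes


# Two frames of A4, a gap, then three frames of C5, at 0.25 s per frame.
melody = pitch_track_to_notes([440.0, 442.0, None, 523.25, 523.25, 523.25], 0.25)
```

A real singer wobbles around each note, so in practice you would want to smooth the pitch track and drop very short notes before turning them into MIDI events.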

This Neutrino project is promising. It only generates Japanese singing, but it sounds very good!

The easiest way I have found so far to clone a voice and do text to speech (still talking, not singing) is this Python notebook. It gives interesting results for speech in a US accent, but it couldn’t match singing or UK garage rappers when I uploaded some samples. This in and of itself would be an amazing tool. I’d like to feed it samples of singers and rappers that I like, and generate more phrases in that style.

Next: Mixing Music with AI
