2084: More Music-Generating AI
Msanii can generate 190-second-long pieces of music using diffusion models
Audio AI is booming lately. Even though I predicted it would take a while to get genuinely good, I’m happy to be proven wrong today. I was looking at Msanii, a new AI music waveform generator that uses diffusion models to generate mel spectrograms, which are essentially a way of encoding the frequency content of audio as an image (the phase information is discarded), much like Riffusion, which we looked at before. The difference is that Msanii also uses a neural network to convert the spectrogram back into actual audio, rather than inverting it directly, which presumably gives it crisper sound. It was also trained on the POP909 dataset, a dataset of MIDI pop music that I hadn’t heard of before, which is pretty cool. And it sounds super good, on the website at least, with the demos being well worth listening to. AI bands can’t be far away!
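To make the "audio as an image" idea concrete, here's a minimal numpy-only sketch of how a mel spectrogram is computed: a windowed STFT gives magnitudes per time frame (this is where phase gets thrown away, which is why Msanii and Riffusion need a separate reconstruction step), and a bank of triangular mel-scale filters compresses the frequency axis into a small image-like grid. This is a generic illustration with made-up parameter choices, not Msanii's actual pipeline.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=22050, n_fft=1024, hop=256, n_mels=80):
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        # Keep only the magnitude of the FFT: the phase is discarded here
        frames.append(np.abs(np.fft.rfft(frame)))
    spec = np.array(frames).T                       # (n_fft//2+1, n_frames)
    return mel_filterbank(sr, n_fft, n_mels) @ spec  # (n_mels, n_frames)

# A 1-second 440 Hz sine wave becomes an 80-band "image" over time
sr = 22050
t = np.arange(sr) / sr
mel = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(mel.shape)  # 80 mel bands by ~83 time frames
```

Because the phase is gone, going back from this grid to a waveform needs either an iterative estimate like Griffin-Lim (roughly what spectrogram-inversion approaches do) or a learned model, which is the route the post describes Msanii taking.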