Imagine a professional musician exploring new compositions without playing a single note on an instrument, or a small business owner effortlessly adding a soundtrack to their latest video ad on Instagram. That’s the potential of AudioCraft, the latest AI tool from Meta, which generates high-quality, realistic audio and music from text.
AudioCraft comprises three models: MusicGen, AudioGen and EnCodec. MusicGen, trained on Meta-owned and specifically licensed music, generates music from text prompts, while AudioGen, trained on public sound effects, generates audio from text prompts. Meta is also releasing an improved version of its EnCodec decoder, which allows higher-quality music generation with fewer artefacts, along with pre-trained AudioGen models that generate environmental sounds and sound effects such as a dog barking, cars honking, or footsteps on a wooden floor. All of the AudioCraft model weights and code are being shared as well.
Meta is open-sourcing these models, offering researchers and practitioners access so they can train their own models with their own datasets for the first time, helping advance the field of AI-generated audio and music.
While there’s been significant excitement around generative AI for images, video, and text, audio has seemed to lag slightly behind. Generating high-fidelity audio of any kind requires modelling complex signals and patterns at varying scales. Music is arguably the most challenging type of audio to generate as it’s composed of local and long-range patterns, from a suite of notes to a global musical structure with multiple instruments.
Meta says the AudioCraft family of models produces high-quality audio with long-term consistency and is easy to use. According to the company, AudioCraft simplifies the overall design of generative models for audio compared with previous work in the field, giving people the full recipe to play with the existing models that Meta has developed over the past several years while also empowering them to push the limits and develop their own models.
AudioCraft covers music, sound, compression, and generation, all in the same place. It’s designed to be easy to build on and reuse: people who want to build better sound generators, compression algorithms, or music generators can do it all in the same code base and build on top of what others have done.
According to Meta, a solid open-source foundation will foster innovation and complement the way people produce and listen to audio and music in the future. With even more controls, the company believes MusicGen can turn into a new type of instrument, just like synthesizers when they first appeared.
Meta views the AudioCraft family of models as tools for musicians and sound designers that provide inspiration and help people quickly brainstorm and iterate on their compositions in innovative ways. The company eagerly awaits what people will create with AudioCraft.