Friday May 23, 2025 5:20pm - 5:40pm CEST
This paper discusses the process of generating natural-language music descriptions, known as music captioning, using deep learning and large language models. A novel encoder architecture is trained to learn large-scale music representations and produce high-quality embeddings, which a pre-trained decoder then uses to generate captions. The training captions come from the state-of-the-art LP-MusicCaps dataset. A qualitative and subjective assessment of the generated captions is performed, highlighting the differences between the various decoder models.
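The encoder-to-decoder flow the abstract describes can be sketched as a greedy decoding loop conditioned on an audio embedding. Everything below is a hypothetical stand-in for illustration only: the embedding, the `score_next` scoring function, and the toy vocabulary are assumptions, not the paper's actual models.

```python
# Minimal sketch of the captioning pipeline: an encoder embedding is fed to
# a decoder that emits caption tokens one at a time. All components here are
# hypothetical placeholders, not the paper's architecture.

from typing import Callable, Dict, List

def greedy_caption(
    audio_embedding: List[float],
    score_next: Callable[[List[float], List[str]], Dict[str, float]],
    eos: str = "<eos>",
    max_len: int = 16,
) -> List[str]:
    """Greedy decoding: repeatedly pick the highest-scoring next token,
    conditioned on the audio embedding and the tokens generated so far."""
    tokens: List[str] = []
    for _ in range(max_len):
        scores = score_next(audio_embedding, tokens)
        best = max(scores, key=scores.get)
        if best == eos:
            break
        tokens.append(best)
    return tokens

# Toy decoder: a fixed lookup standing in for a pre-trained language model.
_canned = ["a", "mellow", "piano", "piece", "<eos>"]

def toy_score_next(embedding: List[float], tokens: List[str]) -> Dict[str, float]:
    nxt = _canned[len(tokens)] if len(tokens) < len(_canned) else "<eos>"
    return {w: (1.0 if w == nxt else 0.0) for w in set(_canned)}

print(" ".join(greedy_caption([0.1, 0.2], toy_score_next)))
# -> a mellow piano piece
```

In practice the decoder would be a pre-trained language model whose attention is conditioned on the encoder's embeddings (e.g. via a projection or prefix), and beam search often replaces greedy decoding; the loop structure stays the same.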
C1 ATM Studio Warsaw, Poland
