AES Europe 2025: Full Schedule

9:30am CEST

Sound Synthesis 101: An Introduction To Sound Creation

Thursday May 22, 2025 9:30am - 11:00am CEST

Sound synthesis is a key part of modern music and audio production. Whether you are a producer, composer, or just curious about how electronic sounds are made, this workshop will break it down in a simple and practical way.

We will explore essential synthesis techniques like subtractive, additive, FM, wavetable, and granular synthesis. You will learn how different synthesis methods create and shape sound, and see them in action through live demonstrations using both hardware and virtual synthesizers, including emulations of legendary studio equipment.

This session is designed for everyone — whether you are a total beginner or an experienced audio professional looking for fresh ideas. You will leave with a solid understanding of synthesis fundamentals and the confidence to start creating your own unique sounds. Join us for an interactive, hands-on introduction to the world of sound synthesis!

Speakers

Krzysztof Kicior

Thursday May 22, 2025 9:30am - 11:00am CEST
C1 ATM Studio Warsaw, Poland

Analysis and synthesis of sound Electronic dance music Sound design and reinforcement Studio recording techniques

Presentation Type Workshop

11:15am CEST

Don't run! It's just a synthesizer

Thursday May 22, 2025 11:15am - 12:45pm CEST

Everybody knows that music with electronic elements exists. Most of us are aware of the synthesis behind it. But the moment I start asking about what's under the hood, the majority of the audience starts to run for their lives. Which is rather sad for me, because learning synthesis could be among the greatest journeys you could take in your life. And I want to back those words up in my workshop.

Let's talk and see what exactly synthesis is, and what it is not. Let's talk about the building blocks of a basic subtractive setup. We will track all the knobs, buttons and sliders, down to every single cable under the front panel, simply to see which "valve" and "motor" is controlled by which knob, and how it sounds.

I also want to make you feel safe about modular setups, because when you understand the basic blocks, you understand modular synthesis. Just like building from bricks!

Speakers

Gustaw Miłoszewski

Thursday May 22, 2025 11:15am - 12:45pm CEST
C1 ATM Studio Warsaw, Poland

Analysis and synthesis of sound Musical instrument design Sound design and reinforcement

Presentation Type Workshop

3:00pm CEST

Analysis and Model of Temporal Sound Attributes from Recorded Audio

Thursday May 22, 2025 3:00pm - 3:20pm CEST

A computational framework is proposed for analyzing the temporal evolution of perceptual attributes of sound stimuli. As a paradigm, the perceptual attribute of envelopment, which is manifested in different audio reproduction formats, is employed. For this, listeners' temporal ratings of envelopment for mono, stereo, and 5.0-channel surround music samples serve as the ground truth for establishing a computational model that can accurately trace temporal changes from such recordings. Combining established and heuristic methodologies, different features of the audio signals, named long-term (LT) features, were extracted at each segment for which envelopment ratings were registered. A memory LT computational stage is proposed to account for the temporal variations of the features over the duration of the signal, based on the exponentially weighted moving average of the respective LT features. These are used in a gradient tree boosting machine learning algorithm, leading to a Dynamic Model that accurately predicts the listeners' temporal envelopment ratings. Without the proposed memory LT feature function, a Static Model is also derived, which is shown to have lower performance in predicting such temporal envelopment variations.
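
As a rough illustration of the memory stage mentioned in this abstract, an exponentially weighted moving average over per-segment feature values could be sketched as below. The function name, the smoothing factor `alpha`, and the example values are assumptions made for illustration; they are not taken from the paper.

```python
def ewma(values, alpha=0.3):
    """Exponentially weighted moving average of a per-segment feature track.

    alpha is a hypothetical smoothing factor in (0, 1]; larger values
    weight recent segments more heavily.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for value in values:
        # The first segment has no history, so it passes through unchanged.
        if previous is None:
            previous = value
        else:
            previous = alpha * value + (1.0 - alpha) * previous
        smoothed.append(previous)
    return smoothed


# Made-up feature values for four consecutive segments.
feature_track = [1.0, 0.0, 0.0, 1.0]
memory_track = ewma(feature_track, alpha=0.5)
```

Each output value blends the current segment with the accumulated history, so earlier segments keep influencing later ones with decaying weight.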

Speakers

Georgios Moiragias

Department of Electrical and Computer Engineering, University of Patras

I am a graduate of the Electrical and Computer Engineering Department of the University of Patras. Since 2020, I have been a PhD candidate in the same department under the supervision of Professor John Mourjopoulos. My research interests include analysis and modeling of perceptual and affective...

John Mourjopoulos

Professor emeritus, University of Patras

John Mourjopoulos is Professor Emeritus at the Department of Electrical and Computer Engineering, University of Patras and a Fellow of the AES. As the head of the Audiogroup for nearly 30 years, he has authored and presented more than 200 journal and conference papers. His research...

Thursday May 22, 2025 3:00pm - 3:20pm CEST
C1 ATM Studio Warsaw, Poland

AI & Machine Audition

Presentation Type Paper Presentation

3:20pm CEST

Honeybee sound generation using Machine learning techniques

Thursday May 22, 2025 3:20pm - 3:40pm CEST

The honeybee is an insect known to almost all human beings around the world. The sounds produced by bees are a ubiquitous staple of the soundscape of the countryside and forest meadows, bringing an air of natural beauty to the perceived environment. Honeybee-produced sounds are also an important part of apitherapeutic experiences, where close-quarters exposure to honeybees proves beneficial to the mental and physical well-being of humans. This research investigates the generation of synthetic honeybee buzzing sounds using Conditional Generative Adversarial Networks (cGANs). Trained on a comprehensive dataset of real recordings collected both inside and outside the beehive during a long-term audio monitoring session, the models produce diverse and realistic audio samples. Two architectures were developed: an unconditional GAN for generating long, high-fidelity audio, and a conditional GAN that incorporates time-of-day information to generate shorter samples reflecting diurnal honeybee activity patterns. The generated audio exhibits both spectral and temporal properties similar to real recordings, as confirmed by statistical analysis performed during the experiment. This research has implications for scientific research in honeybee colony health monitoring, for apitherapy research, and for artistic endeavours such as sound design and immersive soundscape creation. The trained generator model is publicly available on the project’s website.

Speakers

Piotr Książek

Urszula Libal

Thursday May 22, 2025 3:20pm - 3:40pm CEST
C1 ATM Studio Warsaw, Poland

AI & Machine Audition

Presentation Type Paper Presentation

3:40pm CEST

Moving Sound Source Localization and Tracking based on Envelope Estimation for Unknown Number of Sources

Thursday May 22, 2025 3:40pm - 4:00pm CEST

Existing methods for moving sound source localization and tracking face significant challenges when dealing with an unknown number of sound sources, which substantially limits their practical applications. This paper proposes a moving sound source tracking method based on source signal envelopes that does not require prior knowledge of the number of sources. First, an encoder-decoder attractor (EDA) method is used to estimate the number of sources and obtain an attractor for each source, based on which the signal envelope of each source is estimated. This signal envelope is then used as a cue for tracking the target source. The proposed method has been validated through simulation experiments. Experimental results demonstrate that the proposed method can accurately estimate the number of sources and precisely track each source.

Speakers

Donghang Wu

Jiaqi Du

Tianshu Qu

Peking University

Qingbo Huang

Dejun Zhang

Thursday May 22, 2025 3:40pm - 4:00pm CEST
C1 ATM Studio Warsaw, Poland

AI & Machine Audition

Presentation Type Paper Presentation

4:00pm CEST

Room Geometry Inference Using Localization of the Sound Source and Its Early Reflections

Thursday May 22, 2025 4:00pm - 4:20pm CEST

Traditional methods for inferring room geometry from sound signals are predominantly based on the Room Impulse Response (RIR) or prior knowledge of the sound source location. This significantly restricts the applicability of these approaches. This paper presents a method for estimating room geometry based on the localization of the direct sound source and its early reflections from First-Order Ambisonics (FOA) signals without prior knowledge of the environment. First, this method simultaneously estimates the Direction of Arrival (DOA) of the direct source and the detected first-order reflected sources. Then, a cross-attention-based network that implicitly extracts features related to the Time Difference of Arrival (TDOA) between the direct source and the first-order reflected sources is proposed to estimate the distances of the direct and the first-order reflected sources. Finally, the room geometry is inferred from the localization results of the direct and the first-order reflected sources. The effectiveness of the proposed method was validated through simulation experiments. The experimental results demonstrate that the proposed method achieves accurate localization results and performs well in inferring room geometry.

Speakers

Donghang Wu

Xihong Wu

Tianshu Qu

Peking University

Thursday May 22, 2025 4:00pm - 4:20pm CEST
C1 ATM Studio Warsaw, Poland

AI & Machine Audition

Presentation Type Paper Presentation

4:40pm CEST

Comparing Human and Machine Ensemble Width Estimation in Binaural Music Recordings under Simulated Anechoic Conditions

Thursday May 22, 2025 4:40pm - 5:00pm CEST

In recent years, there has been an increasing interest in binaural technology due to its ability to create immersive spatial audio experiences, particularly in streaming services and virtual reality applications. While audio localization studies typically focus on individual sound sources, ensemble width (EW) is crucial for scene-based analysis, as wider ensembles enhance immersion. We define intended EW as the angular span between the outermost sound sources in an ensemble, controlled during binaural synthesis. This study presents a comparison between human perception of EW and its automatic estimation under simulated anechoic conditions. Fifty-nine participants, including untrained listeners and experts, took part in listening tests, assessing 20 binaural anechoic excerpts synthesized using 2 publicly available music recordings, 2 different HRTFs, and 5 distinct EWs (0° to 90°). The excerpts were played twice in random order via headphones through a web-based survey. Only a subset of ten listeners, of whom nine were experts, passed post-screening tests, with a mean absolute error (MAE) of 74.62° (±38.12°), compared to an MAE of 5.92° (±0.14°) achieved by a pre-trained machine learning method using auditory modeling and gradient-boosted decision trees. This shows that while intended EW can be algorithmically extracted from synthesized recordings, it significantly differs from human perception. Participants reported insufficient externalization and front-back confusion (suggesting HRTF mismatch). The untrained listeners demonstrated response inconsistencies and a low degree of discriminability, which led to the rejection of most untrained listeners during post-screening. The findings may contribute to the development of perceptually aligned EW estimation models.
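
The two quantities named in this abstract, intended ensemble width and mean absolute error, can be sketched in a few lines. This is a minimal illustration only: the azimuths, responses, and function names are hypothetical, and the width calculation assumes frontal sources with no wrap-around at ±180°.

```python
def ensemble_width(azimuths_deg):
    """Angular span between the outermost sources, in degrees.

    Assumes all azimuths lie in the frontal region, so no wrap-around
    handling is needed.
    """
    return max(azimuths_deg) - min(azimuths_deg)


def mean_absolute_error(estimates_deg, targets_deg):
    """Mean absolute error between estimated and intended widths."""
    if len(estimates_deg) != len(targets_deg) or not targets_deg:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - t) for e, t in zip(estimates_deg, targets_deg))
    return total / len(targets_deg)


# Hypothetical source azimuths (degrees, negative = left of centre).
width = ensemble_width([-30.0, -10.0, 5.0, 30.0])

# Hypothetical intended widths and listener responses.
intended = [0.0, 30.0, 60.0, 90.0]
responses = [20.0, 30.0, 40.0, 50.0]
mae = mean_absolute_error(responses, intended)
```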

Speakers

Paweł Antoniuk

Karol Wójcik

Slawomir Zielinski

Hyunkook Lee

Professor, Applied Psychoacoustics Lab, University of Huddersfield

Thursday May 22, 2025 4:40pm - 5:00pm CEST
C1 ATM Studio Warsaw, Poland

AI & Machine Audition

Presentation Type Paper Presentation

5:00pm CEST

Data-driven estimation of traditional frame drum construction specifications

Thursday May 22, 2025 5:00pm - 5:20pm CEST

This research aims to provide a systematic approach for the analysis of geometrical and material characteristics of traditional frame drums using deep learning. A data-driven approach is used, integrating supervised and unsupervised feature extraction techniques to associate measurable audio features with perceptual attributes. The methodology involves the training of convolutional neural networks on mel-scale spectrograms to estimate wood type (classification), diameter (regression), and depth (regression). A multi-labeled dataset containing recorded samples of frame drums of different specifications is used for model training and evaluation. Hierarchical classification is explored, incorporating playing techniques and environmental factors. Handcrafted features enhance interpretability, helping determine the impact of construction attributes on sound perception, ultimately aiding instrument design. Data augmentation techniques, including pitch alteration and additive noise, are introduced to expand the dataset and improve the generalization of the approach.

Speakers

Nikolaos Vryzas

Aristotle University Thessaloniki

Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...

Vasileios Bountourakis

Antonis Pagonis

Thursday May 22, 2025 5:00pm - 5:20pm CEST
C1 ATM Studio Warsaw, Poland

AI & Machine Audition

Presentation Type Paper Presentation

5:20pm CEST

Automatic generation of music captions

Thursday May 22, 2025 5:20pm - 5:40pm CEST

This paper discusses the process of generating natural language music descriptions, called captioning, using deep learning and large language models. A novel encoder architecture is trained to learn large-scale music representations and generate high-quality embeddings, which a pre-trained decoder then uses to generate captions. The captions used for training are from the state-of-the-art LP-MusicCaps dataset. A qualitative and subjective assessment of the quality of created captions is performed, showing the difference between various decoder models.

Speakers

Mateusz Zieleziński

Ewa Łukasik

Thursday May 22, 2025 5:20pm - 5:40pm CEST
C1 ATM Studio Warsaw, Poland

AI & Machine Audition

Presentation Type Paper Presentation

9:15am CEST

Generative AI in Audio Education: Process-Centred Teaching for a Product-Centred World

Friday May 23, 2025 9:15am - 9:35am CEST

Artificial intelligence (AI) tools are transforming the way music is being produced. The rate of development is expeditious, and the associated metamorphosis of audio education is abrupt. Higher-level education is largely built around the objectives of knowledge transmission and skills development, evidenced by the emphasis on learning in the cognitive domain in university programmes. But the cohort of skills that music producers will require in five years’ time is unclear, making skills-based curriculum planning challenging. Audio educators require a systematic approach to integrate AI tools in ways that enhance teaching and learning.

This study uses speculative design as the underpinning research methodology. Speculative design employs design to explore and evaluate possible futures, alternative realities, and sociotechnical trends. In this study, the practical tasks in an existing university module are modified by integrating available generative AI (GAI) tools to replace or augment the task design. This tangible artefact is used to critique prevailing assumptions concerning the use of GAI in music production and audio education. The findings suggest that GAI tools will disrupt the existing audio education paradigm. Employing a process-centred approach to teaching and learning may represent a key progression for educators to help navigate these changes.

Speakers

Malachy Ronan

Friday May 23, 2025 9:15am - 9:35am CEST
C1 ATM Studio Warsaw, Poland

Audio in Education

Presentation Type Paper Presentation

9:35am CEST

A Collaborative and Reflective Framework for Redesigning Music Technology Degree Programmes

Friday May 23, 2025 9:35am - 9:55am CEST

Cyclical formal reviews are essential to keep Music and Audio Technology degree programmes current. Whilst clear institutional guidance exists on the requisite documentation to be submitted, there is little guidance concerning the process used to gather the information. To address this issue, a 12-step collaborative and reflective framework was developed to review a degree programme in Music Technology.

This framework employs Walker’s ‘Naturalistic’ process model and design thinking principles to create a dynamic, stakeholder-driven review process. The framework begins with reflective analysis by faculty, helping to define programme identity, teaching philosophy, and graduate attributes. Existing curricula are evaluated using Boehm et al.’s (2018) tetrad framework of Music Technology encompassing the sub-disciplines of production, technology, art, and science. Insights from industry professionals, learners, and graduates are gathered through semi-structured interviews, surveys, and focus groups to address skill gaps, learner preferences, and emerging trends. A SWOT analysis further refines the scope and limitations of the redesign process, which culminates in iterative stakeholder consultations to finalise the programme’s structure, content, and delivery.

This process-centred approach emphasises adaptability, inclusivity, and relevance, thus ensuring the redesigned programme is learner-centred and aligned with future professional and educational demands. By combining reflective practice and collaborative engagement, the framework offers a comprehensive, replicable model for educators redesigning degree programmes in the discipline. This case study contributes to the broader discourse on curriculum design in music and audio degree programmes, demonstrating how interdisciplinary and stakeholder-driven approaches can balance administrative requirements with pedagogical innovation.

Speakers

Malachy Ronan

Kevin Garland

PhD Researcher, TUS

Kevin Garland is a Postgraduate PhD Researcher at the Technological University of the Shannon: Midlands Midwest (TUS), Ireland. His primary research interests include human-computer interaction, user-centered design, and audio technology. Current research lies in user modelling and...

Friday May 23, 2025 9:35am - 9:55am CEST
C1 ATM Studio Warsaw, Poland

Audio in Education

Presentation Type Paper Presentation

9:55am CEST

Acoustic Sovereignties: Resounding Indigenous Knowledge in Sound-Based Research

Friday May 23, 2025 9:55am - 10:15am CEST

Acoustic Sovereignties (2024) is a First Nations, anti-colonial spatial audio exhibition held in Naarm (Melbourne), Australia. Through curatorial and compositional practices, Acoustic Sovereignties confronts traditional soundscape and Western experimental sound disciplines by foregrounding marginalised voices.
As this research will show, the foundations of sound-based practices such as Deep Listening and Soundscape Studies consisted of romanticised notions of Indigenous spirituality, in addition to the intentional disregard for First Nations stewardship and kinship with the land and its acoustic composition. Acoustic Sovereignties aims at reclaiming Indigenous representation throughout sound-based disciplines and arts practices by providing a platform for voices, soundscapes and knowledge to be heard.

Speakers

Hayden Ryan

Graduate Student, RMIT University

My name is Hayden Ryan. I am a First Nations Australian sound scholar and artist, and a 2024 New York University Music Technology Masters graduate. I am currently a Vice Chancellor's Indigenous Pre-Doctoral Fellow at RMIT University, where my PhD focuses on the integration of immersive...

Friday May 23, 2025 9:55am - 10:15am CEST
C1 ATM Studio Warsaw, Poland

Audio in Education

Presentation Type Paper Presentation

10:40am CEST

Testing Auditory Illusions in Augmented Reality: Plausibility, Transfer-Plausibility and Authenticity

Friday May 23, 2025 10:40am - 11:00am CEST

Experiments testing sound for augmented reality can involve real and virtual sound sources. Paradigms are either based on rating various acoustic attributes or testing whether a virtual sound source is believed to be real (i.e., evokes an auditory illusion). This study compares four experimental designs indicating such illusions. The first is an ABX task suitable for evaluation under the authenticity paradigm. The second is a Yes/No task, as proposed to evaluate plausibility. The third is a three-alternative-forced-choice (3AFC) task using different source signals for real and virtual, proposed to evaluate transfer-plausibility. Finally, a 2AFC task was tested. The renderings compared in the tests encompassed mismatches between real and virtual room acoustics. Results confirm that authenticity is hard to achieve under nonideal conditions, and ceiling effects occur because differences are always detected. Thus, the other paradigms are better suited for evaluating practical augmented reality audio systems. Detection analysis further shows that the 3AFC transfer-plausibility test is more sensitive than the 2AFC task. Moreover, participants are more sensitive to differences between real and virtual sources in the Yes/No task than theory predicts. This contribution aims to aid in selecting experimental paradigms in future experiments regarding perceptual and technical requirements for sound in augmented reality.

Speakers

Nils Meyer-Kahlen

Aalto University

Sebastia Vicenc Amengual Gari

Sebastia V. Amengual Gari is currently a research scientist at Reality Labs Research (Meta) working on room acoustics, spatial audio, and auditory perception. He received a Diploma Degree in Telecommunications with a major in Sound and Image in 2014 from the Polytechnic University...

Sebastian Schlecht

Professor of Practice, Aalto University

Sebastian J. Schlecht is Professor of Practice for Sound in Virtual Reality at Aalto University, Finland. This position is shared between the Aalto Media Lab and the Aalto Acoustics Lab. His research interests include spatial audio processing with an emphasis on artificial reverberation, synthesis, reproduction, and 6-degrees-of-freedom virtual and mixed reality applications. In particular, his research efforts have been directed towards the intersection of app...

Tapio Lokki

Department of Signal Processing and Acoustics, Aalto University

Friday May 23, 2025 10:40am - 11:00am CEST
C1 ATM Studio Warsaw, Poland

Perception & Listening Tests

Presentation Type Paper Presentation

11:00am CEST

Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF

Friday May 23, 2025 11:00am - 11:20am CEST

Immersive audio mix presentations involve transmitting and rendering several audio elements simultaneously. This enables next-generation applications, such as personalized playback. Using immersive loudspeaker and headphone MUSHRA tests, we investigate rate vs. quality for a typical mix presentation use case of a foreground stereo element plus a background Ambisonics scene. For coding, we use Immersive Audio Model and Formats (IAMF), a recently proposed system for Next-Generation Audio. Excellent quality is achieved at 384 kbit/s, even with a reasonable amount of personalization. We also propose a framework for content-aware analysis that can significantly reduce the bitrate even when using underlying legacy audio coding instances.

Speakers

Carlos Tejeda Ocampo

Samsung Research Tijuana

Toni Hirvonen

Ema Souza Blanes

Mahmoud Namazi

Jan Skoglund

Google

Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...

Friday May 23, 2025 11:00am - 11:20am CEST
C1 ATM Studio Warsaw, Poland

Perception & Listening Tests

Presentation Type Paper Presentation

11:20am CEST

Evaluation of auditory distance perception in reflective sound field by static and dynamic virtual auditory display

Friday May 23, 2025 11:20am - 11:40am CEST

A psychoacoustic experiment is conducted to evaluate and compare auditory distance perception in a reflective sound field using static and dynamic virtual auditory display (VAD). The binaural signals created by a point source at different distances in a rectangular room are simulated. The contribution of direct sound to the binaural signals is simulated by near-field head-related transfer function filters and a gain factor that accounts for the propagation attenuation of a spherical wave. The contributions of early reflections up to the second order and of later reverberation are simulated by the image source method and the Schroeder reverberation algorithm, respectively. The results of the psychoacoustic experiment indicate that there are still significant differences between the perceived distances created by static VAD and those created by dynamic VAD in the simulated reflective condition, although the differences are not as large as those in the simulated free-field case. The results of dynamic VAD are more consistent with those of a real sound source. Therefore, simulating reflections reduces in-head localization and thus improves the control of perceived distance in headphone presentation, but static VAD is still less effective in creating different distance percepts. Dynamic VAD is still needed in distance perception experiments for hearing research even if simulated reflections are included. In practical applications, dynamic VAD is advocated for recreating virtual sources at different distances.
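
To make the simulation chain in this abstract more concrete, the sketch below computes the delay and 1/r spreading gain of the direct path and of the six first-order image sources in a rectangular room. It is only an illustration under stated assumptions: the room size, positions, and speed of sound are hypothetical, wall absorption and HRTF filtering are ignored, and the abstract's second-order reflections and Schroeder reverberation are not covered.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def first_order_images(source, room):
    """Return the six first-order image sources in a rectangular room.

    The room spans [0, Lx] x [0, Ly] x [0, Lz]; each image mirrors the
    source across one wall.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def delay_and_gain(position, listener):
    """Propagation delay (s) and 1/r spherical-spreading gain for one path."""
    distance = math.dist(position, listener)
    return distance / SPEED_OF_SOUND, 1.0 / distance


room = (5.0, 4.0, 3.0)      # hypothetical room dimensions in metres
source = (1.0, 2.0, 1.5)    # hypothetical source position
listener = (3.0, 2.0, 1.5)  # hypothetical listener position

direct_delay, direct_gain = delay_and_gain(source, listener)
reflections = [delay_and_gain(image, listener)
               for image in first_order_images(source, room)]
```

Every reflection arrives later and weaker than the direct sound, which is the distance cue that the simulated reflections add on top of the free-field case.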

Speakers

Zixiang Yan

Jun Zhu

Bosun Xie

Friday May 23, 2025 11:20am - 11:40am CEST
C1 ATM Studio Warsaw, Poland

Perception & Listening Tests

Presentation Type Paper Presentation

11:40am CEST

Subjective evaluation of immersive microphone arrays for drums

Friday May 23, 2025 11:40am - 12:00pm CEST

Through a practice-oriented study, various coincident, near-coincident, and non-coincident immersive microphone arrays were compared during drum recordings for different contemporary popular music genres. In a preliminary study, the OCT-3D, PCMA-3D, 2L-Cube, Hamasaki Square, IRT Cross, Ambisonics A-Format, and native B-Format were informally compared, revealing that the differences between non-coincident systems were much smaller than the differences between coincident and non-coincident systems. This led to a reduction in microphone systems for the final drum recordings. Four microphone techniques were selected: OCT-3D, native B-Format, Ambisonics A-Format, and IRT Cross. These were compared within the context of two different songs – a calm pop track and an energetic rock song – where the drums were respectively recorded in a dry drum booth and a large studio hall. Through a listening test with a small sample group, it was determined which microphone technique was best suited for each song. Participants were also asked to identify the general favorite, without musical context, as well as how the spatiality, timbre, and height were perceived. It was concluded that the choice of immersive microphone technique depends on the musical context. Conclusions from more objective studies focus primarily on accurate localization, with non-coincident systems consistently performing the best. However, these studies do not take into account the musical context, where accurate localization does not always take precedence. Furthermore, it was noted that height perception in music is not solely created by speakers in the height range. The comparative drum recordings are published through https://www.immersive.pxl.be/immersive-microphone-techniques-for-drums/.

Speakers

Arthur Moelants

Researcher, PXL Music Research

Steven Maes

Founder of Motormusic Studios, Researcher & Lecturer at PXL Music, PXL Music

Friday May 23, 2025 11:40am - 12:00pm CEST
C1 ATM Studio Warsaw, Poland

Perception & Listening Tests

Presentation Type Paper Presentation

PhD Candidate, McGill University

Kathleen Zhang

McGill University

Jack Kelly

Richard King

Professor, McGill University

Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School...

Wieslaw Woszczyk

Friday May 23, 2025 2:50pm - 3:10pm CEST
C1 ATM Studio Warsaw, Poland

Perception & Listening Tests

Presentation Type Paper Presentation

3:10pm CEST

Detection of spectral component asynchrony: Applying psychoacoustic research to transient phenomena in music

Friday May 23, 2025 3:10pm - 3:30pm CEST

Numerous studies highlight the role of transient behavior in musical sounds and its impact on sound identification. This study compares these findings with established psychoacoustic measurements of detection thresholds for asynchrony in onset and offset transients, obtained using synthesized stimuli that allowed precise control of stimulus parameters. Results indicated that onset asynchrony can be detected at thresholds as low as 1 ms, even half a cycle of the component frequency. In contrast, offset asynchrony detection was found to be less precise, with thresholds ranging from 5 to 10 ms. Sensitivity improves when multiple harmonics are asynchronous. Additionally, component phase significantly influences onset asynchrony detection: at 1000 Hz and above, phase shifts raise thresholds from below 1 ms to around 50 ms, while having little effect on offset detection. Although these findings were based on controlled artificial stimuli, they can provide valuable insight into asynchrony in natural musical sounds. In many cases, detection thresholds are well below the variations observed in music, yet under certain conditions and frequencies, some temporal variations may become imperceptible.

Speakers

Jan Żera

Friday May 23, 2025 3:10pm - 3:30pm CEST
C1 ATM Studio Warsaw, Poland

Perception & Listening Tests

Presentation Type Paper Presentation

4:15pm CEST

Key Technology Briefings 3

Friday May 23, 2025 4:15pm - 6:00pm CEST

C1 ATM Studio Warsaw, Poland

9:00am CEST

Strategies for Obtaining True Quasi-Anechoic Loudspeaker Response Measurements

Saturday May 24, 2025 9:00am - 9:20am CEST

Simple truncation of the reflections in the impulse response of loudspeakers measured in normal rooms will increasingly falsify the response below about 500 Hz for typical situations. Well-known experience and guidance from loudspeaker models allow the determination of the lowest frequency for which truncation suffices. This paper proposes two additional strategies for achieving much improved low-frequency responses that are complementary to the easily obtained high-frequency response: (a) a previously published nearfield measurement which can be diffractively transformed to a farfield response with appropriate calculations, here presented with greatly simplified computations, and (b) a measurement setup that admits only a single floor reflection which can be iteratively corrected at low frequencies. Theory and examples of each method are presented.
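
A rough way to see why truncation limits the low-frequency response is to compute the reflection-free time window and the corresponding frequency resolution. The sketch below uses the common 1/T rule of thumb, not the paper's own guidance; the geometry (loudspeaker and microphone at equal height, floor as the nearest surface), the speed of sound, and the function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def reflection_free_window(height, distance):
    """Time (s) between the direct sound and the floor reflection.

    Assumes loudspeaker and microphone are both `height` metres above the
    floor, `distance` metres apart, and the floor is the nearest surface.
    """
    direct_path = distance
    reflected_path = math.sqrt(distance ** 2 + (2.0 * height) ** 2)
    return (reflected_path - direct_path) / SPEED_OF_SOUND


def lowest_resolved_frequency(window_seconds):
    """Rule-of-thumb lower frequency limit for a window of this length."""
    return 1.0 / window_seconds


# Hypothetical setup: 1.2 m above the floor, 1 m measurement distance.
window = reflection_free_window(height=1.2, distance=1.0)
f_min = lowest_resolved_frequency(window)
```

With these hypothetical numbers the window is under 5 ms, so the truncated response is only trustworthy above roughly 200 Hz, which is consistent in order of magnitude with the figure quoted in the abstract.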

Speakers

John Vanderkooy

Saturday May 24, 2025 9:00am - 9:20am CEST
C1 ATM Studio Warsaw, Poland

Acoustic Transducers & Measurements

Presentation Type Paper Presentation

9:20am CEST

IMPro -- Method for Integrated Microphone Pressure Frequency Response Measurement Using a Probe Microphone

Saturday May 24, 2025 9:20am - 9:40am CEST

We propose a practical method for the measurement of the pressure sensitivity frequency response of a microphone that has been integrated into product mechanics. The method uses a probe microphone to determine the sound pressure entering the inlet of the integrated microphone. We show that the measurements can be performed in a normal office environment as well as in anechoic conditions. The method is validated with measurement of a rigid spherical microphone prototype having analytically defined scattering characteristics. Our results indicate that the proposed method, called IMPro, can effectively measure the pressure sensitivity frequency response of microphones in commercial products, largely independent of the measurement environment.

Speakers

John Cozens

Matti Hämäläinen

Mikko Pekkarinen

Saturday May 24, 2025 9:20am - 9:40am CEST
C1 ATM Studio Warsaw, Poland

Acoustic Transducers & Measurements

Presentation Type Paper Presentation

9:40am CEST

Non-invasive sound field sensing in enclosures using acousto-optics

Saturday May 24, 2025 9:40am - 10:00am CEST

It is challenging to characterize sound across space, especially in small enclosed volumes, using conventional microphone arrays.
This study explores acousto-optic sensing methods to record the sound field throughout an enclosure, including regions close to a source and boundaries.
The method uses a laser vibrometer to sense modulations of the refractive index in air, caused by the propagating sound pressure waves.
Compared to microphone arrays, the sound field can be measured non-invasively and at high resolution, which is particularly attractive at high frequencies, in enclosures of limited size, or under unfavorable mounting conditions for fixtures.
We compensate for vibrations that contaminate and conceal the acousto-optic measurements and employ an image source model to also reconstruct early parts of the impulse response.
The results demonstrate that acousto-optic measurements can enable the analysis of sound fields in enclosed spaces non-invasively and with high resolution.

Speakers

Manuel Hahmann

Chloé Balmes

Esteban Fuentes

Samuel Verburg

Saturday May 24, 2025 9:40am - 10:00am CEST
C1 ATM Studio Warsaw, Poland

Acoustic Transducers & Measurements

Presentation Type Paper Presentation

10:00am CEST

The Search for a Universal Microphone

Saturday May 24, 2025 10:00am - 10:20am CEST

Recording engineers and producers choose different microphones for different sound sources. It is intriguing that, in the 1950s and 1960s, the variety of available microphones was relatively limited compared to what we have available today. Yet, recordings from that era remain exemplary even now. The microphones used at the time were primarily vacuum tube models.
Through discussions at AES Conventions on improving phantom power supplies and my own experimentation with tube microphones, I began to realize that the defining attribute of their sound might not stem solely from the tubes themselves. Instead, the type of power supply appeared to play a crucial role in shaping the final sound quality.
This hypothesis was confirmed with the introduction of high-voltage DPA 4003 and 4004 microphones, compared to their phantom-powered counterparts, the 4006 and 4007. In direct comparisons, the microphones with external, more current-efficient power supplies consistently delivered superior sound.
Having worked extensively with numerous AKG C12 and C24 microphones, I identified two pairs, one of C12s and one of C24s, with identical frequency characteristics. For one C12, we designed an entirely new, pure Class A transistor-based circuit with an external power supply.
Reflecting on my 50-plus years as a sound engineer and producer, I sought to determine which microphones were not only the best, but also the most versatile. My analysis led to four key solutions extending beyond the microphones themselves. Since I had already developed an ideal Class A equalizer, I applied the same technology to create four analog equalizers designed to fine-tune the prototype microphone’s frequency characteristics at the power supply level.

Speakers

Andrew Lipinski

Saturday May 24, 2025 10:00am - 10:20am CEST
C1 ATM Studio Warsaw, Poland

Acoustic Transducers & Measurements

Presentation Type Paper Presentation

10:40am CEST

Immersive recordings in virtual acoustics: differences and similarities between a concert hall and its virtual counterpart

Saturday May 24, 2025 10:40am - 11:00am CEST

Virtual acoustic systems can artificially alter a recording studio's reverberation in real time using spatial room impulse responses captured in different spaces. By recreating another space's acoustic perception, these systems influence various aspects of a musician's performance. Traditional methods involve recording a dry performance and adding reverb in post-production, which may not align with the musician's artistic intent. In contrast, virtual acoustic systems allow simultaneous recording of both artificial reverb and the musician's interaction using standard recording techniques, just as it would occur in the actual space. This study analyzes immersive recordings of nearly identical musical performances captured in both a real concert hall and McGill University's Immersive Media Lab (Imlab), which features new dedicated virtual acoustics software, and highlights the similarities and differences between the performances recorded in the real space and its virtual counterpart.

Speakers

Gianluca Grazioli

Montreal, Canada, McGill University

Andrea Gozzi

Mehdi Rahimdokht

Alessandro Braga

Richard King

Professor, McGill University

Wieslaw Woszczyk

Saturday May 24, 2025 10:40am - 11:00am CEST
C1 ATM Studio Warsaw, Poland

Acoustics

Presentation Type Paper Presentation

11:00am CEST

Analysis of the acoustic impulse response of an auditorium

Saturday May 24, 2025 11:00am - 11:20am CEST

The acoustic behaviour of an auditorium is analysed after measurements performed according to the ISO 3382-1 standard. The all-pole analysis of the measured impulse responses confirms the hypothesis that all responses have a common component that can be attributed to room characteristics. Results from a subsequent non-parametric analysis allow conjecturing that the overall response of the acoustic channel between two points may be decomposed into three components: one related to source position, another related to the room, and the last one depending on the position of the receiver.

Speakers

Rubén Fraile

Juan José Gómez-Alfageme

Elena Blanco-Martín

Juana María Gutiérrez-Arriola

Nicolás Sáenz Lechón

Saturday May 24, 2025 11:00am - 11:20am CEST
C1 ATM Studio Warsaw, Poland

Acoustics

Presentation Type Paper Presentation

11:20am CEST

Sparsity-based analysis of sound field diffuseness in rooms

Saturday May 24, 2025 11:20am - 11:40am CEST

Sound fields in enclosures comprise a combination of directional and diffuse components. The directional components include the direct path from the source and the early specular reflections. The diffuse part starts with the first early reflection and builds up gradually over time. An ideal diffuse field is achieved when incoherent reflections begin to arrive randomly from all directions. More specifically, a diffuse field is characterized by having uniform energy density (i.e., independence from measurement position) and an isotropic distribution (i.e., random directions of incidence), which results in zero net energy flow (i.e., the net time-averaged intensity is zero). Despite this broad definition, real diffuse sound fields typically exhibit directional characteristics owing to the geometry and the non-uniform absorptive properties of rooms.

Several models and data-driven metrics based on the definition of a diffuse field have been proposed to assess diffuseness. A widely used metric is the _mixing time_, which indicates the transition of the sound field from directional to diffuse and is known to depend, among other factors, on the room geometry.

The concept of mixing time is closely linked to the normalized echo density profile (NEDP), a measure first used to estimate the mixing time in actual rooms (Abel and Huang, 2006), and later to assess the quality of artificial reverberators in terms of their capacity to produce a dense reverberant tail (De Sena et al., 2015). NEDP is calculated over room impulse responses (RIRs) measured with a pressure probe, evaluating how much the RIR deviates from a normal distribution. Another similar temporal/statistical measure, kurtosis, has been used to similar effect (Jeong, 2016). However, neither NEDP nor kurtosis provides insights into the directional attributes of diffuse fields. While both approaches rely on statistical reasoning rather than identifying individual reflections, another temporal approach uses matching pursuit to identify individual reflections (Defrance et al., 2009).
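To make the echo-density idea concrete, here is a minimal sketch of a normalized echo density computation in the spirit of the measure cited above: within a sliding window, count the fraction of samples whose magnitude exceeds the window's RMS value, then divide by the fraction expected for Gaussian noise, erfc(1/sqrt(2)), roughly 0.317. The rectangular window and the function name are simplifying assumptions; published formulations typically use a weighted window, and this is not the authors' implementation.

```python
import math
import random

# Fraction of samples of a zero-mean Gaussian signal lying more than
# one standard deviation from zero (about 0.3173).
GAUSSIAN_FRACTION = math.erfc(1.0 / math.sqrt(2.0))


def echo_density_profile(rir, window_len):
    """Normalized echo density per sample, using a rectangular sliding window.

    Values near 0 indicate a sparse (few discrete reflections) response;
    values near 1 indicate Gaussian-like, well-mixed reverberation.
    """
    half = window_len // 2
    profile = []
    for n in range(len(rir)):
        segment = rir[max(0, n - half): n + half + 1]
        # RMS of the window, treating the signal as zero-mean.
        sigma = math.sqrt(sum(x * x for x in segment) / len(segment))
        outliers = sum(1 for x in segment if abs(x) > sigma)
        profile.append(outliers / len(segment) / GAUSSIAN_FRACTION)
    return profile


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for a real RIR: one isolated reflection, then dense noise.
    sparse = [0.0] * 200
    sparse[100] = 1.0
    dense = [rng.gauss(0.0, 1.0) for _ in range(1500)]
    print("around an isolated reflection:", round(echo_density_profile(sparse, 101)[100], 3))
    print("in Gaussian noise:", round(echo_density_profile(dense, 501)[750], 3))
```

A mixing-time estimate would then be the first time at which the profile reaches a value close to 1; the exact threshold is a choice left to the user of the sketch.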

Another set of approaches focuses on the net energy flow aspect of the diffuse field, providing an energetic analysis framework either in the time domain (Del Galdo et al., 2012) or in the time-frequency domain (Ahonen and Pulkki, 2009). These approaches rely on calculating the time-averaged active intensity, either using intensity probes or first- and higher-order Ambisonics microphones, where a pseudo-intensity-based diffuseness is computed (Götz et al., 2015). The coherence of spherical harmonic decompositions of the sound field has also been used to estimate diffuseness (Epain and Jin, 2016). Beamforming methods have likewise been applied to assess the directional properties of sound fields and to illustrate how real diffuse fields deviate from the ideal (Gover et al., 2004).

We propose a spatio-spectro-temporal (SST) sound field analysis approach based on a sparse plane-wave decomposition of sound fields captured using a higher-order Ambisonics microphone. The proposed approach has the advantage of analyzing the progression of the sound field’s diffuseness in both temporal and spatial dimensions. Several derivative metrics are introduced to assess temporal, spectro-temporal, and spatio-temporal characteristics of the diffuse field, including sparsity, diversity, and isotropy. We define the room sparsity profile (RSP), room sparsity relief (RSR), and room sparsity profile diversity (RSPD) as temporal, spectro-temporal, and spatio-temporal measures of diffuse fields, respectively. The relationship of this new approach to existing diffuseness measures is discussed and supported by experimental comparisons using 4th- and 6th-order acoustic impulse responses, demonstrating the dependence of the new derivative measures on measurement position. We conclude by considering the limitations and applicability of the proposed approach.

Speakers

Ece Beren Genc

Mert Burkay Coteli

Huseyin Hacihabiboglu

Saturday May 24, 2025 11:20am - 11:40am CEST
C1 ATM Studio Warsaw, Poland

Acoustics

Presentation Type Paper Presentation

11:40am CEST

Evaluating room acoustic parameters using ambisonic technology: a case study of a medium-sized recording studio

Saturday May 24, 2025 11:40am - 12:00pm CEST

Ambisonic technology has recently gained popularity in room acoustic measurements due to its ability to capture both general and spatial characteristics of a sound field using a single microphone. On the other hand, conventional measurement techniques conducted in accordance with the ISO 3382-1 standard require multiple transducers, which results in a more time-consuming procedure. This study presents a case study on the use of ambisonic technology to evaluate the room acoustic parameters of a medium-sized recording studio.
Two ambisonic microphones, a first-order Sennheiser Ambeo and a third-order Zylia ZM1-3E, were used to record spatial impulse responses in 30 combinations of sound source and receiver positions in the recording studio. Key acoustic parameters, including Reverberation Time (T30), Early Decay Time (EDT) and Clarity (C80), were calculated using spatial decomposition methods. The Interaural Cross-Correlation Coefficient (IACC) was derived from binaural impulse responses obtained using the MagLS binauralization method. The results were compared with conventional omnidirectional and binaural microphone measurements to assess the accuracy and advantages of ambisonic technology. The findings show that T30, EDT, C50 and IACC values measured with the use of ambisonic microphones are consistent with those obtained from conventional measurements.
This study demonstrates the effectiveness of ambisonic technology in room acoustic measurements by capturing a comprehensive set of parameters with a single microphone. Additionally, it enables the estimation of reflection vectors, offering further insights into spatial acoustics.
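For readers unfamiliar with how parameters such as T30, EDT and clarity are obtained from an impulse response, the following sketch shows the conventional single-channel route: Schroeder backward integration, a least-squares line fit to the decay curve, and an early-to-late energy ratio. It assumes a noise-free omnidirectional impulse response that starts at the direct sound; the function names are illustrative, and the spatial decomposition processing described in the abstract is not reproduced here.

```python
import math


def schroeder_decay_db(rir):
    """Backward-integrated energy decay curve in dB, 0 dB at the first sample."""
    edc = [0.0] * len(rir)
    running = 0.0
    for i in range(len(rir) - 1, -1, -1):
        running += rir[i] * rir[i]
        edc[i] = running
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf") for e in edc]


def decay_time(decay_db, fs, upper_db, lower_db):
    """Fit a line between two decay levels and extrapolate to a 60 dB decay (s)."""
    pts = [(i / fs, d) for i, d in enumerate(decay_db) if lower_db <= d <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


def clarity_db(rir, fs, split_ms=80.0):
    """Early-to-late energy ratio in dB (C80 by default, C50 with split_ms=50)."""
    split = int(round(split_ms * 1e-3 * fs))
    early = sum(x * x for x in rir[:split])
    late = sum(x * x for x in rir[split:])
    return 10.0 * math.log10(early / late)


if __name__ == "__main__":
    # Synthetic exponential decay with a known 0.5 s reverberation time.
    fs, rt = 8000, 0.5
    a = 3.0 * math.log(10.0) / rt  # amplitude decay rate giving -60 dB at t = rt
    rir = [math.exp(-a * n / fs) for n in range(fs)]
    curve = schroeder_decay_db(rir)
    print("T30:", round(decay_time(curve, fs, -5.0, -35.0), 3), "s")
    print("EDT:", round(decay_time(curve, fs, 0.0, -10.0), 3), "s")
    print("C80:", round(clarity_db(rir, fs), 2), "dB")
```

On measured data, the fit ranges would also need to stay above the noise floor, which this sketch does not check.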

Speakers

Maciej Jasiński

Jan Żera

Saturday May 24, 2025 11:40am - 12:00pm CEST
C1 ATM Studio Warsaw, Poland

Acoustics

Presentation Type Paper Presentation

12:15pm CEST

Workshop: How to Build a World-Class Brand in 24 Hours

Saturday May 24, 2025 12:15pm - 1:15pm CEST

In this dynamic, hackathon-style session, participants will rapidly develop a world-class brand strategy for their company using cutting-edge AI tools and collaborative exercises. Attendees will leave with an actionable blueprint they can implement immediately in their businesses or projects.

Format: 90-minute session
Key Takeaways:
Master the essentials of brand strategy and its impact on content creation and sales
Engage in hands-on exercises to develop a brand strategy in real time
Learn how AI tools can accelerate brand positioning

Speakers

Joshua Ingram

Saturday May 24, 2025 12:15pm - 1:15pm CEST
C1 ATM Studio Warsaw, Poland

Analysis and synthesis of sound Audio and music information retrieval Audio effects Audio for virtual/augmented reality environments Audio perception Audio quality Electronic dance music Electronic instrument design & applications

Presentation Type Workshop

1:30pm CEST

Key Technology Briefing 5

Saturday May 24, 2025 1:30pm - 2:45pm CEST

Saturday May 24, 2025 1:30pm - 2:45pm CEST
C1 ATM Studio Warsaw, Poland