Thursday, May 22
 

9:00am CEST

Erato recordings (title TBC)
Thursday May 22, 2025 9:00am - 10:00am CEST
C4 ATM Studio Warsaw, Poland

9:00am CEST

Free Online Course “Spatial Audio - Practical Master Guide”
Thursday May 22, 2025 9:00am - 10:00am CEST
“Spatial Audio - Practical Master Guide” is a free online course on spatial audio content creation. The target group is people with basic knowledge of audio production who are not necessarily experts in the underlying technologies and aesthetics. “Spatial Audio - Practical Master Guide” will be released on the Acoucou platform chapter by chapter throughout Spring 2025. Some course content is already available as a preview.

The course comprises a variety of audio examples and interactive content that allow learners to develop their skills in a playful manner. It covers the entire spectrum, from psychoacoustics through the underlying technologies to delivery formats. The course’s highlights are the 14 case studies and step-by-step guides that provide behind-the-scenes information. Many of the course components are self-contained, so they can be used in isolation or integrated into other educational contexts.

The workshop on “Spatial Audio - Practical Master Guide” will provide an overview of the course contents, and we will explain the educational concepts that the course is based on. We will demonstrate the look and feel of the course on the Acoucou platform by walking through a set of representative examples from the courseware, and we will give the audience the opportunity to experience it themselves. The workshop will wrap up with a discussion of the contexts in which the course contents may be useful besides self-study.

Course contents:
Chapter 1: Overview (introduction, history of spatial audio, evolution of aesthetics in spatial audio)
Chapter 2: Psychoacoustics (spatial hearing, perception of reverberation)
Chapter 3: Reproduction (loudspeaker arrays, headphones)
Chapter 4: Capture (microphone arrays)
Chapter 5: Ambisonics (capture, reproduction, editing of ambisonic content)
Chapter 6: Storing spatial audio content
Chapter 7: Delivery formats

Case studies: Dolby Atmos truck streaming, fulldome, icosahedral loudspeaker, spatial audio sound installation, spatial audio at Friedrichstadt Palast, spatial audio in the health industry, live music performance with spatial audio, spatial audio in automotive

Step-by-step guides: setting up your spatial audio workstation, channel-based production for music, Dolby Atmos mix for cinema, Ambisonic sound production for 360° film, build your own Ambisonic microphone array, interactive spatial audio

Links:
https://spatial-audio.acoucou.org/
https://acoucou.org/
Thursday May 22, 2025 9:00am - 10:00am CEST
Hall F ATM Studio Warsaw, Poland

9:00am CEST

The Advance of UWB for High-Quality and Low-Latency Audio
Thursday May 22, 2025 9:00am - 10:00am CEST
UWB as an RF protocol is heavily used by handset manufacturers for device-location applications. As a transport option, UWB offers tremendous possibilities for professional audio use cases with real-time, low-latency requirements, including digital wireless microphones and in-ear monitors (IEMs). Used in live performance, UWB-enabled devices can deliver a total latency low enough to carry a signal from microphone to front-of-house mixer and back to the performers’ IEMs without noticeable delay.

UWB is progressing as an audio standard within the AES, and its first iteration targets live performance applications. Issues relating to body blocking at the operating frequencies (6.5/8 GHz), as well as clocking challenges that could result in dropped packets, have been addressed to ensure a stable, reliable link. This workshop will outline how UWB is capable of delivering a low-latency link and providing up to 10 MHz of data throughput for high-resolution (24-bit/96 kHz) linear PCM audio.
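A quick back-of-envelope check (my own arithmetic, not a figure from the abstract) of the payload such a link must carry:

```python
# Raw bitrate of uncompressed 24-bit/96 kHz linear PCM.
bits_per_second = 96_000 * 24           # 2,304,000 bit/s ≈ 2.3 Mbit/s per channel
stereo_iem_feed = 2 * bits_per_second   # ≈ 4.6 Mbit/s for a stereo IEM feed
print(bits_per_second, stereo_iem_feed)
```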

The progression of UWB for audio is seeing the launch of high-end devices supported by several RF wireless vendors. This workshop will dive into the options open to device manufacturers who are considering UWB for their next-generation product roadmaps.
Speakers
JM

Jonathan McClintock

Audio Codecs Ltd
Thursday May 22, 2025 9:00am - 10:00am CEST
C3 ATM Studio Warsaw, Poland

9:30am CEST

Correlation between middle and top layer loudspeaker signals and the listening range in 3D audio reproduction
Thursday May 22, 2025 9:30am - 9:50am CEST
In auditory spatial perception, horizontal sound image localization and the sense of spaciousness are based on level and time differences between the left and right ears as cues, and the degree of correlation between the left and right signals is thought to contribute to the sense of horizontal spaciousness in particular [Hidaka1995, Zotter2013]. For the vertical image spread (VIS), spectral cues are necessary, and the change in VIS due to the degree of correlation between the vertical and horizontal signals depends on the frequency response [Gribben2018]. This paper investigates, through two experiments, the influence of different correlation values between the top- and middle-layer loudspeaker signals of a 3D audio reproduction system on listening impressions. Experiments using pink noise with different correlation values for the top and middle layers show that the lower the vertical correlation values are, the wider the listening range within which the impression does not change from that at the central listening position. Experiments using impulse responses obtained with microphones set up in an actual concert hall revealed a tendency to perceive a sense of spaciousness at off-center listening positions when cardioid microphones spaced apart from the middle layer were used for the top layer. The polar pattern and height of the microphones may have resulted in lower correlation values in the vertical direction, thus widening the listening range of consistent spatial impression outside the central listening position (i.e., the “sweet spot”).
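As a rough illustration of the correlation measure discussed here (a minimal sketch; the paper may use a different estimator, e.g., the maximum of the normalized cross-correlation over a range of lags):

```python
import numpy as np

def interchannel_correlation(top, mid):
    """Zero-lag correlation coefficient between a top-layer and a
    middle-layer loudspeaker signal (1 = identical, ~0 = decorrelated)."""
    top = top - top.mean()
    mid = mid - mid.mean()
    return float(np.dot(top, mid) / (np.linalg.norm(top) * np.linalg.norm(mid)))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(48_000), rng.standard_normal(48_000)
print(interchannel_correlation(a, a))  # -> 1.0
print(interchannel_correlation(a, b))  # -> close to 0
```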
Speakers
avatar for Toru Kamekawa

Toru Kamekawa

Professor, Tokyo University of the Arts
Toru Kamekawa: After graduating from the Kyushu Institute of Design in 1983, he joined the Japan Broadcasting Corporation (NHK) as a sound engineer. During that period, he gained his experience as a recording engineer, mostly in surround sound programs for HDTV. In 2002, he joined...
Thursday May 22, 2025 9:30am - 9:50am CEST
C2 ATM Studio Warsaw, Poland

9:30am CEST

Sound Synthesis 101: An Introduction To Sound Creation
Thursday May 22, 2025 9:30am - 11:00am CEST
Sound synthesis is a key part of modern music and audio production. Whether you are a producer, composer, or just curious about how electronic sounds are made, this workshop will break it down in a simple and practical way.

We will explore essential synthesis techniques like subtractive, additive, FM, wavetable, and granular synthesis. You will learn how different synthesis methods create and shape sound, and see them in action through live demonstrations using both hardware and virtual synthesizers, including emulations of legendary studio equipment.

This session is designed for everyone — whether you are a total beginner or an experienced audio professional looking for fresh ideas. You will leave with a solid understanding of synthesis fundamentals and the confidence to start creating your own unique sounds. Join us for an interactive, hands-on introduction to the world of sound synthesis!
Speakers
Thursday May 22, 2025 9:30am - 11:00am CEST
C1 ATM Studio Warsaw, Poland

9:50am CEST

Plane wave creation in non-spherical loudspeaker arrays using radius formulation by the Lamé function
Thursday May 22, 2025 9:50am - 10:10am CEST
This paper proposes a method for creating plane-wave fields with spherical harmonics using a non-spherical loudspeaker array. Sound field control approaches divide into physical-acoustic and psychoacoustic models. Some of the latter allow freedom in the location of each loudspeaker, but the reproduced sound differs from the auditory target because phantom sources are constructed. The former were developed from the wave equation under strictly positioned circular or spherical array conditions, and from higher-order Ambisonics (HOA) based on spherical harmonics, which expresses the field around only a single point. We therefore seek a method that physically creates the actual waveform while providing flexibility in the shape of the loudspeaker array. In this paper, we focus on the Lamé function, whose order changes the shape of the spatial figure, and propose formulating the distance between the array center and each loudspeaker using the function in a polar expression. Simulation experiments show that, within the inscribed region, the proposed method creates the same plane waveform as a spherical array when using a high-order Lamé function, whose shape is close to rectangular.
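For readers unfamiliar with the Lamé curve (superellipse), the polar radius formulation the abstract refers to can be sketched as follows; the exponent n morphs the contour from an ellipse (n = 2) towards a rectangle (large n). Parameter names are illustrative, not the paper's notation:

```python
import numpy as np

def lame_radius(theta, a=1.0, b=1.0, n=2.0):
    """Radius of the Lamé curve |x/a|^n + |y/b|^n = 1 in polar form;
    n = 2 gives an ellipse, large n approaches a rectangle."""
    return (np.abs(np.cos(theta) / a) ** n
            + np.abs(np.sin(theta) / b) ** n) ** (-1.0 / n)

# Center-to-loudspeaker distances for 16 speakers on a near-rectangular contour
theta = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
print(lame_radius(theta, a=1.0, b=0.75, n=8.0))
```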
Speakers
TS

Tomohiro Sakaguchi

Doctoral student, Waseda University
Thursday May 22, 2025 9:50am - 10:10am CEST
C2 ATM Studio Warsaw, Poland

10:00am CEST

Hearing History: The Role of Acoustic Simulation in the Digital Reconstruction of the Wołpa Synagogue
Thursday May 22, 2025 10:00am - 11:30am CEST
This paper presents a case study on the auralization of the lost wooden synagogue in Wołpa, digitally reconstructed using a Heritage Building Information Modelling (HBIM) framework for virtual reality (VR) presentation. The study explores how acoustic simulation can aid in the preservation of intangible heritage, focusing on the synagogue’s unique acoustics. Using historical documentation, the synagogue was reconstructed with accurate geometric and material properties, and its acoustics were analyzed through high-fidelity ray-tracing simulations.
A key objective of this project is to recreate the Shema Israel ritual, incorporating a historical recording of the rabbi’s prayers. To enable interactive exploration, real-time auralization techniques were optimized to balance computational efficiency and perceptual authenticity, aiming to overcome the trade-offs between simplified VR audio models and physically accurate simulations. This research underscores the transformative potential of immersive technologies in reviving lost heritage, offering a scalable, multi-sensory approach to preserving sacred soundscapes and ritual experiences.
Thursday May 22, 2025 10:00am - 11:30am CEST
Hall F ATM Studio Warsaw, Poland

10:00am CEST

Real-Time Performer Switching in Chamber Music
Thursday May 22, 2025 10:00am - 11:30am CEST
The article explores the innovative concept of interactive music, where both creators and listeners can actively shape the structure and sound of a musical piece in real-time. Traditionally, music is passively consumed, but interactivity introduces a new dimension, allowing for creative participation and raising questions about authorship and the listener's role. The project "Sound Permutation: A Real-Time Interactive Musical Experiment" aims to create a unique audio-visual experience by enabling listeners to choose performers for a chamber music piece in semi-real-time. Two well-known compositions, Edward Elgar's "Salut d’Amour" and Camille Saint-Saëns' "Le Cygne," were recorded by three cellists and three pianists in all possible combinations. This setup allows listeners to seamlessly switch between performers' parts, offering a novel musical experience that highlights the impact of individual musicians on the perception of the piece.

The project focuses on chamber music, particularly the piano-cello duet, and utilizes advanced recording technology to ensure high-quality audio and video. The interactive system, developed using JavaScript, allows for smooth video streaming and performer switching. The user interface is designed to be intuitive, featuring options for selecting performers and camera views. The system's optimization ensures minimal disruption during transitions, providing a cohesive musical experience. This project represents a significant step towards making interactive music more accessible, showcasing the potential of technology in shaping new forms of artistic engagement and participation.
Speakers
avatar for Pawel Malecki

Pawel Malecki

Professor, AGH University of Krakow
Thursday May 22, 2025 10:00am - 11:30am CEST
Hall F ATM Studio Warsaw, Poland

10:00am CEST

The benefits, tradeoffs, and economics of standard and proprietary digital audio networks in DSP systems
Thursday May 22, 2025 10:00am - 11:30am CEST
In the field of digital audio signal processing (DSP) systems, the choice between standard and proprietary digital audio networks (DANs) can significantly impact both functionality and performance. This session explores the benefits, tradeoffs, and economic implications of these two approaches, providing a comprehensive comparison to aid decision-making for audio professionals and system designers, with emphasis on the key benefits of A2B, AoIP, and older proprietary networks currently in use.

Conclusion
The choice between standard and proprietary digital audio networks in audio DSP systems involves a careful consideration of benefits, tradeoffs, and economic implications. Standards-based systems provide interoperability and cost-effectiveness, while proprietary solutions offer optimized performance and innovative features. Understanding these factors can guide audio professionals and system designers in making informed decisions that align with their specific needs and long-term goals.
Speakers
avatar for Miguel Chavez

Miguel Chavez

Strategic Marketing ProAudio, Analog Devices
Bachelor's degree in Electrical and Mechanical Engineering from Universidad Panamericana in Mexico City. Master of Science in Music Engineering from the University of Miami. EMBA from Boston University. Worked at Analog Devices developing DSP software and algorithms (SigmaStudio) for 17 years...
Thursday May 22, 2025 10:00am - 11:30am CEST
Hall F ATM Studio Warsaw, Poland

10:00am CEST

The Sound Map of Białystok − From monophonic to immersive audio repository of urban soundscapes
Thursday May 22, 2025 10:00am - 11:30am CEST
This paper presents an ongoing project that aims to document the urban soundscapes of the Polish city of Białystok. It describes the progress made so far, including the selection of sonic landmarks, the process of acquiring the audio recordings, and the design of the unique graphic user interface featuring original drawings. Furthermore, it elaborates on the ongoing efforts to extend the project beyond the scope of a typical urban soundscape repository. In the present phase of the project, in addition to monophonic recordings, audio excerpts are acquired in binaural and Ambisonic sound formats, providing listeners with an immersive experience. Moreover, state-of-the-art machine-learning algorithms are applied to analyze gathered audio recordings in terms of their content and spatial characteristics, ultimately providing prospective users of the sound map with some form of automatic audio tagging functionality.
Thursday May 22, 2025 10:00am - 11:30am CEST
Hall F ATM Studio Warsaw, Poland

10:10am CEST

Recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching algorithm
Thursday May 22, 2025 10:10am - 10:30am CEST
This paper presents a recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching (BACC-PM) algorithm, designed to optimize sound zone systems efficiently in the time domain. Traditional frequency-domain algorithms, while computationally less demanding, often result in non-causal filters with increased pre-ringing, making time-domain approaches preferable for certain applications. However, time-domain solutions typically suffer from high computational costs resulting from the inversion of large convolution matrices.
To address these challenges, this study introduces a method based on gradient descent and conjugate gradient descent techniques. By exploiting recursive calculations, the proposed approach significantly reduces computational time compared to direct inversion.
Theoretical foundations, simulation setups, and performance metrics are detailed, showcasing the efficiency of the algorithm in achieving high acoustic contrast and low reproduction errors with reduced computational effort. Simulations in a controlled environment demonstrate the advantages of the method.
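A minimal sketch of the conjugate-gradient idea applied to a generic regularized least-squares filter design (textbook CG on the normal equations; the paper's recursive exploitation of the convolution-matrix structure is not reproduced here):

```python
import numpy as np

def cg_least_squares(A, b, n_iter=100, reg=1e-6, tol=1e-10):
    """Solve min_x ||A x - b||^2 + reg ||x||^2 iteratively via conjugate
    gradients on (A^T A + reg I) x = A^T b, avoiding direct inversion."""
    AtA = A.T @ A + reg * np.eye(A.shape[1])
    x = np.zeros(A.shape[1])
    r = A.T @ b - AtA @ x        # residual of the normal equations
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Ap = AtA @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```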
Speakers
avatar for Manuel Melon

Manuel Melon

Professor, LAUM / LE MANS Université
Thursday May 22, 2025 10:10am - 10:30am CEST
C2 ATM Studio Warsaw, Poland

10:15am CEST

Logarithmic frequency resolution filter design with applications to loudspeaker and room equalization
Thursday May 22, 2025 10:15am - 11:15am CEST
Digital filters are often used to model or equalize acoustic or electroacoustic transfer functions. Applications include headphone, loudspeaker, and room equalization, or modeling the radiation of musical instruments for sound synthesis. As the final judge of quality is the human ear, filter design should take into account the quasi-logarithmic frequency resolution of the auditory system. This tutorial presents various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters, and discusses their differences and similarities. Examples will include loudspeaker and room equalization applications, and the equalization of a spherical loudspeaker array. The effect of quantization noise arising in real-world applications will also be considered.
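As a flavor of the fixed-pole parallel approach mentioned above, one common starting point is to space the pole frequencies logarithmically and derive each pole radius from the local pole spacing; the heuristic below is illustrative, not the tutorial's exact recipe:

```python
import numpy as np

fs = 48_000
# 32 pole frequencies from 20 Hz to 20 kHz, log-spaced to mirror the
# quasi-logarithmic frequency resolution of hearing.
f_poles = np.geomspace(20.0, 20_000.0, 32)
gaps = np.diff(f_poles)
bw = np.concatenate([[gaps[0]], gaps])     # bandwidth ~ local pole spacing
radii = np.exp(-np.pi * bw / fs)           # pole radius from bandwidth
poles = radii * np.exp(2j * np.pi * f_poles / fs)
# Each conjugate pole pair fixes a second-order denominator
# 1 - 2*Re(p)*z^-1 + |p|^2*z^-2; the numerators would then be fit to the
# target response by ordinary linear least squares.
den = np.stack([np.ones(len(poles)), -2 * poles.real, np.abs(poles) ** 2], axis=1)
print(den[:3])
```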
Speakers
Thursday May 22, 2025 10:15am - 11:15am CEST
C3 ATM Studio Warsaw, Poland

10:30am CEST

GSound-SIR: A Spatial Impulse Response Ray-Tracing and High-order Ambisonic Auralization Python Toolkit
Thursday May 22, 2025 10:30am - 10:50am CEST
Accurate and efficient simulation of room impulse responses is crucial for spatial audio applications. However, existing acoustic ray-tracing tools often operate as black boxes and only output impulse responses (IRs), providing limited access to intermediate data or spatial fidelity. This paper presents GSound-SIR, a novel Python-based toolkit for room acoustics simulation that addresses these limitations. The contributions of this paper are as follows. First, GSound-SIR provides direct access to up to millions of raw ray data points from simulations, enabling in-depth analysis of sound propagation paths that was not possible with previous solutions. Second, we introduce a tool that converts acoustic rays into high-order Ambisonic impulse responses, capturing spatial audio cues with greater fidelity than standard techniques. Third, to enhance efficiency, the toolkit implements an energy-based filtering algorithm and can export only the top-X or top-X-% rays. Fourth, we propose storing the simulation results in Parquet format, facilitating fast data I/O and seamless integration with data analysis workflows. Together, these features make GSound-SIR an advanced, efficient, and modern foundation for room acoustics research, providing researchers and developers with a powerful new tool for spatial audio exploration.
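A guess at what the energy-based top-X-% filter and the Parquet export could look like in use (illustrative only; GSound-SIR's actual API may differ):

```python
import numpy as np
import pandas as pd

def top_energy_rays(energies, keep_percent=10.0):
    """Indices of the highest-energy rays, keeping keep_percent of them."""
    n_keep = max(1, int(len(energies) * keep_percent / 100.0))
    return np.argsort(energies)[::-1][:n_keep]

energies = np.random.default_rng(0).exponential(size=100_000)
idx = top_energy_rays(energies, keep_percent=5.0)
# Columnar storage of the surviving rays for fast I/O and analysis workflows
pd.DataFrame({"ray_id": idx, "energy": energies[idx]}).to_parquet("rays.parquet")
```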
Thursday May 22, 2025 10:30am - 10:50am CEST
C2 ATM Studio Warsaw, Poland

10:45am CEST

Sound Aesthetics for Impressive 3D Audio Productions
Thursday May 22, 2025 10:45am - 11:45am CEST
In today’s era, 3D audio enables us to craft sounds akin to how composers have created sonic landscapes with orchestras for centuries. Thanks to advanced loudspeaker setups like 7.1.4 and 9.1.6, we achieve significantly higher spatial precision than conventional stereo. Sounds become sharper, more tangible, and thus more plausible – like the transition from HD to 8K in the visual realm, yielding an image virtually indistinguishable from looking out of a window.

In the first part of his contribution, Lasse Nipkow introduces a specialized microphone technique that captures instruments in space as if the musicians were right in front of us. This forms the basis for capturing the unique timbres of the instruments while ensuring that the sounds remain as pure as possible for the mix.

In the second part of his contribution, Nipkow elucidates the parallels between classical orchestras and modern pop or singer-songwriter productions. He demonstrates how composers of yesteryear shaped their sounds for concert performances – like our studio practices today with double tracking. Using sound examples, he illustrates how sounds can establish an auditory connection between loudspeakers, thus creating a sound body distinct from individual instruments that stand out solitarily.
Speakers
avatar for Lasse Nipkow

Lasse Nipkow

CEO, Silent Work LLC
Since 2010, Lasse Nipkow has been a renowned keynote speaker in the field of 3D audio music production. His expertise spans from seminars to conferences, both online and offline, and has gained significant popularity. As one of the leading experts in Europe, he provides comprehensive...
Thursday May 22, 2025 10:45am - 11:45am CEST
C4 ATM Studio Warsaw, Poland

11:00am CEST

Ambisonic Spatial Decomposition Method with salient / diffuse separation
Thursday May 22, 2025 11:00am - 11:20am CEST
This paper proposes a new algorithm for enhancing the spatial resolution of measured first-order Ambisonics room impulse responses (FOA RIRs). It applies a separation of the RIR into a salient stream (direct sound and reflections) and a diffuse stream to treat them differently: The salient stream is enhanced using the Ambisonic Spatial Decomposition Method (ASDM) with a single direction of arrival (DOA) per sample of the RIR, while the diffuse stream is enhanced by 4-directional (4D-)ASDM with 4 DOAs at the same time. Listening experiments comparing the new Salient/Diffuse S/D-ASDM to ASDM, 4D-ASDM, and the original FOA RIR reveal the best results for the new algorithm in both spatial clarity and absence of artifacts, especially for its variant, which keeps the DOA constant within each salient event block.
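The per-sample DOA estimate that SDM-style methods such as ASDM build on is commonly derived from the FOA pseudo-intensity vector; a minimal sketch (channel ordering W, X, Y, Z assumed; practical implementations band-limit and temporally smooth the estimate):

```python
import numpy as np

def pseudo_intensity_doa(foa):
    """Per-sample azimuth/elevation from a first-order Ambisonics RIR,
    foa shaped (4, n_samples) with channels W, X, Y, Z."""
    w, x, y, z = foa
    ix, iy, iz = w * x, w * y, w * z          # instantaneous pseudo-intensity
    azimuth = np.arctan2(iy, ix)
    elevation = np.arctan2(iz, np.hypot(ix, iy))
    return azimuth, elevation
```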
Speakers
LG

Lukas Gölles

University of Music and Performing Arts Graz - Institute of Electronic Music and Acoustics
Thursday May 22, 2025 11:00am - 11:20am CEST
C2 ATM Studio Warsaw, Poland

11:20am CEST

Towards a standard listener-independent HRTF to facilitate long-term adaptation
Thursday May 22, 2025 11:20am - 11:50am CEST
Head-related transfer functions (HRTFs) are used in auditory applications for spatializing virtual sound sources. Listener-specific HRTFs, which aim at mimicking the filtering of the head, torso and pinnae of a specific listener, improve the perceived quality of virtual sound compared to non-individualized HRTFs. However, listener-specific HRTFs may not be accessible for everyone. Here, we propose instead to take advantage of human listeners’ ability to adapt to a new set of HRTFs. We claim that agreeing upon a single listener-independent set of HRTFs has beneficial effects for long-term adaptation compared to using several, potentially severely different HRTFs. Thus, the Non-individual Ear MOdel (NEMO) initiative is a first step towards a standardized listener-independent set of HRTFs to be used across applications as an alternative to individualization. A prototype, NEMObeta, is presented to explicitly encourage external feedback from the spatial audio community, and to agree on a complete list of requirements for the future HRTF selection.
Speakers
avatar for Katharina Pollack

Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical and audio engineering in Graz, both at the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research...
avatar for Nils Meyer-Kahlen

Nils Meyer-Kahlen

Aalto University
Thursday May 22, 2025 11:20am - 11:50am CEST
C2 ATM Studio Warsaw, Poland

11:25am CEST

Don't run! It's just a synthesizer
Thursday May 22, 2025 11:25am - 12:15pm CEST
Everybody knows music with electronic elements, and most of us are aware of the synthesis behind it. But the moment I start asking what’s under the hood, the majority of the audience starts running for their lives. That is rather sad, because learning synthesis could be among the greatest journeys of your life. And I want to back those words up in my workshop.

Let’s talk and see what exactly synthesis is, and what it is not. Let’s talk about the building blocks of a basic subtractive setup. We will track all the knobs, buttons and sliders, down to every single cable under the front panel, simply to see which “valve” and “motor” is controlled by which knob – and how it sounds.

I also want to make you feel safe around modular setups, because when you understand the basic blocks, you understand modular synthesis. Just like building from bricks!
Thursday May 22, 2025 11:25am - 12:15pm CEST
C1 ATM Studio Warsaw, Poland

11:45am CEST

How Does It Sound Now? The Evolution of Audio
Thursday May 22, 2025 11:45am - 12:45pm CEST
One day Chet Atkins was playing guitar when a woman approached him. She said, “That guitar sounds beautiful.” Chet immediately quit playing. Staring her in the eyes, he asked, “How does it sound now?”
The quality of the sound in Chet’s case clearly rested with the player, not the instrument, and the quality of our product ultimately lies with us as engineers and producers, not with the gear we use. The dual significance of this question, “How does it sound now”, informs our discussion, since it addresses both the engineer as the driver and the changes we have seen and heard as our business and methodology have evolved through the decades.
Let’s start by exploring the methodology employed by the most successful among us when confronted with new and evolving technology. How do we retain quality and continue to create a product that conforms to our own high standards? This may lead to other conversations about the musicians we work with, the consumers we serve, and the differences and similarities between their standards and our own. How high should your standards be? How should it sound now? How should it sound tomorrow?
Speakers
Thursday May 22, 2025 11:45am - 12:45pm CEST
C3 ATM Studio Warsaw, Poland

11:45am CEST

Best practices for wireless audio in live production
Thursday May 22, 2025 11:45am - 12:45pm CEST
Wireless audio, both mics and in-ear-monitors, has become essential in many live productions of music and theatre, but it is often fraught with uneasiness and uncertainty. The panel of presenters will draw on their varied experience and knowledge to show how practitioners can use best engineering practices to ensure reliability and performance of their wireless mic and in-ear-monitor systems.
Speakers
avatar for Bob Lee

Bob Lee

Applications Engineer / Trainer, RF Venue, Inc.
I'm a fellow of the AES, an RF and electronics geek, and live audio specialist, especially in both amateur and professional theater. My résumé includes Sennheiser, ARRL, and a 27-year-long tenure at QSC. Now I help live audio practitioners up their wireless mic and IEM game. I play...
Thursday May 22, 2025 11:45am - 12:45pm CEST
Hall F ATM Studio Warsaw, Poland

11:50am CEST

Real-Time Auralization Pipeline for First-Person Vocal Interaction in Audio-Visual Virtual Environments
Thursday May 22, 2025 11:50am - 12:10pm CEST
Multimodal research and applications are becoming more commonplace as Virtual Reality (VR) technology integrates different sensory feedback, enabling the recreation of real spaces in an audio-visual context. Within VR experiences, numerous applications rely on the user’s voice as a key element of interaction, including music performances and public speaking applications. Self-perception of our voice plays a crucial role in vocal production. When singing or speaking, our voice interacts with the acoustic properties of the environment, shaping the adjustment of vocal parameters in response to the perceived characteristics of the space.

This technical report presents a real-time auralization pipeline that leverages three-dimensional Spatial Impulse Responses (SIRs) for multimodal research applications in VR requiring first-person vocal interaction. It describes the impulse response creation and rendering workflow and the audio-visual integration, and addresses latency and computational considerations. The system enables users to explore acoustic spaces from various positions and orientations within a predefined area, supporting three and five Degrees of Freedom (3DoF and 5DoF) in audio-visual multimodal perception for both research and creative applications in VR.

The design of this pipeline arises from the limitations of existing audio tools and spatializers, particularly regarding signal latency, and from the lack of SIRs captured from a first-person perspective at multiple adjacent positions to enable translational rendering. By addressing these gaps, the system enables real-time auralization of self-generated vocal feedback.
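One building block such a pipeline needs is picking the measured SIR closest to the tracked listener position; a toy sketch with a hypothetical 2 x 2 m measurement grid (a real system would also crossfade between responses to avoid clicks):

```python
import numpy as np

# Hypothetical (x, y) positions, in metres, where SIRs were captured.
sir_positions = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

def nearest_sir(listener_xy):
    """Index of the SIR measured closest to the listener."""
    d = np.linalg.norm(sir_positions - np.asarray(listener_xy), axis=1)
    return int(np.argmin(d))

print(nearest_sir([0.8, 0.3]))  # -> 2, i.e. the response at (1, 0)
```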
Speakers
avatar for Enda Bates

Enda Bates

Assistant Prof., Trinity College Dublin
I'm interested in spatial audio, spatial music, and psychoacoustics. I'm the deputy director of the Music & Media Technologies M.Phil. programme in Trinity College Dublin, and a researcher with the ADAPT centre. At this convention I'm presenting a paper on an Ambisonic Decoder Test...
Thursday May 22, 2025 11:50am - 12:10pm CEST
C2 ATM Studio Warsaw, Poland

12:00pm CEST

Advanced Spatial Recording Techniques for Chamber Orthodox-Choir Music in Monumental Acoustics
Thursday May 22, 2025 12:00pm - 12:25pm CEST
This tutorial presents a comprehensive exploration of spatial audio recording methodologies applied to the unique challenges of documenting Eastern Orthodox liturgical music in monumental acoustic environments. Centered on a recent project at the Church of the Assumption of the Blessed Virgin Mary and St. Joseph in Warsaw, Poland, the session dissects the technical and artistic decisions behind capturing the Męski Zespół Muzyki Cerkiewnej (Male Ensemble of Orthodox Music) “Katapetasma.” The repertoire—spanning 16th-century monodic irmologions, Baroque-era folk chant collections, and contemporary compositions—demanded innovative approaches to balance clarity and spatial immersion against the venue’s 5-second reverberation time.
Attendees will gain insight into hybrid microphone techniques tailored for immersive formats (Dolby Atmos, Ambisonics) and stereo reproduction. The discussion focuses on the strategic deployment of a Decca Tree core augmented by an AMBEO array, height channels, a Faulkner Pair for mid-depth detail, ambient side arrays, and spaced AB ambient pairs to capture the room’s decay. Particular emphasis is placed on reconciling close-miking strategies (essential for textual clarity in melismatic chants) with distant arrays that preserve the sacred space’s acoustic identity. The tutorial demonstrates how microphone placement—addressing both the choir’s position and the building’s 19th-century vaulted architecture—became critical in managing comb filtering and low-frequency buildup.
Practical workflow considerations include:
• Real-time monitoring of spatial imaging through multiple microphone and loudspeaker configurations
• Phase coherence management between spot microphones and room arrays
• Post-production techniques for maintaining vocal intimacy within vast reverberant fields
Case studies compare results from the Decca/AMBEO hybrid approach against traditional spaced omni configurations, highlighting tradeoffs between localization precision and spatial envelopment. The session also addresses the psychoacoustic challenges of recording small choral ensembles in reverberant spaces, where transient articulation must coexist with diffuse sustain.
Speakers
avatar for Pawel Malecki

Pawel Malecki

Professor, AGH University of Krakow
Thursday May 22, 2025 12:00pm - 12:25pm CEST
C4 ATM Studio Warsaw, Poland

12:10pm CEST

On the Design of Binaural Rendering Library for IAMF Immersive Audio Container
Thursday May 22, 2025 12:10pm - 12:30pm CEST
Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm and the binaural filter design process, as well as the real-time implementation of the renderer in the form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of a spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications.
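Generic SH-domain binauralization, of the kind such a renderer builds on, reduces to convolving each Ambisonic channel with a left/right filter pair and summing; a minimal sketch (not the actual OBR code, which additionally applies its virtual listening room model):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(ambi, sh_filters):
    """ambi: (n_sh, n_samples) Ambisonic signals;
    sh_filters: (n_sh, 2, n_taps) SH-domain binaural filters.
    Returns (2, n_samples + n_taps - 1) binaural output."""
    out = np.zeros((2, ambi.shape[1] + sh_filters.shape[2] - 1))
    for c in range(ambi.shape[0]):
        for ear in range(2):
            out[ear] += fftconvolve(ambi[c], sh_filters[c, ear])
    return out
```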
Speakers
avatar for Jan Skoglund

Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...
Thursday May 22, 2025 12:10pm - 12:30pm CEST
C2 ATM Studio Warsaw, Poland

2:30pm CEST

The Ins and Outs of Microphones
Thursday May 22, 2025 2:30pm - 3:30pm CEST
Microphones are the very first link in the recording chain, so it’s important to understand them to use them effectively. This presentation will explain the differences between microphone types; polar patterns and directivity; the proximity effect; relative recording distances; and a little about room acoustics. Many of these “golden nuggets” helped me greatly when I first understood them, and I hope they will help you too.

We will look at the different microphone types – dynamic moving-coil, ribbon and capacitor microphones, as well as boundary and line-array microphones. We will look at polar patterns and how they are derived, at relative recording distances, and a little at understanding room acoustics – all to help you choose the best microphone for what you want to do, and how best to use it.
Speakers
Thursday May 22, 2025 2:30pm - 3:30pm CEST
C3 ATM Studio Warsaw, Poland

2:30pm CEST

Tutorial: Capturing Your Prosumers
Thursday May 22, 2025 2:30pm - 3:30pm CEST
This session breaks down how top brands like Samsung, Apple, and Slack engage professional and semi-professional buyers. Attendees will gain concrete strategies and psychological insights they can use to boost customer retention and revenue.

Format: 1-Hour Session
Key Takeaways:
- Understand the psychology behind purchasing decisions of prosumers, drawing on our access to insights from over 300 million global buyers
- Explore proven strategies to increase engagement and revenue
- Gain actionable frameworks for immediate implementation
Speakers
Thursday May 22, 2025 2:30pm - 3:30pm CEST
C1/2 ATM Studio Warsaw, Poland

2:30pm CEST

An in-situ perceptual evaluation of spatial audio in an automotive environment
Thursday May 22, 2025 2:30pm - 5:00pm CEST
Speakers
avatar for Bogdan Bacila

Bogdan Bacila

Postdoc, Institute of Sound and Vibration Research - University of Southampton
avatar for Filippo Fazi

Filippo Fazi

University of Southampton
Thursday May 22, 2025 2:30pm - 5:00pm CEST
Hall F ATM Studio Warsaw, Poland

2:30pm CEST

Comparing Artificially Created Acoustic Environments to Real Space Responses: Integrating Objective Metrics and Subjective Perceptual Listening Tests
Thursday May 22, 2025 2:30pm - 5:00pm CEST
This study evaluates the effectiveness of artificial reverberation algorithms used to create simulated acoustic environments by comparing them to the acoustic responses of real spaces. A mixed-methods approach, integrating objective and subjective measures, was employed to assess both the accuracy and the perceptual quality of the simulated acoustics. Real-world spaces, within a research project…, were selected for their varying sizes, functions, and acoustical properties. Objective acoustic measurements—the Room Impulse Response (RIR) and features extracted from it, i.e., Reverberation Time (RT60), Early Decay Time (EDT), Clarity index (C50, C80), and Definition (D50)—were conducted to establish baseline profiles. Simulated environments were created to replicate real-world conditions, incorporating source-receiver configurations, room geometries, and/or material properties. Objective metrics were extracted from these simulations for comparison with real-world data. After applying the artificial reverberation algorithm, the same objective measurements were re-recorded to assess its impact. Subjective listening tests were also conducted, with a diverse panel of listeners rating the perceived clarity, intelligibility, comfort, and overall sound quality of both real and simulated spaces, using a double-blind procedure to mitigate bias. Statistical analyses, including paired t-tests and correlation analysis, were performed to assess the relationship between objective and subjective evaluations. This approach provides a comprehensive framework for evaluating an algorithm’s ability to enhance simulated acoustics and align them with real-world environments.
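For reference, the 50 ms early/late metrics named above follow directly from a measured RIR (standard ISO 3382-style definitions; the onset detection here is deliberately crude):

```python
import numpy as np

def clarity_definition(rir, fs):
    """C50 in dB and D50 as a fraction, from a room impulse response."""
    onset = int(np.argmax(np.abs(rir)))   # crude direct-sound detection
    h2 = rir[onset:] ** 2                 # squared IR from the direct sound
    k = int(0.05 * fs)                    # 50 ms early/late boundary
    early, late = h2[:k].sum(), h2[k:].sum()
    return 10.0 * np.log10(early / late), early / h2.sum()
```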
Speakers
RK

Rigas Kotsakis

Aristotle University of Thessaloniki
avatar for Nikolaos Vryzas

Nikolaos Vryzas

Aristotle University Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master’s degrees in Information and Communication Audio Video Technologies for Education & Production...
Thursday May 22, 2025 2:30pm - 5:00pm CEST
Hall F ATM Studio Warsaw, Poland

2:30pm CEST

Dynamic Diffuse Signal Processing (DiSP) as a Method of Decorrelating Early Reflections In Automobile Audio Systems
Thursday May 22, 2025 2:30pm - 5:00pm CEST
Automotive audio systems operate in highly reflective and acoustically challenging environments that differ significantly from optimized listening spaces such as concert halls or home theaters. The compact and enclosed nature of car cabins, combined with the presence of reflective surfaces (including the dashboard, windshield, and windows), creates strong early reflections that interfere with the direct sound from the loudspeakers. These reflections result in coherent interference, comb filtering, and position-dependent variations in frequency response, leading to inconsistent tonal balance, reduced speech intelligibility, and compromised stereo imaging and spatial localization. Traditional approaches, such as equalization and time alignment, attempt to compensate for these acoustic artifacts but do not effectively address the coherence issues arising from coherent early reflections.
To mitigate these challenges, this study explores Dynamic Diffuse Signal Processing (DiSP) as an alternative solution for reducing early-reflection coherence in automotive environments. DiSP is a convolution-based signal processing technique that, when implemented effectively, decorrelates coherent signals while leaving them perceptually identical. While this method has been successfully studied in sound reinforcement and multi-loudspeaker environments, its application to automotive audio has not been extensively explored.
This research investigates the effectiveness of DiSP by analyzing pre- and post-DiSP impulse responses and frequency response variations at multiple listening positions, assessing its effectiveness in mitigating phase interference and reducing comb filtering. Experimental results indicate that DiSP significantly improves the uniformity of sound distribution, reducing spectral deviations across seating positions and minimizing unwanted artifacts caused by early reflections. These findings suggest that DiSP can serve as a powerful tool for optimizing in-car audio reproduction, offering a scalable and computationally efficient approach to improving the listener experience in modern automotive sound systems.
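A static, simplified cousin of such diffuse filtering is a random-phase, flat-magnitude FIR: convolving a signal with it changes the waveform (decorrelating copies of the signal) while leaving the magnitude spectrum untouched. Dynamic DiSP time-varies such filters; the sketch below is illustrative only:

```python
import numpy as np

def diffuse_fir(n_taps=1024, seed=0):
    """One random-phase FIR with unit magnitude response."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, n_taps // 2 + 1)
    phase[0] = phase[-1] = 0.0        # DC and Nyquist bins must stay real
    return np.fft.irfft(np.exp(1j * phase), n_taps)

# Feeding neighbouring loudspeakers through differently seeded filters
# decorrelates their outputs while preserving each signal's spectrum.
h1, h2 = diffuse_fir(seed=1), diffuse_fir(seed=2)
```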
Speakers
TS

Tommy Spurgeon

Physics Student & Undergraduate Researcher, University of South Carolina
Thursday May 22, 2025 2:30pm - 5:00pm CEST
Hall F ATM Studio Warsaw, Poland

2:30pm CEST

Perceptual Evaluation in varying Levels of Acoustic Detail in Multimodal Virtual Reality
Thursday May 22, 2025 2:30pm - 5:00pm CEST
Speakers
HZ

Haowen Zhao

University of York
I am now working as an audio engineer with my research into 6 Degrees-of-Freedom (6DoF) audio for Virtual Reality (VR); this includes hybrid acoustic modelling methods for real-time calculation. I am currently looking at perceptual differences in different acoustic rendering methods...
DM

Damian Murphy

University of York
Thursday May 22, 2025 2:30pm - 5:00pm CEST
Hall F ATM Studio Warsaw, Poland

2:45pm CEST

Students Welcome
Thursday May 22, 2025 2:45pm - 3:30pm CEST
Hall F ATM Studio Warsaw, Poland

3:45pm CEST

Caudio Sponsored Session
Thursday May 22, 2025 3:45pm - 4:45pm CEST
Hall F ATM Studio Warsaw, Poland

4:00pm CEST

Objects and Layers: Creating a Sense of Depth in Atmos recordings
Thursday May 22, 2025 4:00pm - 4:25pm CEST
This presentation focuses on side and rear channels in Dolby Atmos recordings. At present, there is no standardised placement for side or rear speakers, which can result in poor localisation in a major portion of the listening area. Sometimes side speakers are 90° off the centre axis, sometimes up to 110° off axis. Similarly, rear speakers can be anywhere from 120° to 135° off axis; in cinemas they can be located directly behind the listener(s). However, an Atmos speaker bed assumes a fixed placement of these side and rear speakers, resulting in inconsistent imaging. Additionally, placing side and rear speakers further off axis results in a larger gap between them and the front speakers.

These inconsistencies can be minimised by placing these objects at specific virtual locations while avoiding the fixed speaker bed. This ensures a listening experience that better represents what the mix engineer intended. Additionally, reverb feeds can be sent as objects to create an illusion of further depth. Finally, these additional objects can be fine-tuned for binaural rendering using Near/Mid/Far controls.
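As a toy illustration of pinning an object to a fixed horizontal angle independent of the speaker layout, one can map azimuth to a Cartesian pan position on a unit square; coordinate conventions differ between renderers, so treat this purely as a sketch:

```python
import numpy as np

def azimuth_to_xy(azimuth_deg):
    """Map a horizontal angle (0 deg = front, positive to the right) to an
    (x, y) pan position on the boundary of a unit square 'room'."""
    az = np.radians(azimuth_deg)
    x, y = np.sin(az), np.cos(az)     # direction on the unit circle
    s = max(abs(x), abs(y))           # project out to the square boundary
    return x / s, y / s

print(azimuth_to_xy(110.0))  # a 'side' object just past +90 degrees
```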

Mr. Bowles will demonstrate these techniques in an immersive playback session.
Speakers
avatar for David Bowles

David Bowles

Owner, Swineshead Productions, LLC
David v.R Bowles formed Swineshead Productions, LLC as a classical recording production company in 1995. His recordings have been GRAMMY- and JUNO-nominated and critically acclaimed worldwide. His releases in 3D Dolby Atmos can be found on Avie, OutHere Music (Delos) and Navona labels. Mr...
Thursday May 22, 2025 4:00pm - 4:25pm CEST
C4 ATM Studio Warsaw, Poland

4:00pm CEST

Student Recording Competition 1
Thursday May 22, 2025 4:00pm - 5:00pm CEST
C3 ATM Studio Warsaw, Poland

4:00pm CEST

Key Technology Briefings
Thursday May 22, 2025 4:00pm - 6:00pm CEST
C2 ATM Studio Warsaw, Poland

4:30pm CEST

The Records of Gaitan: Restoring the long-silenced voice of an important political figure in Colombian history
Thursday May 22, 2025 4:30pm - 5:30pm CEST
C4 ATM Studio Warsaw, Poland

4:30pm CEST

Automated soundstage tuning in cars
Thursday May 22, 2025 4:30pm - 7:00pm CEST
Hall F ATM Studio Warsaw, Poland

5:00pm CEST

Polish Audio Manufacturers TBC
Thursday May 22, 2025 5:00pm - 6:00pm CEST
Hall F ATM Studio Warsaw, Poland

5:00pm CEST

Getting the most out of your immersive production
Thursday May 22, 2025 5:00pm - 6:00pm CEST
The field of audio production is always evolving. Now, with immersive audio formats becoming more and more prominent, we should take a closer look at the possibilities they offer from a technical but, most importantly, from an artistic and musical standpoint.
In our workshop, "Unlocking New Dimensions: Producing Music in Immersive Audio," we demonstrate how immersive audio formats can bring an artist's vision to life and how the storytelling in the music benefits from them.
In order to truly change the way people listen to music and provide an immersive experience, we must transform how we write and produce music, using immersive formats not just as a technical advancement but as a medium to create new art.
In this session, we will explore the entire production process, from recording to the final mix and master, with a focus on how one can create a dynamic and engaging listening experience with immersive formats like Dolby Atmos. We believe that immersive audio is more than just a technical upgrade—it's a new creative canvas. Our goal is to show how, by fully leveraging a format like Dolby Atmos, artists and producers can create soundscapes that envelop the listener and add new dimensions to the storytelling of music.

Philosophy

Artists often feel disconnected from the immersive production process. They rarely can give input on how their music is mixed in this format, leading to results that may not fully align with their artistic vision. At High Tide, we prioritize artist involvement, ensuring they are an integral part of the process. We believe that their input is crucial for creating an immersive experience that truly represents their vision. We will share insights and examples from our collaborations with artists like Amistat, an acoustic folk duo, and Tinush, an electronic music producer known for his attention to detail. These case studies will illustrate how our method fosters creativity and produces superior immersive audio experiences.

New workflows need new tools

A significant pain point in current immersive productions is the tendency to use only a few stems, which often limits the immersive potential. This often happens because the process of exporting individual tracks and preparing a mixing session can be time-consuming and labor-intensive. We will address these challenges in our presentation. We have developed innovative scripts and workflows that streamline this process, allowing us to work with all available tracks without the typical hassle. This approach not only enhances the quality of the final mix but also retains the intricate details and nuances of the original recordings.
Our workshop is designed to be interactive, with opportunities for attendees to ask questions throughout. We will provide real-world insights into our ProTools sessions, giving participants a detailed look at our Dolby Atmos mixing process. By walking through the entire workflow, from recording with Dolby Atmos in mind to the final mix, attendees will gain a comprehensive understanding of the steps involved and the benefits of this approach to create an engaging and immersive listening experience.
Speakers
avatar for Lennart Damann

Lennart Damann

Founder / Engineer, High Tide - Immersive Audio
avatar for Benedikt Ernst

Benedikt Ernst

High Tide - Immersive Audio
Thursday May 22, 2025 5:00pm - 6:00pm CEST
C4 ATM Studio Warsaw, Poland

5:15pm CEST

High Pass everything! or not?
Thursday May 22, 2025 5:15pm - 6:00pm CEST
High-pass filters (HPFs) in music production: do's and don'ts.
This presentation aims to bring a thorough insight into the use of high-pass filters in music production. Which type, slope, and frequency settings could be more desirable for a given source or application?
Are the HPFs in microphones and preamps the same? Do they serve the same purpose? Is there any rule on when to use one, the other, or both? Furthermore, HPFs are used extensively in the mixing and processing of audio signals: they are commonly applied to the sidechain signal of dynamics processors (e.g., bus compressors) and, of course, in all multiband processing. What are the benefits of this practice?
We will also cover live sound reinforcement and its different approaches to the use of HPFs.
Different genres call for different production techniques; understanding the basics of this simple albeit important signal filtering process helps in its conscious implementation.
Speakers
avatar for Cesar Lamschtein
Thursday May 22, 2025 5:15pm - 6:00pm CEST
C3 ATM Studio Warsaw, Poland

6:00pm CEST

Heyser Lecture
Thursday May 22, 2025 6:00pm - 7:00pm CEST
Embarking on my professional journey as a young DSP engineer at Fraunhofer IIS in Erlangen, Germany, in 1989, I quickly encountered a profound insight that would shape my entire career in audio: audio is not merely data like any other set of numbers; its significance lies in how it sounds to us as human listeners. The sonic quality of audio signals cannot be captured by simple metrics like ‘signal-to-noise ratio.’ Instead, the true goal of any skilled audio engineer should be to enhance quality in ways that are genuinely perceptible through listening, rather than relying solely on mathematical diagnostics.



This foundational concept has been a catalyst for innovation throughout my career, from pioneering popular perceptual audio codecs like MP3 and AAC to exploring audio for VR/AR and AI-driven audio coding.



Join me in this lecture as I share my personal 36-year research journey, which led me to believe that in the world of media, it’s all about perception!
Thursday May 22, 2025 6:00pm - 7:00pm CEST
Hall F ATM Studio Warsaw, Poland
 
Friday, May 23
 

9:00am CEST

How to create and use audio for accessible video games?
Friday May 23, 2025 9:00am - 10:00am CEST
Sound is one of the most powerful tools for accessibility in video games, enabling players with visual impairments or cognitive disabilities to navigate, interact, and fully engage with the game world. This panel will explore how sound engineers can leverage audio design to enhance accessibility, making games more inclusive without compromising artistic intent. Experts from different areas of game development will discuss practical approaches, tools, and case studies that showcase how audio can bridge gaps in accessibility.

Discussion Topics:

• Why is sound crucial for accessibility in video games? Audio cues, spatial sound, and adaptive music can replace or complement visual elements, guiding players with disabilities through complex environments and interactions.
• Designing effective spatial audio for navigation and interaction. Using 3D audio and binaural rendering to provide players with intuitive sound-based navigation, enhancing orientation and gameplay flow for blind or visually impaired users.
• Audio feedback and sonification as key accessibility tools. Implementing detailed auditory feedback for in-game actions, menu navigation, and contextual cues to improve usability and player experience.
• Case studies of games with exemplary accessible audio design. Examining how games like The Last of Us Part II, BROK: The InvestiGator, and other titles have successfully integrated sound-based accessibility features.
• Tools and middleware solutions for accessible sound design (example: InclusivityForge). Showcasing how game engines and plugins such as InclusivityForge can streamline the implementation of accessibility-focused audio solutions.
• Challenges in designing accessible game audio and overcoming them. Addressing common technical and creative challenges when designing inclusive audio experiences, including balancing accessibility with immersive design.
• Future trends in accessibility-driven audio design. Exploring how AI, procedural sound, and new hardware technologies can push the boundaries of accessibility in interactive audio environments.

Panel Guests:

• Dr Joanna Pigulak - accessibility expert in games, researcher specializing in game audio accessibility, assistant professor at the Institute of Film, Media, and Audiovisual Arts at UAM.
• Tomasz Tworek - accessibility consultant, blind gamer, and audio design collaborator specializing in improving audio cues and sonification in video games.
• Dr Tomasz Żernicki - sound engineer, creator of accessibility-focused audio technologies for games, and founder of InclusivityForge.

Target Audience:

• Sound engineers and game audio designers looking to implement accessibility features in their projects.
• Game developers interested in leveraging audio as a tool for accessibility.
• UX designers and researchers focusing on sound-based interaction in gaming.
• Middleware and tool developers aiming to create better solutions for accessible audio design.
• Industry professionals seeking to align with accessibility regulations and best practices.

This panel discussion will explore how sound engineers can enhance game accessibility through innovative audio solutions, providing insights into the latest tools, design techniques, and industry best practices.
Speakers
avatar for Tomasz Żernicki

Tomasz Żernicki

co-founder, my3DAudio
Tomasz Zernicki is co-founder and former CEO of Zylia (www.zylia.co), an innovative company that provides tools for 3D audio recording and music production. Additionally, he is a founder of my3DAudio Ventures, whose goal is to scale audio companies that reach the MVP phase and want...
Friday May 23, 2025 9:00am - 10:00am CEST
Hall F ATM Studio Warsaw, Poland

9:00am CEST

Binaural Audio Reproduction Using Loudspeaker Array Beamforming
Friday May 23, 2025 9:00am - 10:15am CEST
Binaural audio is fundamental to delivering immersive spatial sound, but traditional playback has been limited to headphones. Crosstalk Cancellation (CTC) technology overcomes this limitation by enabling accurate binaural reproduction over loudspeakers, allowing for a more natural listening experience. Using a compact loudspeaker array positioned in front of the listener, CTC systems apply beamforming techniques to direct sound precisely to each ear. Combined with listener tracking, this ensures consistent and accurate binaural playback, even as the listener moves. This workshop will provide an in-depth look at the principles behind CTC technology, the role of loudspeaker array beamforming, and a live demonstration of a listener-tracked CTC soundbar.
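At its core, CTC inverts the 2x2 acoustic plant between the loudspeaker beams and the listener's ears, per frequency bin, with regularization to tame ill-conditioned bins; a minimal single-bin sketch (toy numbers, not Audioscenic's implementation):

```python
import numpy as np

def ctc_filters(H, beta=1e-3):
    """Regularized inverse of the 2x2 plant H (ears x loudspeakers) at one
    frequency bin, so that H @ C is approximately the identity."""
    H = np.asarray(H, dtype=complex)
    return np.linalg.inv(H.conj().T @ H + beta * np.eye(2)) @ H.conj().T

H = np.array([[1.0, 0.4],    # ipsilateral / contralateral path gains
              [0.4, 1.0]])
print(np.abs(H @ ctc_filters(H)))  # close to the identity matrix
```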
Speakers
avatar for Jacob Hollebon

Jacob Hollebon

Principal Research Engineer, Audioscenic
I am a researcher specialising in 3D spatial audio reproduction and beamforming using loudspeaker arrays. In my current role at Audioscenic I am helping commercialize innovative listener-adaptive loudspeaker arrays for 3D audio and multizone reproduction. Previously I developed a new...
avatar for Marcos Simón

Marcos Simón

CTO, Audioscenic
Friday May 23, 2025 9:00am - 10:15am CEST
C3 ATM Studio Warsaw, Poland

9:00am CEST

Theoretical, Aesthetic, and Musical Review of Microphone Techniques for Immersive Sound Recording
Friday May 23, 2025 9:00am - 10:30am CEST
Immersive audio has become a significant trend in music recording, reproduction, and the audio and entertainment industries. This workshop will explore microphone techniques for immersive sound recording from theoretical, aesthetic, and musical perspectives.

Capturing a music performance and its acoustic features in a specific reverberant field, such as a concert hall, requires specialized microphone techniques for immersive sound. Various microphone techniques have already been proposed for immersive music recording. Achieving a natural timbre, an appropriate musical balance, a wide frequency range, low distortion, and a high signal-to-noise ratio is essential in music recording, including immersive sound recording. The acoustic features of a musical performance can be naturally reproduced by appropriately capturing the direct and indirect sounds in the sound field.

The first topic of this workshop will cluster and review microphone techniques based on their fundamental roles. The panelists will also introduce their immersive sound music recording concept, demonstrate their microphone techniques, and provide sound demos.

Immersive audio can expand the adequate listening area if the microphone technique is designed with this goal in mind. This is crucial for popularizing immersive sound reproduction among music lovers. Therefore, the second topic of this workshop will discuss microphone techniques from the perspective of the listening area during reproduction. The panelist will explain his hypothesis that lower correlation values in the vertical direction contribute to the expansion of the listening area.

In immersive sound recording, various microphone techniques have been proposed to reproduce the top layer of the multichannel discrete loudspeaker layout. It is recommended to use directional microphones and position the top and middle layer microphones simultaneously to avoid phase differences that can degrade timbre. However, some reports suggest that separating the top and middle layers can enhance the perception of vertical spaciousness. Experiments conducted by the panelists also suggest that separating these layers and lowering the correlation between them can widen the listening area without altering the central listening position's impression. Comparing microphone types and installation positions in the upper layer is challenging in actual recording situations. Therefore, the panelists will compare listening impressions under various conditions and allow participants to experience these differences using virtual recording techniques (V2MA), which will be discussed as the third topic of this workshop.

Several papers have reviewed microphone techniques, but most have relied on subjective evaluation. The third topic of this workshop will attempt to evaluate microphone techniques from a physical viewpoint. The panel will introduce the Virtual Microphone Array technique (V2MA) to determine how each microphone captures a room's reflection sounds and identify the acoustical features of several microphone arrays used for immersive sound recording. V2MA generates Spatial Room Impulse Responses (SRIR) using a virtual microphone placed in a virtual room with spatial properties of dominant reflections previously sampled in an actual room.

Lectures and demos help us understand the acoustical features and intentions behind microphone techniques, but they are insufficient to grasp their spatial characteristics, especially for immersive sound recording. The panelists will provide 7.0.4ch demos to showcase the spatial features of microphone techniques using V2MA. V2MA generates the acoustic response of a microphone placed virtually in a room, calculated from spatial information of virtual sound sources, such as dominant reflections detected from sound intensities measured in the target room. This workshop will illustrate the spatial characteristics of microphone arrays, allowing us to discuss the types of reflections captured by microphones and discover the differences in spatial features between microphone techniques.
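As a toy sketch of the virtual-microphone idea (the presenters' V2MA implementation is not reproduced here), one can synthesize the impulse response a virtual first-order microphone would capture from a list of detected reflections, weighting each arrival by the microphone's directivity; all values below are hypothetical:

```python
import numpy as np

# Given dominant reflections detected in a room, each described by
# (delay_s, gain, azimuth_deg), synthesize the impulse response a virtual
# first-order microphone would capture, weighting each arrival by the
# microphone's directivity toward that reflection.

def virtual_mic_ir(reflections, mic_azimuth_deg, pattern=0.5, fs=48000, length_s=0.5):
    """pattern: 0 = omni, 0.5 = cardioid, 1 = figure-of-eight."""
    ir = np.zeros(int(fs * length_s))
    mic_az = np.deg2rad(mic_azimuth_deg)
    for delay_s, gain, az_deg in reflections:
        az = np.deg2rad(az_deg)
        directivity = (1 - pattern) + pattern * np.cos(az - mic_az)
        n = int(round(delay_s * fs))
        if n < len(ir):
            ir[n] += gain * directivity
    return ir

# Direct sound from the front plus two early reflections (hypothetical values)
refl = [(0.010, 1.0, 0.0), (0.018, 0.5, 60.0), (0.025, 0.35, -110.0)]
ir = virtual_mic_ir(refl, mic_azimuth_deg=0.0)
```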

Following each panelist's presentation, a panel discussion will delve into microphone techniques from theoretical, aesthetic, and musical viewpoints. This workshop aims to review issues with microphone techniques for immersive sound and discuss potential solutions to achieve natural spatial reproduction of musical performances for home entertainment.
Speakers
Toru Kamekawa

Professor, Tokyo University of the Arts
Toru Kamekawa: After graduating from the Kyushu Institute of Design in 1983, he joined the Japan Broadcasting Corporation (NHK) as a sound engineer. During that period, he gained experience as a recording engineer, mostly in surround sound programs for HDTV. In 2002, he joined...
Masataka Nakahara

Acoustic Designer / Acoustician, SONA Corp. / ONFUTURE Ltd.
Masataka Nakahara is an acoustician specializing in studio acoustic design and R&D work on room acoustics, as well as an educator. After studying acoustics at the Kyushu Institute of Design, he joined SONA Corporation and began his career as an acoustic designer. In 2005, he received...
Friday May 23, 2025 9:00am - 10:30am CEST
C4 ATM Studio Warsaw, Poland

9:15am CEST

Generative AI in Audio Education: Process-Centred Teaching for a Product-Centred World
Friday May 23, 2025 9:15am - 9:35am CEST
Artificial intelligence (AI) tools are transforming the way music is produced. The rate of development is rapid, and the associated transformation of audio education is abrupt. Higher-level education is largely built around the objectives of knowledge transmission and skills development, evidenced by the emphasis on learning in the cognitive domain in university programmes. But the set of skills that music producers will require in five years’ time is unclear, making skills-based curriculum planning challenging. Audio educators therefore require a systematic approach to integrating AI tools in ways that enhance teaching and learning.

This study uses speculative design as the underpinning research methodology. Speculative design employs design to explore and evaluate possible futures, alternative realities, and sociotechnical trends. In this study, the practical tasks in an existing university module are modified by integrating available generative AI (GAI) tools to replace or augment the task design. This tangible artefact is used to critique prevailing assumptions concerning the use of GAI in music production and audio education. The findings suggest that GAI tools will disrupt the existing audio education paradigm. Employing a process-centred approach to teaching and learning may represent a key progression for educators to help navigate these changes.
Friday May 23, 2025 9:15am - 9:35am CEST
C1/2 ATM Studio Warsaw, Poland

9:15am CEST

Investigating Individual, Loudness-Dependent Equalization Preferences in Different Driving Sound Conditions
Friday May 23, 2025 9:15am - 9:35am CEST
In automotive audio playback systems, dynamically increasing driving sounds are typically taken into account by applying a generic, i.e., non-individualized, increase in overall level and low-frequency amplification to compensate for the increased masking. This study investigated the degree of individuality in the preferred noise-dependent level and equalizer settings. A user study with 18 normal-hearing participants was conducted in which individually preferred level-dependent and frequency-dependent amplification parameters were determined using a music-based procedure in quiet and in nine different driving-noise conditions. The comparison of self-adjusted parameters suggested that, on average, participants adjusted higher overall levels and more low-frequency amplification in noise than in quiet. However, preferred self-adjusted levels differed markedly between participants for the same listening conditions but were quite similar in a re-test session for each participant, indicating that individual preferences were stable and could be reproducibly measured with the employed personalization scheme. Furthermore, the impact of driving noise on individually preferred settings revealed strong interindividual differences, indicating that listeners can differ widely with respect to their individual optimum of how equalizer and level settings should be dynamically adapted to changes in driving conditions.
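The kind of compensation being personalized here can be illustrated with a standard low-shelf biquad (RBJ Audio EQ Cookbook coefficients) whose gain grows with noise level; the mapping from noise level to gains below is hypothetical, not the study's fitted values:

```python
import numpy as np
from scipy.signal import lfilter

def low_shelf(fs, f0, gain_db, S=1.0):
    """Low-shelf biquad per the RBJ Audio EQ Cookbook; returns (b, a)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / 2 * np.sqrt((A + 1 / A) * (1 / S - 1) + 2)
    cosw = np.cos(w0)
    b = A * np.array([(A + 1) - (A - 1) * cosw + 2 * np.sqrt(A) * alpha,
                      2 * ((A - 1) - (A + 1) * cosw),
                      (A + 1) - (A - 1) * cosw - 2 * np.sqrt(A) * alpha])
    a = np.array([(A + 1) + (A - 1) * cosw + 2 * np.sqrt(A) * alpha,
                  -2 * ((A - 1) + (A + 1) * cosw),
                  (A + 1) + (A - 1) * cosw - 2 * np.sqrt(A) * alpha])
    return b / a[0], a / a[0]

def compensate(music, fs, noise_level_db):
    # Hypothetical generic mapping: more low-shelf boost and level in louder noise
    shelf_db = np.clip(0.3 * (noise_level_db - 50), 0, 12)
    level_db = np.clip(0.2 * (noise_level_db - 50), 0, 8)
    b, a = low_shelf(fs, f0=120.0, gain_db=shelf_db)
    return lfilter(b, a, music) * 10 ** (level_db / 20)

fs = 48000
music = np.random.randn(fs)                 # stand-in for a music signal
out = compensate(music, fs, noise_level_db=70.0)
```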
Speakers
Jan Rennies

Head of Group Personalized Hearing Systems, Fraunhofer Institute for Digital Media Technology IDMT
I am heading a group at Fraunhofer IDMT dedicated to developing new solutions for better communication, hearing, and hearing health in various applications together with partners from industry and academia. I am particularly interested in networking and exploring opportunities for...
Friday May 23, 2025 9:15am - 9:35am CEST
C2 ATM Studio Warsaw, Poland

9:30am CEST

Education & Career Fair
Friday May 23, 2025 9:30am - 11:30am CEST
Friday May 23, 2025 9:30am - 11:30am CEST
Hall F ATM Studio Warsaw, Poland

9:35am CEST

A Collaborative and Reflective Framework for Redesigning Music Technology Degree Programmes
Friday May 23, 2025 9:35am - 9:55am CEST
Cyclical formal reviews are essential to keep Music and Audio Technology degree programmes current. Whilst clear institutional guidance exists on the requisite documentation to be submitted, there is little guidance concerning the process used to gather the information. To address this issue, a 12-step collaborative and reflective framework was developed to review a degree programme in Music Technology.

This framework employs Walker’s ‘Naturalistic’ process model and design-thinking principles to create a dynamic, stakeholder-driven review process. The framework begins with reflective analysis by faculty, helping to define programme identity, teaching philosophy, and graduate attributes. Existing curricula are evaluated using Boehm et al.’s (2018) tetrad framework of Music Technology, encompassing the sub-disciplines of production, technology, art, and science. Insights from industry professionals, learners, and graduates are gathered through semi-structured interviews, surveys, and focus groups to address skill gaps, learner preferences, and emerging trends. A SWOT analysis further refines the scope and limitations of the redesign process, which culminates in iterative stakeholder consultations to finalise the programme’s structure, content, and delivery.

This process-centred approach emphasises adaptability, inclusivity, and relevance, thus ensuring the redesigned programme is learner-centred and aligned with future professional and educational demands. By combining reflective practice and collaborative engagement, the framework offers a comprehensive, replicable model for educators redesigning degree programmes in the discipline. This case study contributes to the broader discourse on curriculum design in music and audio degree programmes, demonstrating how interdisciplinary and stakeholder-driven approaches can balance administrative requirements with pedagogical innovation.
Friday May 23, 2025 9:35am - 9:55am CEST
C1/2 ATM Studio Warsaw, Poland

9:35am CEST

Subjective test of loudspeaker virtualization
Friday May 23, 2025 9:35am - 9:55am CEST
In this contribution we present subjective tests of loudspeaker virtualization, a technique that applies a specific target behavior to a physical loudspeaker system. In this work, loudspeaker virtualization is applied to make a closed-box car audio subwoofer replicate the performance of a larger vented enclosure. The tests are designed to determine whether a panel of listeners detects any reduction in sound quality when the virtualized loudspeaker is used.
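A rough sketch of the idea, under the textbook assumption that a closed box behaves as a second-order high-pass and a vented box as a fourth-order high-pass (the actual system surely models the drivers and their limits in more detail):

```python
import numpy as np

# Model the closed-box subwoofer as a 2nd-order high-pass and the vented
# target as a 4th-order high-pass (here cascaded Q = 0.707 sections, a
# simplification), then form the correction response G = H_target / H_actual
# that the DSP would apply, band-limited in practice to respect excursion
# and amplifier headroom. All corner frequencies below are illustrative.

def highpass_response(f, fc, order):
    w, wc = 2j * np.pi * f, 2 * np.pi * fc
    h = (w**2) / (w**2 + wc / 0.707 * w + wc**2)   # one 2nd-order section
    return h ** (order // 2)

f = np.logspace(0.7, 2.5, 200)                      # ~5 Hz to ~316 Hz
H_closed = highpass_response(f, fc=50.0, order=2)   # physical closed box
H_vented = highpass_response(f, fc=35.0, order=4)   # desired vented behavior
G = H_vented / H_closed                             # virtualization correction
```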
Friday May 23, 2025 9:35am - 9:55am CEST
C2 ATM Studio Warsaw, Poland

9:55am CEST

Acoustic Sovereignties: Resounding Indigenous Knowledge in Sound-Based Research
Friday May 23, 2025 9:55am - 10:15am CEST
Acoustic Sovereignties (2024) is a First Nations, anti-colonial spatial audio exhibition held in Naarm (Melbourne), Australia. Through curatorial and compositional practices, Acoustic Sovereignties confronts traditional soundscape and Western experimental sound disciplines by foregrounding marginalised voices.
As this research will show, the foundations of sound-based practices such as Deep Listening and Soundscape Studies consisted of romanticised notions of Indigenous spirituality, in addition to the intentional disregard for First Nations stewardship and kinship with the land and its acoustic composition. Acoustic Sovereignties aims at reclaiming Indigenous representation throughout sound-based disciplines and arts practices by providing a platform for voices, soundscapes and knowledge to be heard.
Speakers
Hayden Ryan

Graduate Student, New York University
My name is Hayden Ryan. I am a First Nations Australian sound scholar and artist, and a 2024 New York University Music Technology Masters graduate. I am about to start my PhD at RMIT University, looking at the integration of immersive technologies with Indigenous sonic and spatial...
Friday May 23, 2025 9:55am - 10:15am CEST
C1/2 ATM Studio Warsaw, Poland

9:55am CEST

Objective measurements for basic sound quality and special audio features in cars
Friday May 23, 2025 9:55am - 10:15am CEST
Car audio systems aim to provide information, entertainment, and acoustic comfort to drivers and passengers. In addition to basic audio functions for broadcasting, chimes, warning sounds, and music playback, there are special audio features such as vehicle noise compensation, spatial sound effects, individual sound zones, and active noise control. In this paper, commonly used objective measurement methods for basic sound quality and special features in cars are reviewed and discussed. All objective measurements are proposed to use the six-unit microphone array specified in the White Paper for In-car Acoustic Measurements released by the AES Technical Committee on Automotive Audio in 2023, and the main parameters to be measured are the frequency responses and sound pressure levels in the car when specially designed test signals are played back. General measurement frameworks and procedures for basic sound quality and for each feature are presented. The advantages and weaknesses of using these parameters to characterize the basic sound quality and special features of a car audio system are discussed, and challenges and future directions are explored.
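One common building block of such measurement frameworks is the H1 transfer-function estimator, sketched below on a toy signal path; the white paper's exact procedure and six-microphone averaging are not reproduced:

```python
import numpy as np
from scipy.signal import csd, welch

# H1 estimator: play a known test signal x, record microphone signal y,
# and form H(f) = Sxy(f) / Sxx(f). Averaging over the array microphones
# would follow the measurement procedure referenced above.

def h1_response(x, y, fs, nperseg=8192):
    f, Sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, Sxx = welch(x, fs=fs, nperseg=nperseg)
    return f, Sxy / Sxx

fs = 48000
x = np.random.randn(10 * fs)                                   # test signal
y = np.convolve(x, [0.8, 0.1, -0.05], mode="full")[: len(x)]   # toy "car" path
f, H = h1_response(x, y, fs)
mag_db = 20 * np.log10(np.abs(H) + 1e-12)
```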
Speakers
Xiaojun Qiu

Huawei
Dr. Xiaojun Qiu is currently a Chief Scientist in Audio and Acoustics at Huawei. Before he joined Huawei in late 2020, he had been a professor in several universities for nearly 20 years. He is a Fellow of Audio Engineering Society and a Fellow of International Institute of Acoustics...
Friday May 23, 2025 9:55am - 10:15am CEST
C2 ATM Studio Warsaw, Poland

10:15am CEST

Fast facts on room acoustics
Friday May 23, 2025 10:15am - 11:15am CEST
If you are considering establishing a room for sound, i.e., recording, mixing, editing, listening, or even a room for live music, this is the crash course to attend!
Initially, we’ll walk through the essential considerations for any design of an acoustic space, (almost) no matter the purpose: Appropriate reverberation time, appropriate sound distribution, low background noise, no echoes/flutter echoes, appropriate control of early reflections, (and for stereo/surround/immersive: a degree of room symmetry).
To prevent misunderstandings, we must define the difference between room acoustics and building acoustics. This is a tutorial on room acoustics! Finding the right reverberation time for a project depends on the room's purpose. We’ll look into some relevant standards to find an appropriate target value and pay attention to the importance of the room's frequency balance, especially at low frequencies! We will take the starting point for calculation using Sabine’s equation and discuss the conditions to make it work.
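For orientation, a worked example of Sabine's equation with illustrative numbers:

```python
# Sabine's equation: RT60 = 0.161 * V / A, where V is the room volume in m^3
# and A the total absorption in m^2 sabins (sum of each surface area times
# its absorption coefficient). All values below are illustrative.

V = 6.0 * 5.0 * 3.0                       # room volume: 90 m^3
surfaces = [                              # (area m^2, absorption coefficient)
    (30.0, 0.10),  # floor
    (30.0, 0.60),  # absorptive ceiling
    (66.0, 0.15),  # walls
]
A = sum(area * alpha for area, alpha in surfaces)
rt60 = 0.161 * V / A
print(f"A = {A:.1f} m^2 sabins, RT60 = {rt60:.2f} s")  # ~0.5 s
```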
The room's shape, the shape’s effect on room modes, and the distribution of the modes are mentioned (together with the term Schroeder Frequency). The acoustical properties of some conventional building materials and the consequences of choosing one in favor of another for the basic design are discussed. The membrane absorbers (plasterboard, plywood, gypsum board) and their importance in proper room design are presented here. This also involves the definition of absorption coefficients (and how to get them).
From the “raw” room and its properties, we move on to define the acoustic treatment needed to reach the target value. Again, the treatment can often be built from cheaper building materials; however, a lot of expensive specialized materials are also available. We’ll try to find a way through the jungle, keeping an eye on the spending. The typical tools are porous absorbers for the smaller rooms; sometimes, resonance absorbers are used for larger rooms. We don’t want overkill of the high frequencies!
The placement of the sound sources in the room influences the perceived sound. A few basic rules are given. Elements to control the sound field are discussed: absorption vs. diffusion. Some simple principles for DIY diffusers are shown.
During the presentation, various practical solutions are presented. At the end of the tutorial, there will be some time for a minor Q&A.
Speakers
Eddy B. Brixen

consultant, EBB-consult
Eddy B. Brixen received his education in electronic engineering from the Danish Broadcasting Corporation, the Copenhagen Engineering College, and the Technical University of Denmark. Major activities include room acoustics, electro-acoustic design, and audio forensics. He is a consultant...
Friday May 23, 2025 10:15am - 11:15am CEST
Hall F ATM Studio Warsaw, Poland

10:15am CEST

Immersive Music Production - Stereo plus effects is not enough!
Friday May 23, 2025 10:15am - 11:15am CEST
Although we've moved from stereo to surround and 3D/immersive productions, many immersive music mixes still sound very much like larger stereo versions. Part of the reason for this is record companies' demands and the argument that people don't have properly set up systems at home or only listen with headphones. But that's not the way to experience the real adventure, which is to create new, stunning sound and musical experiences. The workshop will not criticize mixes, but will try to open the door to the new dimension of music and discuss the pros and cons that producers have to deal with today.
Speakers
Tom Ammermann

New Audio Technology
Grammy-nominated music producer Tom Ammermann began his journey as a musician and music producer in the 1980s. At the turn of the 21st century, Tom produced unique surround audio productions for music and film projects as well as pioneering the very first surround mixes for headphones...
Friday May 23, 2025 10:15am - 11:15am CEST
C4 ATM Studio Warsaw, Poland

10:30am CEST

Use of Headphones in Audio Monitoring
Friday May 23, 2025 10:30am - 11:30am CEST
Extensive studies have been made into achieving a generally enjoyable sound colour in headphone listening, but few publications have focused on the demanding requirements of the individual audio professional, and on what they actually hear.

However, headphones provide fundamentally different listening conditions compared to our professional in-room monitoring standards. With headphones, there is not even a direct connection between the measured frequency response and what a given user hears.

Media professionals from a variety of fields need to be aware of such differences and to take them into account in content production and quality control.

The paper details a recently published method and systematic steps to get to know yourself as a headphone listener. It also summarises new studies of basic listening requirements in headphone monitoring; and it explains why, even if the consumer is listening on headphones, in-room monitoring is generally the better and more relevant common denominator to base production on. The following topics and dimensions are compared across in-room and headphone monitoring: Audio format, listening level, frequency response, auditory envelopment, localisation, speech intelligibility and low frequency sensation.

New, universal headphone monitoring standards are required before such devices may be used with reliability and confidence comparable to in-room monitoring adhering to, for example, ITU-R BS.1116, BS.775 and BS.2051.
Friday May 23, 2025 10:30am - 11:30am CEST
C3 ATM Studio Warsaw, Poland

10:40am CEST

Testing Auditory Illusions in Augmented Reality: Plausibility, Transfer-Plausibility and Authenticity
Friday May 23, 2025 10:40am - 11:00am CEST
Experiments testing sound for augmented reality can involve real and virtual sound sources. Paradigms are either based on rating various acoustic attributes or testing whether a virtual sound source is believed to be real (i.e., evokes an auditory illusion). This study compares four experimental designs indicating such illusions. The first is an ABX task suitable for evaluation under the authenticity paradigm. The second is a Yes/No task, as proposed to evaluate plausibility. The third is a three-alternative-forced-choice (3AFC) task using different source signals for real and virtual, proposed to evaluate transfer-plausibility. Finally, a 2AFC task was tested. The renderings compared in the tests encompassed mismatches between real and virtual room acoustics. Results confirm that authenticity is hard to achieve under nonideal conditions, and ceiling effects occur because differences are always detected. Thus, the other paradigms are better suited for evaluating practical augmented reality audio systems. Detection analysis further shows that the 3AFC transfer-plausibility test is more sensitive than the 2AFC task. Moreover, participants are more sensitive to differences between real and virtual sources in the Yes/No task than theory predicts. This contribution aims to aid in selecting experimental paradigms in future experiments regarding perceptual and technical requirements for sound in augmented reality.
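The detection-theoretic machinery behind such comparisons can be sketched as follows: observed proportions are converted to the sensitivity index d', with m-alternative forced choice inverted numerically (a generic sketch, not the authors' analysis code):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

# Yes/No uses hits and false alarms: d' = z(H) - z(FA). For m-alternative
# forced choice, proportion correct relates to d' through
#   Pc(d') = integral of phi(x - d') * Phi(x)^(m-1) dx,
# which is inverted numerically (valid for Pc between chance 1/m and 1).

def dprime_yes_no(hit_rate, fa_rate):
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

def pc_mafc(dprime, m):
    integrand = lambda x: norm.pdf(x - dprime) * norm.cdf(x) ** (m - 1)
    return quad(integrand, -10, 10)[0]

def dprime_mafc(pc, m):
    return brentq(lambda d: pc_mafc(d, m) - pc, 0.0, 10.0)

print(dprime_yes_no(0.85, 0.20))   # ~1.88
print(dprime_mafc(0.80, 3))        # d' giving 80% correct in a 3AFC task
```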
Speakers
Nils Meyer-Kahlen

Aalto University
Sebastia Vicenc Amengual Gari

Sebastia V. Amengual Gari is currently a research scientist at Reality Labs Research (Meta) working on room acoustics, spatial audio, and auditory perception. He received a Diploma Degree in Telecommunications with a major in Sound and Image in 2014 from the Polytechnic University...
Sebastian Schlecht

Professor of Practice, Aalto University
Sebastian J. Schlecht is Professor of Practice for Sound in Virtual Reality at Aalto University, Finland. This position is shared between the Aalto Media Lab and the Aalto Acoustics Lab. His research interests include spatial audio processing with an emphasis on artificial reverberation, synthesis, reproduction, and 6-degrees-of-freedom virtual and mixed reality applications. In particular, his research efforts have been directed towards the intersection of app...
Tapio Lokki

Department of Signal Processing and Acoustics, Aalto University
Friday May 23, 2025 10:40am - 11:00am CEST
C1 ATM Studio Warsaw, Poland

10:40am CEST

Acoustic Objects: bridging immersive audio creation and distribution systems
Friday May 23, 2025 10:40am - 11:00am CEST
In recent years, professional and consumer audio and music technology has advanced in several areas, including sensory immersion, electronic transmission, content formats, and creation tools. The production and consumption of immersive media experiences increasingly rely on a global network of interconnected frameworks. These experiences, once confined to separate content markets like music, movies, video games, and virtual reality, are now becoming interoperable, ubiquitous, and adaptable to individual preferences, conditions, and languages. This article explores this evolution, focusing on flexible immersive audio creation and reproduction. We examine the development of object-based immersive audio technology and its role in unifying broadcast content with embodied experiences. We introduce the concept of Acoustic Objects, proposing a universal spatial audio scene representation model for creating and distributing versatile, navigable sound in music, multimedia, and virtual or extended reality applications.
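By way of illustration only (the authors' actual representation model is not reproduced here), an object-based spatial scene description of this kind might carry per-object audio references, positions, and rendering hints that survive transport across music, game, and XR contexts:

```python
# Hypothetical illustration, not the authors' schema: a minimal object-based
# scene description with one fixed foreground object and one scene-based bed.

scene = {
    "scene_id": "demo-001",
    "sample_rate": 48000,
    "objects": [
        {
            "id": "lead_vocal",
            "audio_uri": "assets/lead_vocal.wav",   # placeholder path
            "position": {"azimuth_deg": 0, "elevation_deg": 0, "distance_m": 2.0},
            "gain_db": 0.0,
            "directivity": "omni",
            "interactive": False,                   # fixed during navigation
        },
        {
            "id": "ambience",
            "audio_uri": "assets/room_foa.wav",     # placeholder path
            "format": "ambisonics_order1",          # scene-based bed element
            "gain_db": -6.0,
        },
    ],
    "listener": {"tracking": "6dof"},               # navigable reproduction
}
```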
Speakers
Jean-Marc Jot

Founder and Principal, Virtuel Works LLC
Spatial audio and music technology expert and innovator. Virtuel Works provides audio technology strategy, IP creation and licensing services to help accelerate the development of audio and music spatial computing technology and interoperability solutions.
Thibaut Carpentier

STMS Lab - IRCAM, SU, CNRS, Ministère de la Culture
Thibaut Carpentier studied acoustics at the École centrale and signal processing at Télécom ParisTech, before joining the CNRS as a research engineer. Since 2009, he has been a member of the Acoustic and Cognitive Spaces team in the STMS Lab (Sciences and Technologies of Music...
Friday May 23, 2025 10:40am - 11:00am CEST
C2 ATM Studio Warsaw, Poland

11:00am CEST

Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF
Friday May 23, 2025 11:00am - 11:20am CEST
Immersive audio mix presentations involve transmitting and rendering several audio elements simultaneously. This enables next-generation applications such as personalized playback. Using immersive loudspeaker and headphone MUSHRA tests, we investigate rate vs. quality for a typical mix presentation use case of a foreground stereo element plus a background Ambisonics scene. For coding, we use Immersive Audio Model and Formats, a recently proposed system for Next-Generation Audio. Excellent quality is achieved at 384 kbit/s, even with a reasonable amount of personalization. We also propose a framework for content-aware analysis that can significantly reduce the bitrate even when using underlying legacy audio coding instances.
Speakers
Carlos Tejeda Ocampo

Samsung Research Tijuana
Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...
Friday May 23, 2025 11:00am - 11:20am CEST
C1 ATM Studio Warsaw, Poland

11:00am CEST

Immersive Music Production Workflows: An Ethnographic Study of Current Practices
Friday May 23, 2025 11:00am - 11:20am CEST
This study presents an ethnographic analysis of current immersive music production workflows, examining industry trends, tools, and methodologies. Through interviews and participant observations with professionals across various sectors, the research identifies common patterns, effective strategies, and persistent obstacles in immersive audio production. Key findings highlight the ongoing struggle for standardized workflows, the financial and technological barriers faced by independent artists, and the critical role of collaboration between engineers and creatives. Despite the growing adoption of immersive formats, workflows still follow stereo conventions, treating spatialization as an afterthought and complicating the translation of mixes across playback systems. Additionally, the study explores the evolving influence of object-based and bed-based mixing techniques, monitoring inconsistencies across playback systems, and the need for improved accessibility to immersive production education. By synthesizing qualitative insights, this paper contributes to the broader discourse on immersive music production, offering recommendations for future research and industry-wide best practices to ensure the sustainable integration of spatial audio technologies.
Speakers
Marcela Rada

Audio Engineer
Marcela is a talented and accomplished audio engineer who has experience both in the studio and in the classroom, teaching university-level students the skills of becoming professional audio engineers and music producers. She has worked across music genres recording, editing, mixing...
Russell Mason

Institute of Sound Recording, University of Surrey
Enzo De Sena

Senior Lecturer, University of Surrey
Enzo De Sena is a Senior Lecturer at the Institute of Sound Recording at the University of Surrey. He received the M.Sc. degree (cum laude) in Telecommunication engineering from the Università degli Studi di Napoli “Federico II,” Italy, in 2009 and the PhD degree in Electronic Engineering from King’s College London, UK, in 2013. Between 2013 and 2016 he was a postdoctoral researcher at KU Leuven...
Friday May 23, 2025 11:00am - 11:20am CEST
C2 ATM Studio Warsaw, Poland

11:20am CEST

Evaluation of auditory distance perception in reflective sound field by static and dynamic virtual auditory display
Friday May 23, 2025 11:20am - 11:40am CEST
A psychoacoustic experiment was conducted to evaluate and compare auditory distance perception in a reflective sound field using static and dynamic virtual auditory displays (VADs). The binaural signals created by a point source at different distances in a rectangular room were simulated. The contribution of the direct sound to the binaural signals was simulated by near-field head-related transfer function filters and a gain factor to account for the propagation attenuation of the spherical wave. The contributions of early reflections up to the second order and of later reverberation were simulated by the image source method and a Schroeder reverberation algorithm, respectively. The results of the psychoacoustic experiment indicate that there are still significant differences between the perceived distances created by static VAD and those created by dynamic VAD in the simulated reflective condition, although the differences are not as large as those in the simulated free-field case. The results of dynamic VAD are more consistent with those of a real sound source. Therefore, simulating reflections reduces in-head localization and thus improves the control of perceived distance in headphone presentation, but static VAD is still less effective in creating different distance perceptions. Dynamic VAD is still needed in distance perception experiments for hearing research even if simulated reflections are included. In practical applications, dynamic VAD is advocated for recreating virtual sources at different distances.
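The late-reverberation component mentioned above is classically built from parallel feedback combs followed by series allpass filters; a minimal sketch with illustrative delay values (not the study's parameters):

```python
import numpy as np

# Minimal Schroeder reverberator: four parallel feedback comb filters
# followed by two series allpass filters. Delays/gains are illustrative.

def feedback_comb(x, delay, g):
    y = np.copy(x)
    for n in range(delay, len(x)):
        y[n] = x[n] + g * y[n - delay]
    return y

def allpass(x, delay, g):
    # y[n] = -g*x[n] + x[n-D] + g*y[n-D]
    y = np.zeros_like(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x):
    combs = [(1557, 0.84), (1617, 0.83), (1491, 0.82), (1422, 0.81)]
    y = sum(feedback_comb(x, d, g) for d, g in combs) / len(combs)
    for d, g in [(225, 0.7), (556, 0.7)]:
        y = allpass(y, d, g)
    return y

impulse = np.zeros(48000)
impulse[0] = 1.0
rir_tail = schroeder_reverb(impulse)   # synthetic late-reverberation tail
```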
Friday May 23, 2025 11:20am - 11:40am CEST
C1 ATM Studio Warsaw, Poland

11:20am CEST

Perceived Quality of Binaural Rendering From Baffled Microphone Arrays Evaluated Without an Explicit Reference, Part 2
Friday May 23, 2025 11:20am - 11:40am CEST
We present a follow-up study on the perceptual evaluation of the binaural rendering quality of signals from several types of baffled microphone arrays. The tested conditions comprise spherical and equatorial microphone arrays, with spherical and non-spherical baffles, employing non-parametric (signal-independent) rendering and magnitude equalization in the spherical harmonic domain. Following the multi-stimulus category rating paradigm, the arrays are presented in comparison to each other and multiple anchor conditions while omitting an explicit reference stimulus. Our results confirm previous findings that, without an explicit reference, subjects rate the quality of stimuli with large variances, as demonstrated for various repeated low/high-accuracy hidden anchor conditions. The average listener ranked the quality of equatorial arrays above spherical, followed by non-spherical microphone arrays, at the same spherical harmonics order. We discuss statistical trends for individual rendering configurations but find no overall effects in the comparison across different source signals, source incidence directions, or room environments.
Speakers
Hannes Helmholz

PhD Student, Chalmers University of Technology
Friday May 23, 2025 11:20am - 11:40am CEST
C2 ATM Studio Warsaw, Poland

11:40am CEST

Subjective evaluation of immersive microphone arrays for drums
Friday May 23, 2025 11:40am - 12:00pm CEST
Through a practice-oriented study, various coincident, near-coincident, and non-coincident immersive microphone arrays were compared during drum recordings for different contemporary popular music genres. In a preliminary study, the OCT-3D, PCMA-3D, 2L-Cube, Hamasaki Square, IRT Cross, Ambisonics A-Format, and native B-Format were informally compared, revealing that the differences between non-coincident systems were much smaller than the differences between coincident and non-coincident systems. This led to a reduction in microphone systems for the final drum recordings. Four microphone techniques were selected: OCT-3D, native B-Format, Ambisonics A-Format, and IRT Cross. These were compared within the context of two different songs – a calm pop track and an energetic rock song – where the drums were respectively recorded in a dry drum booth and a large studio hall. Through a listening test with a small sample group, it was determined which microphone technique was best suited for each song. Participants were also asked to identify the general favorite, without musical context, as well as how the spatiality, timbre, and height were perceived. It was concluded that the choice of immersive microphone technique depends on the musical context. Conclusions from more objective studies focus primarily on accurate localization, with non-coincident systems consistently performing the best. However, these studies do not take into account the musical context, where accurate localization does not always take precedence. Furthermore, it was noted that height perception in music is not solely created by speakers in the height range. The comparative drum recordings are published through https://www.immersive.pxl.be/immersive-microphone-techniques-for-drums/.
Speakers
Arthur Moelants

Researcher, PXL Music Research
Steven Maes

Founder of Motormusic Studios, Researcher & Lecturer at PXL Music, PXL Music
Friday May 23, 2025 11:40am - 12:00pm CEST
C1 ATM Studio Warsaw, Poland

11:40am CEST

Spherical harmonic beamforming based Ambisonics encoding and upscaling method for smartphone microphone array
Friday May 23, 2025 11:40am - 12:00pm CEST
With the rapid development of virtual reality (VR) and augmented reality (AR), spatial audio recording and reproduction have gained increasing research interest. Higher Order Ambisonics (HOA) stands out for its adaptability to various playback devices and its ability to integrate head orientation. However, current HOA recordings often rely on bulky spherical microphone arrays (SMA), and portable devices like smartphones are limited by array configuration and number of microphones. We propose a method for HOA encoding using a smartphone microphone array (SPMA). By designing beamformers for each order of spherical harmonic functions based on the array manifold, the method enables HOA encoding and up-scaling. Validation on a real SPMA and its simulated free-field counterpart in noisy and reverberant conditions showed that the method successfully encodes and up-scales HOA up to the fourth order with just four irregularly arranged microphones.
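The general least-squares view of such encoding can be sketched as follows: sample the spherical-harmonic basis at the microphone directions and pseudo-invert it (the paper's array-manifold-based beamformer design, which also handles baffle and radial effects, is not reproduced; the directions below are hypothetical):

```python
import numpy as np
from scipy.special import sph_harm

# Sample the spherical-harmonic basis at the microphone directions and
# pseudo-invert it to map mic pressures to Ambisonics coefficients.
# Complex SH are used here for brevity; practical systems use real SH.

def sh_matrix(order, az, col):
    """Returns Y of shape (n_mics, (order+1)^2); az = azimuth, col = colatitude."""
    cols = []
    for l in range(order + 1):
        for m in range(-l, l + 1):
            cols.append(sph_harm(m, l, az, col))
    return np.column_stack(cols)

# Four hypothetical smartphone mic directions (radians)
az = np.array([0.0, 1.6, 3.1, 4.7])
col = np.array([1.2, 1.6, 1.4, 1.8])

Y = sh_matrix(1, az, col)        # first order: (order+1)^2 = 4 coefficients
E = np.linalg.pinv(Y)            # encoder: a_hat = E @ p
p = np.random.randn(4)           # one frame of mic samples (toy data)
a_hat = E @ p                    # estimated first-order Ambisonics signals
```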
Friday May 23, 2025 11:40am - 12:00pm CEST
C2 ATM Studio Warsaw, Poland

12:00pm CEST

Student Recording Competition 2
Friday May 23, 2025 12:00pm - 1:00pm CEST
Friday May 23, 2025 12:00pm - 1:00pm CEST
C3 ATM Studio Warsaw, Poland

12:00pm CEST

The Future of Immersive Audio: Expanding Beyond Music and Film
Friday May 23, 2025 12:00pm - 1:00pm CEST
The evolution of 3D audio has significantly influenced the music and film industries, yet its full potential remains untapped. This panel will explore how immersive audio technologies, including Ambisonics, Dolby Atmos, and volumetric sound, shape new frontiers beyond traditional applications. We will focus on three key areas: accessibility in video games, the integration of 3D audio in gaming experiences, and its growing role in the automotive industry. Our panelists will discuss the state of the market, technological limitations, and emerging opportunities where spatial audio enhances user experience, safety, and engagement. This discussion aims to inspire innovation and collaboration among researchers, developers, and industry professionals.
Speakers
Tomasz Żernicki

co-founder, my3DAudio
Tomasz Zernicki is co-founder and former CEO of Zylia (www.zylia.co), an innovative company that provides tools for 3D audio recording and music production. Additionally, he is a founder of my3DAudio Ventures, whose goal is to scale audio companies that reach the MVP phase and want...
Friday May 23, 2025 12:00pm - 1:00pm CEST
C4 ATM Studio Warsaw, Poland

12:00pm CEST

Key Technology Briefings 2
Friday May 23, 2025 12:00pm - 1:15pm CEST
Friday May 23, 2025 12:00pm - 1:15pm CEST
C1 ATM Studio Warsaw, Poland

12:00pm CEST

Acoustic analysis of ancient stadia: from the Circus Maximus of Rome to the Hippodrome of Constantinople
Friday May 23, 2025 12:00pm - 1:30pm CEST
Speakers
Antonella Bevilacqua

University of Parma
Friday May 23, 2025 12:00pm - 1:30pm CEST
Hall F ATM Studio Warsaw, Poland

12:00pm CEST

Adaptive Room Acoustics Optimisation Using Virtual Microphone Techniques
Friday May 23, 2025 12:00pm - 1:30pm CEST
Room acoustics optimisation in live sound environments using signal processing techniques has captivated the minds of audio enthusiasts and researchers alike for over half a century. From analogue filters in the 1950s, to modern research efforts such as room impulse response equalisation and adaptive sound field control, this subject has exploded to life. Controlling the sound field in a static acoustic space is complex due to the high number of system variables, such as reflections, speaker crosstalk, equipment-induced coloration, room modes, reverberation, diffraction and listener positioning. These challenges are further amplified by dynamic variables such as audience presence, environmental conditions and room occupancy changes, which continuously and unpredictably reshape the sound field.
A primary objective of live sound reinforcement is to deliver uniform sound quality across the audience area. This is most critical at audience ear level, where tonal balance, clarity, and spatial imaging are most affected by variations in the sound field. While placing microphones at audience ear level positions could enable real-time monitoring, large-scale deployment is impractical due to audience interference.
This research will explore the feasibility of an adaptive virtual microphone-based approach to room acoustics optimisation. By strategically placing microphone arrays and leveraging virtual microphone technology, the system estimates the sound field dynamically at audience ear level without requiring physical microphones. By continuously repositioning focal points across listening zones, a small number of arrays could effectively monitor large audience areas. If accurate estimations can be achieved, real-time sound field control becomes more manageable and effective.
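The underlying estimation step can be illustrated with a simple delay-and-sum "virtual microphone" focused at a chosen point; the proposed adaptive system is considerably more sophisticated, and all geometry below is hypothetical:

```python
import numpy as np

# Estimate the signal at a chosen focal point (e.g., a seat at ear height)
# by time-aligning array microphones to that point and summing.

C = 343.0  # speed of sound, m/s

def virtual_mic(signals, mic_xyz, focal_xyz, fs):
    """signals: (n_mics, n_samples); returns delay-and-sum estimate at focal point."""
    dists = np.linalg.norm(mic_xyz - focal_xyz, axis=1)
    delays = (dists - dists.min()) / C              # relative delays in seconds
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        shift = int(round(d * fs))
        out[: n - shift] += sig[shift:] if shift else sig
    return out / len(signals)

mics = np.array([[0.0, 0, 0], [0.5, 0, 0], [1.0, 0, 0], [1.5, 0, 0]])
focal = np.array([0.75, 6.0, 1.2])                  # hypothetical seat position
sigs = np.random.randn(4, 48000)                    # toy array recordings
est = virtual_mic(sigs, mics, focal, fs=48000)
```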
Friday May 23, 2025 12:00pm - 1:30pm CEST
Hall F ATM Studio Warsaw, Poland

12:00pm CEST

Analysis of the Sound Pressure Level Distribution in the Low-Frequency Range Below the First Modal Frequency in Small Room Acoustics
Friday May 23, 2025 12:00pm - 1:30pm CEST
The occurrence of eigenmodes is one of the fundamental phenomena in the acoustics of small rooms. The formation of modes results in an uneven distribution of the sound pressure level in the room. To determine the resonance frequencies and their distributions, numerical methods, analytical methods, or experimental studies are used. For the purpose of this paper, an experimental study was carried out in a small room. The study analysed the results of measuring the sound pressure level distributions in the room, with a special focus on the frequency range 20 Hz - 32 Hz, below the first modal frequency of the room. The measurements were conducted on a rectangular grid of 9x9 microphones, giving a grid resolution of 0.5 m. The influence of evanescent modes on the total sound field was investigated. The research takes into account several sound source locations. On the basis of the acoustic measurements carried out, frequency response curves were also plotted. This paper presents several methods for analysing these curves based on the standard deviation, the linear least squares method, the coefficient of determination R^2, and the root mean squared error (RMSE). The results obtained made it possible to determine the best position of the acoustic source in the room under study. The effect of evanescent modes on the total sound field was also observed.
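For context, the modal frequencies of an ideal rectangular room follow f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²); the sketch below uses illustrative dimensions (not the room under study) to show why a 20-32 Hz band can lie below the first mode:

```python
import numpy as np
from itertools import product

# Rectangular-room modal frequencies. For an illustrative 4.1 x 3.3 x 2.5 m
# room, the lowest axial mode lands near 42 Hz, so 20-32 Hz lies below it.

def room_modes(Lx, Ly, Lz, n_max=3, c=343.0):
    modes = []
    for nx, ny, nz in product(range(n_max + 1), repeat=3):
        if nx == ny == nz == 0:
            continue
        f = (c / 2) * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
        modes.append((f, (nx, ny, nz)))
    return sorted(modes)

for f, mode in room_modes(4.1, 3.3, 2.5)[:5]:
    print(f"{f:6.1f} Hz  {mode}")   # first line: ~41.8 Hz, mode (1, 0, 0)
```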
Friday May 23, 2025 12:00pm - 1:30pm CEST
Hall F ATM Studio Warsaw, Poland

12:00pm CEST

Diffuse Signal Processing (DiSP) as a Method of Decorrelating a Stereo Mix to Increase Mono Compatibility
Friday May 23, 2025 12:00pm - 1:30pm CEST
Mono compatibility is a fundamental challenge in audio production, ensuring that stereo mixes retain clarity, balance, and spectral integrity when summed to mono. Traditional stereo widening techniques often introduce phase shifts, comb filtering, and excessive decorrelation, causing perceptual loss of critical mix elements in mono playback. Diffuse Signal Processing (DiSP) is introduced as a convolution-based method that improves mono compatibility while maintaining stereo width.

This study investigates the application of DiSP to the left and right channels of a stereo mix, leveraging MATLAB-synthesized temporally diffuse impulse (TDI) responses to introduce spectrally balanced, non-destructive acoustic energy diffusion. TDI convolution is then applied to both the left and right channels of the final stereo mix.

A dataset of stereo mixes from four genres (electronic, heavy metal, orchestral, and pop/rock) was analyzed. The study evaluated phase correlation, mono-summed frequency response deviation and amount of comb filtering to quantify improvements in mono summation. Spectral plots and wavelet transforms provided objective analysis. Results demonstrated that DiSP reduced phase cancellation, significantly decreased comb filtering artifacts, and improved spectral coherence in mono playback while preserving stereo width within the original mix. Applying this process to the final left and right channels allows an engineer to mix freely without the concern of the mono mix’s compatibility.

DiSP’s convolution-based approach offers a scalable, adaptive solution for modern mixing and mastering workflows, overcoming the limitations of traditional stereo processing. Future research includes machine learning-driven adaptive DiSP, frequency-dependent processing enhancements, and expansion to spatial audio formats (5.1, 7.1, Dolby Atmos) to optimize mono downmixing. The findings confirm DiSP as a robust and perceptually transparent method for improving mono compatibility without compromising stereo imaging.
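Two of the evaluation quantities named above can be sketched generically (this is not the authors' measurement code):

```python
import numpy as np

# Generic sketches of two mono-compatibility indicators: the stereo
# correlation coefficient, and the spectral deviation of the mono sum,
# which rises when summation produces comb-like ripple.

def phase_correlation(left, right):
    return np.corrcoef(left, right)[0, 1]      # +1 = mono-like, 0 = decorrelated

def mono_sum_deviation_db(left, right, n_fft=8192):
    mono = (left[:n_fft] + right[:n_fft]) / 2
    mag_db = 20 * np.log10(np.abs(np.fft.rfft(mono)) + 1e-12)
    return np.std(mag_db)                       # higher = more comb-like ripple

# Toy usage on a statically delayed "widened" pair (1 ms inter-channel delay):
rng = np.random.default_rng(0)
left = rng.standard_normal(48000)
right = np.concatenate([np.zeros(48), left[:-48]])
print(phase_correlation(left, right), mono_sum_deviation_db(left, right))
```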
Speakers
TS

Tommy Spurgeon

Physics Student & Undergraduate Researcher, University of South Carolina
Friday May 23, 2025 12:00pm - 1:30pm CEST
Hall F ATM Studio Warsaw, Poland

12:00pm CEST

Instantaneous Low-frequency Energetic Analysis for Detection of Standing Waves
Friday May 23, 2025 12:00pm - 1:30pm CEST
Standing waves are a phenomenon ever-present in the reproduction of low frequencies and have a direct impact on the auditory perception of this frequency region.
This study addresses the challenges posed by standing waves, which are difficult to measure accurately using conventional pressure microphones due to their spatial and temporal characteristics. To address these issues, a state-of-the-art sound pressure-velocity probe specifically designed for measuring intensity in the low-frequency spectrum is developed. Using this probe, the research includes the development of new energy estimation parameters to better quantify the characteristics of sound fields influenced by standing waves. Additionally, a novel "standing-wave-ness" parameter is proposed, based on two diffuseness quantities dealing with the proportion of locally confined energy and the temporal variation of the intensity vectors. The performance of the new method and probe is evaluated through both simulated and real-world measurement data. Simulations provide a controlled environment to assess the method's accuracy across a variety of scenarios, including both standing-wave and non-standing-wave conditions. These initial simulations are followed by validation with measurement data obtained in an anechoic chamber, ensuring that the method's capabilities are tested in highly controlled, close-to-real-world settings. Preliminary results from this dual approach show promising potential for the new method to quantify the presence of standing waves, adding a new dimension to the visualisation and understanding of low-frequency phenomena.
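The standard energetic quantities such a probe delivers can be sketched as follows; the paper's novel "standing-wave-ness" parameter itself is not reproduced:

```python
import numpy as np

# Instantaneous intensity I = p*v, energy density E, and the common
# diffuseness estimate psi = 1 - ||<I>|| / (c <E>), which approaches 1 when
# energy is locally confined (as in a standing wave) rather than propagating.

RHO = 1.204   # air density, kg/m^3
C = 343.0     # speed of sound, m/s

def energetic_analysis(p, v):
    """p: (n,) pressure; v: (n, 3) particle velocity."""
    I = p[:, None] * v                                       # intensity vectors
    E = 0.5 * (p**2 / (RHO * C**2) + RHO * np.sum(v**2, axis=1))
    psi = 1.0 - np.linalg.norm(I.mean(axis=0)) / (C * E.mean() + 1e-20)
    return I, E, psi

# Toy standing wave: pressure and velocity 90 degrees out of phase
t = np.arange(4800) / 48000
p = np.sin(2 * np.pi * 30 * t)
v = np.zeros((len(t), 3))
v[:, 0] = np.cos(2 * np.pi * 30 * t) / (RHO * C)
_, _, psi = energetic_analysis(p, v)
print(psi)   # close to 1: energy is locally confined
```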
Speakers
Madalina Nastasa

Doctoral Researcher, Aalto University
Doctoral researcher at the Acoustics Lab of Aalto University passionate about everything audio. My research focuses on the human perception of the very low frequency spectrum, and so does my day to day life. When I am not in the Acoustics lab, I organise electronic music events where...
Aki Mäkivirta

R&D Director, Genelec Oy
Aki Mäkivirta is R&D Director at Genelec, Iisalmi, Finland, and has been with Genelec since 1995. He received his Master of Science, Licentiate of Science, and Doctor of Science in Technology degrees from Tampere University of Technology, in 1985, 1989, and 1992, respectively. Aki...
Friday May 23, 2025 12:00pm - 1:30pm CEST
Hall F ATM Studio Warsaw, Poland

12:15pm CEST

The Future Of Spatial Audio For Consumers
Friday May 23, 2025 12:15pm - 1:15pm CEST
As spatial audio shifts from a premium feature to a mainstream expectation, significant challenges remain in delivering a uniform experience across devices, formats, and playback systems. This panel brings together industry and academic experts to explore the key technologies driving the future of immersive audio for consumers. We’ll discuss the core technological advancements, software, hardware, and ecosystem innovations necessary to enable more seamless and consistent spatial audio experiences. Additionally, we will examine the challenges of delivering perceptually accurate spatial audio across diverse playback environments and identify the most critical areas of focus for industry and academia to accelerate broader consumer adoption of spatial audio.
Speakers
Jacob Hollebon

Principal Research Engineer, Audioscenic
I am a researcher specialising in 3D spatial audio reproduction and beamforming using loudspeaker arrays. In my current role at Audioscenic I am helping commercialize innovative listener-adaptive loudspeaker arrays for 3D audio and multizone reproduction. Previously I developed a new...
Marcos Simón

CTO, Audioscenic
Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...
Hyunkook Lee

Professor, Applied Psychoacoustics Lab, University of Huddersfield
Friday May 23, 2025 12:15pm - 1:15pm CEST
C2 ATM Studio Warsaw, Poland

1:00pm CEST

Student Recording Competition 3
Friday May 23, 2025 1:00pm - 2:00pm CEST
Friday May 23, 2025 1:00pm - 2:00pm CEST
C3 ATM Studio Warsaw, Poland

1:15pm CEST

Immersive Listening
Friday May 23, 2025 1:15pm - 3:00pm CEST
Friday May 23, 2025 1:15pm - 3:00pm CEST
C4 ATM Studio Warsaw, Poland

1:30pm CEST

Discrimination of vowel-like timbre quality: A case of categorical perception?
Friday May 23, 2025 1:30pm - 1:50pm CEST
This study investigated whether categorical perception—a phenomenon observed in speech perception—extends to the discrimination of vowel-like timbre qualities. Categorical perception occurs when continuous acoustic variations are perceived as distinct categories, leading to better discrimination near category boundaries than within a category. To test this, discrimination thresholds for the center frequency of a one-third-octave band formant introduced into the spectrum of a pink noise burst were measured in five subjects using an adaptive psychophysical procedure. Thresholds were assessed at distinctive formant frequencies of selected Polish vowels and at boundaries between adjacent vowel categories along the formant-frequency continuum. Results showed no reduction in discrimination thresholds at category boundaries, suggesting an absence of categorical perception for vowel-like timbre. One possible explanation for this finding lies in the listening mode—a concept from ecological auditory research—describing cognitive strategies in auditory tasks. The design of both the stimuli and the experimental procedure likely encouraged an acousmatic listening mode, which focuses solely on the sensory characteristics of sound, without reference to its source or meaning. This may have suppressed cues typically used in the categorical perception of speech sounds, which are associated with the communication listening mode. These findings highlight the importance of considering listening mode in future research on categorical perception of timbre and suggest that vowel-like timbre discrimination may involve perceptual mechanisms distinct from those used in speech sound discrimination.
Friday May 23, 2025 1:30pm - 1:50pm CEST
C1 ATM Studio Warsaw, Poland

1:30pm CEST

On the effect of photogrammetric reconstruction and pinna deformation methods on individual head-related transfer functions
Friday May 23, 2025 1:30pm - 1:50pm CEST
Individual head-related transfer functions (HRTFs) are instrumental in rendering plausible spatial audio playback over headphones as well as in understanding auditory perception. Nowadays, the numerical calculation of individual HRTFs is achievable even without high-performance computers. However, the main obstacle is the acquisition of a mesh of the pinnae with a submillimeter accuracy. One approach to this problem is the photogrammetric reconstruction (PR), which estimates a 3D shape from 2D input, e.g., photos. Albeit easy to use, this approach comes with a trade-off in the resulting mesh quality, which subsequently has a substantial impact on the HRTF's quality. In this study, we investigated the effect of PR on HRTF quality as compared to HRTFs calculated from a reference mesh acquired with a high-quality structured-light scanner. Additionally, we applied two pinna deformation methods, which registered a non-individual high-quality pinna to the individual low-quality PR pinna by means of geometric distances. We investigated the potential of these methods to improve the quality of the PR-based pinna meshes. Our evaluation involved the geometrical, acoustical, and psychoacoustical domains including a sound-localization experiment with 9 participants. Our results show that neither PR nor PR-improvement methods were able to provide individual HRTFs of sufficient quality, indicating that without extensive pre- or post-processing, PR provides too little individual detail in the HRTF-relevant pinna regions.
Speakers
Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical engineering and audio engineering in Graz, both at the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research...
Piotr Majdak

Austrian Academy of Sciences
Friday May 23, 2025 1:30pm - 1:50pm CEST
C2 ATM Studio Warsaw, Poland

1:45pm CEST

A Testbed for Detecting DeepFake Audio
Friday May 23, 2025 1:45pm - 3:45pm CEST
The rapid advancement of generative artificial intelligence has created highly realistic DeepFake multimedia content, posing significant challenges for digital security and authenticity verification. This paper presents the development of a comprehensive testbed designed to detect counterfeit audio content generated by DeepFake techniques. The proposed framework integrates forensic spectral analysis, numerical and statistical modeling, and machine learning-based detection to assess the authenticity of multimedia samples. Our study evaluates various detection methodologies, including spectrogram comparison, Euclidean distance-based analysis, pitch modulation assessment, and spectral flatness deviations. The results demonstrate that cloned and synthetic voices exhibit distinctive acoustic anomalies, with forensic markers such as pitch mean absolute error and power spectral density variations serving as effective indicators of manipulation. By systematically analyzing human, cloned, and synthesized voices, this research provides a foundation for advancing DeepFake detection strategies. The proposed testbed offers a scalable and adaptable solution for forensic audio verification, contributing to the broader effort of safeguarding multimedia integrity in digital environments.
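Two of the named forensic markers can be sketched with standard tools; thresholds and their fusion into a detector are the paper's contribution and are not reproduced, and the file names below are placeholders:

```python
import numpy as np
import librosa

# Spectral-flatness statistics of a questioned recording, and the pitch mean
# absolute error between a questioned recording and a reference of the same
# speaker/content.

def spectral_flatness_stats(y):
    flatness = librosa.feature.spectral_flatness(y=y)[0]
    return float(flatness.mean()), float(flatness.std())

def pitch_mae_hz(y_ref, y_test, sr):
    f0_ref = librosa.yin(y_ref, fmin=60, fmax=400, sr=sr)
    f0_test = librosa.yin(y_test, fmin=60, fmax=400, sr=sr)
    n = min(len(f0_ref), len(f0_test))
    return float(np.mean(np.abs(f0_ref[:n] - f0_test[:n])))

y_ref, sr = librosa.load("reference.wav", sr=16000)    # placeholder paths
y_test, _ = librosa.load("questioned.wav", sr=16000)
print(spectral_flatness_stats(y_test), pitch_mae_hz(y_ref, y_test, sr))
```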
Friday May 23, 2025 1:45pm - 3:45pm CEST
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

An audio quality metrics toolbox for media assets management, content exchange, and dataset alignment
Friday May 23, 2025 1:45pm - 3:45pm CEST
Content exchange and collaboration serve as catalysts for repository creation that supports creative industries and fuels model development in machine learning and AI. Despite numerous repositories, challenges persist in discoverability, rights preservation, and efficient reuse of audiovisual assets. To address these issues, the SCENE (Searchable multi-dimensional Data Lakes supporting Cognitive Film Production & Distribution for the Promotion of the European Cultural Heritage) project introduces an automated audio quality assessment toolkit integrated within its Media Assets Management (MAM) platform. This toolkit comprises a suite of advanced metrics, such as artifact detection, bandwidth estimation, compression history analysis, noise profiling, speech intelligibility, environmental sound recognition, and reverberation characterization. The metrics are extracted using dedicated Flask-based web services that interface with a data lake architecture. By streamlining the inspection of large-scale audio repositories, the proposed solution benefits both high-end film productions and smaller-scale collaborations. The pilot phase of the toolkit will involve professional filmmakers who will provide feedback to refine post-production workflows. This paper presents the motivation, design, and implementation details of the toolkit, highlighting its potential to support content quality management and contribute to more efficient content exchange in the creative industries.
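A minimal sketch of such a Flask-based metric web service (the SCENE toolkit's actual endpoints and metric set are not reproduced here):

```python
import io
import numpy as np
import soundfile as sf
from flask import Flask, request, jsonify

# Accept an uploaded audio file and return a couple of simple quality
# indicators as JSON; a real toolkit would expose its full metric suite.

app = Flask(__name__)

@app.post("/metrics")
def metrics():
    data, sr = sf.read(io.BytesIO(request.files["audio"].read()))
    if data.ndim > 1:
        data = data.mean(axis=1)                   # fold to mono
    rms_db = 20 * np.log10(np.sqrt(np.mean(data**2)) + 1e-12)
    spec = np.abs(np.fft.rfft(data))
    freqs = np.fft.rfftfreq(len(data), 1 / sr)
    # crude bandwidth estimate: highest frequency within 60 dB of the peak
    bandwidth_hz = float(freqs[spec > spec.max() * 1e-3].max())
    return jsonify({"sample_rate": int(sr),
                    "rms_db": float(rms_db),
                    "bandwidth_hz": bandwidth_hz})

if __name__ == "__main__":
    app.run(port=5000)
```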
Speakers
Nikolaos Vryzas

Aristotle University Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...
Iordanis Thoidis

Aristotle University of Thessaloniki
Lazaros Vrysis

Aristotle University of Thessaloniki
Friday May 23, 2025 1:45pm - 3:45pm CEST
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

Application for Binaural Audio Plays: Development of Auditory Perception and Spatial Orientation
Friday May 23, 2025 1:45pm - 3:45pm CEST
When navigating the environment, we primarily rely on sight. However, in its absence, individuals must develop precise spatial awareness using other senses. A blind person can recognize their immediate surroundings through touch, but assessing larger spaces requires auditory perception.
This project presents a method for auditory training in children with visual disabilities through structured audio plays designed to teach spatial pronouns and enhance spatial orientation via auditory stimuli. The format and structure of these audio plays allow for both guided learning with a mentor and independent exploration. Binaural recordings serve as the core component of the training exercises. The developed audio plays and their analyses are available on the YouTube platform in the form of videos and interactive exercises.
The next step of this project involves developing an application that enables students to create individual accounts and track their progress. Responses collected during exercises will help assess the impact of the audio plays on students, facilitating improvements and modifications to the training materials.
Additionally, linking vision-related questions with responses to auditory exercises will, over time, provide insights into the correlation between these senses. The application can serve multiple purposes: collecting research data, offering spatial recognition and auditory perception training, and creating a comprehensive, structured environment for auditory skill development.
Friday May 23, 2025 1:45pm - 3:45pm CEST
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

Exploring the Process of Interconnected Procedurally Generated Visual and Audial Content
Friday May 23, 2025 1:45pm - 3:45pm CEST
This paper investigates the innovative synthesis of procedurally generated visual and auditory content through the use of Artificial Intelligence (AI) Tools, specifically focusing on Generative Pre-Trained Transformer (GPT) networks.
This research explores the process of procedurally generating audiovisual representations of semantic context by generating images, artificially providing motion, and generating corresponding multilayered sound. The process enables the generation of stop-motion audiovisual representations of concepts.
This approach not only highlights the capacity for Generative AI to produce cohesive and semantically rich audiovisual media but also delves into the interconnections between visual art, music, sonification, and computational creativity. By examining the synergy between generated imagery and corresponding soundscapes, this research paper aims to uncover new insights into the aesthetic and technical implications of the use of AI in art.
This research embodies a direct application of AI technology across multiple disciplines creating intermodal media. Research findings propose a novel framework for understanding and advancing the use of AI in the creative processes, suggesting potential pathways for future interdisciplinary research and artistic expression.
Through this work, this study contributes to the broader discourse on the role of AI in enhancing creative practices, offering perspectives on how various modes of semantic representation can be interleaved using state-of-the-art technology.
Friday May 23, 2025 1:45pm - 3:45pm CEST
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

G.A.D.A.: Guitar Audio Dataset for AI - An Open-Source Multi-Class Guitar Corpus
Friday May 23, 2025 1:45pm - 3:45pm CEST
We present G.A.D.A. (Guitar Audio Dataset for AI), a novel open-source dataset designed for advancing research in guitar audio analysis, signal processing, and machine learning applications. This comprehensive corpus comprises recordings from three main guitar categories: electric, acoustic, and bass guitars, featuring multiple instruments within each category to ensure dataset diversity and robustness.

The recording methodology employs two distinct approaches based on instrument type. Electric and bass guitars were recorded using direct recording techniques via DI boxes, providing clean, unprocessed signals ideal for further digital processing and manipulation. For acoustic guitars, where direct recording was not feasible, we utilized multiple microphone configurations at various positions to capture the complete acoustic properties of the instruments. Both recording approaches prioritize signal quality while maintaining maximum flexibility for subsequent processing and analysis.

The dataset includes standardized recordings of major and minor chords played in multiple positions and voicings across all instruments. Each recording is accompanied by detailed metadata, including instrument specifications, recording equipment details, microphone configurations (for acoustic guitars), and chord information. The clean signals from electric instruments enable various post-processing applications, including virtual amplifier modeling, effects processing, impulse response convolution, and room acoustics simulation.

To evaluate G.A.D.A.'s effectiveness in machine learning applications, we propose a comprehensive testing framework using established algorithms including k-Nearest Neighbors, Support Vector Machines, Convolutional Neural Networks, and Feed-Forward Neural Networks. These experiments will focus on instrument classification tasks using both traditional audio features and deep learning approaches.

G.A.D.A. will be freely available for academic and research purposes, complete with documentation, preprocessing scripts, example code, and usage guidelines. This resource aims to facilitate research in musical instrument classification, audio signal processing, deep learning applications in music technology, computer-aided music education, and automated music transcription systems.

The combination of standardized recording methodologies, comprehensive metadata, and the inclusion of both direct-recorded and multi-microphone captured audio makes G.A.D.A. a valuable resource for comparative studies and reproducible research in music information retrieval and audio processing.
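To illustrate the kind of baseline evaluation proposed, here is a minimal sketch of instrument-category classification with k-Nearest Neighbors on MFCC summaries; the folder layout, feature choice, and hyperparameters are assumptions, not the dataset's released preprocessing scripts.

```python
# Illustrative instrument-class baseline on G.A.D.A.-style audio folders.
from pathlib import Path

import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(path, sr=22050, n_mfcc=20):
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Summarize the time axis so every clip maps to a fixed-length vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

X, y = [], []
for label in ("electric", "acoustic", "bass"):   # assumed folder names
    for wav in Path("gada").joinpath(label).glob("*.wav"):
        X.append(mfcc_features(wav))
        y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), y, test_size=0.25,
                                          stratify=y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(f"instrument-class accuracy: {clf.score(X_te, y_te):.3f}")
```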
Friday May 23, 2025 1:45pm - 3:45pm CEST
Hall F ATM Studio Warsaw, Poland

1:50pm CEST

Speech intelligibility in noise: A comparative study of musicians, audio-engineers, and non-musicians
Friday May 23, 2025 1:50pm - 2:10pm CEST
Published studies indicate that musicians outperform non-musicians in a variety of non-musical auditory tasks, a phenomenon known as the “musicians’ hearing advantage effect.” One widely reported benefit is enhanced speech-in-noise (SIN) recognition. It has been observed that musicians’ SIN recognition thresholds (SRTs) are lower than those of non-musicians, though findings, mainly from English-language studies, are mixed; some confirm this advantage, while others do not. This study extends SRT measurements to Polish, a language with distinct phonetic characteristics. Participants completed a Polish speech intelligibility test, reconstructing sentences masked by multitalker babble noise by selecting words from a list displayed on a computer screen. Speech levels remained constant while masking noise was adjusted adaptively: increasing after each correct response and decreasing after each error. Three groups were tested: musicians, musically trained audio engineers, and non-musicians. Results showed that musicians and audio engineers had SRTs 2 and 2.7 dB lower than non-musicians, respectively. Although audio engineers exhibited slightly lower SRTs than musicians, the difference was minimal, with statistical significance just above the conventional 5% threshold. Thus, under these conditions, no clear advantage of audio engineers over musicians in SIN performance was observed.
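The adaptive procedure described (noise level up after a correct response, down after an error) can be sketched as a simple 1-up/1-down staircase; the step size, trial count, and reversal-averaging rule below are illustrative assumptions, not the study's exact protocol.

```python
# Sketch of an adaptive masking-noise staircase converging on the SRT.
def adaptive_srt(run_trial, n_trials=30, noise_db=60.0, speech_db=65.0,
                 step_db=2.0):
    """run_trial(snr_db) -> True if the sentence was reconstructed correctly."""
    reversals = []
    last_correct = None
    for _ in range(n_trials):
        snr_db = speech_db - noise_db
        correct = run_trial(snr_db)
        if last_correct is not None and correct != last_correct:
            reversals.append(snr_db)            # track direction reversals
        # Harder after success, easier after failure (1-up/1-down).
        noise_db += step_db if correct else -step_db
        last_correct = correct
    # SRT estimate: mean SNR over the last few reversals.
    tail = reversals[-6:] if reversals else [speech_db - noise_db]
    return sum(tail) / len(tail)
```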
Friday May 23, 2025 1:50pm - 2:10pm CEST
C1 ATM Studio Warsaw, Poland

1:50pm CEST

Mesh2PPM - Automatic Parametrization of the BezierPPM: Entire Pinna
Friday May 23, 2025 1:50pm - 2:10pm CEST
An individual human pinna geometry can be used to achieve plausible personalized audio reproduction. However, an accurate acquisition of the pinna geometry typically requires the use of specialized equipment and often involves time-consuming post-processing to remove potential artifacts. To obtain an artifact-free but individualized mesh, a parametric pinna model based on cubic Bézier curves (BezierPPM) can be used to represent an individual pinna. However, the parameters need to be manually tuned to the acquired listener’s geometry. For increased scalability, we propose Mesh2PPM, a framework for an automatic estimation of BezierPPM parameters from an individual pinna. Mesh2PPM relies on a deep neural network (DNN) that was trained on a dataset of synthetic multi-view images rendered from BezierPPM instances. For the evaluation, unseen BezierPPM instances were presented to Mesh2PPM, which inferred the BezierPPM parameters. We subsequently assessed the geometric errors between the meshes obtained from the BezierPPM parametrized with the inferred parameters and the actual pinna meshes. We investigated the effects of the camera-grid type, jittered camera positions, and additional depth information in images on the estimation quality. While depth information had no effect, the camera-grid type and the jittered camera positions both had effects. A camera grid of 3×3 provided the best estimation quality, yielding Pompeiu-Hausdorff distances of 2.05 ± 0.4 mm and 1.4 ± 0.3 mm with and without jittered camera positions, respectively, and root-mean-square (RMS) distances of 0.92 ± 0.12 mm and 0.52 ± 0.07 mm. These results motivate further improvements of the proposed framework to be ultimately applicable for an automatic estimation of pinna geometries obtained from actual listeners.
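For reference, the two reported geometric error measures can be computed between mesh vertex sets as follows; this is a generic sketch, not the authors' evaluation code.

```python
# Symmetric Pompeiu-Hausdorff and RMS distances between two vertex clouds.
import numpy as np
from scipy.spatial import cKDTree

def mesh_distances(verts_a, verts_b):
    d_ab, _ = cKDTree(verts_b).query(verts_a)   # nearest-neighbor dists A->B
    d_ba, _ = cKDTree(verts_a).query(verts_b)   # and B->A
    hausdorff = max(d_ab.max(), d_ba.max())     # symmetric Pompeiu-Hausdorff
    rms = np.sqrt(np.mean(np.concatenate([d_ab, d_ba]) ** 2))
    return hausdorff, rms
```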
Friday May 23, 2025 1:50pm - 2:10pm CEST
C2 ATM Studio Warsaw, Poland

2:10pm CEST

Exploring stimulus spacing bias in MUSHRA listening tests using labeled and unlabeled graphic scales
Friday May 23, 2025 2:10pm - 2:30pm CEST
The multi-stimulus test with hidden reference and anchor (MUSHRA) is a prevalent method for subjective audio quality evaluation. Despite its popularity, the technique is not immune to biases. Empirical evidence indicates that the presence of labels (quality descriptors) equidistantly distributed along the rating scale may be the cause of its non-linear warping; however, other factors could evoke even stronger non-linear effects. This study investigates the hypothesis that stimulus spacing bias may induce a greater magnitude of non-linear warping of the quality scale than that caused by the presence of labels. To this end, a group of more than 120 naïve listeners participated in MUSHRA-compliant listening tests using labeled and unlabeled graphic scales. The audio excerpts, representing two highly skewed distributions of quality levels, were reproduced over headphones in an acoustically treated room. The findings of this study verify the postulated hypothesis and shed new light on the mechanisms biasing the results of MUSHRA-conformant listening tests.
Friday May 23, 2025 2:10pm - 2:30pm CEST
C1 ATM Studio Warsaw, Poland

2:10pm CEST

Towards a Headphone Target Curve for Spatial Audio
Friday May 23, 2025 2:10pm - 2:30pm CEST
In order to reproduce audio over headphones as intended, it is essential to have well-defined and consistent references of how headphones should sound. With the aim of stereo reproduction in mind, the field has established a de-facto reference target curve called the Harman Target Curve, to which headphone transfer functions are commonly compared. This contribution questions whether the same target curve is suitable when used for the reproduction of spatial audio. First, the origins of the Harman Curve are revisited; it is motivated by the frequency response of loudspeaker playback in a specific listening room. The necessary measurement procedures are described in detail. Then, the paper discusses the applicability of existing targets to spatial audio. Therein, it is possible to embed convincing spatial room information directly into the production, thereby calling into question the motivation for incorporating a listening room in the headphone target. The paper concludes with a listening experiment that compares the preference of different target curves for both spatial audio and stereo.
Speakers
Alexander Mülleder

Graz University of Technology
Nils Meyer-Kahlen

Aalto University
Friday May 23, 2025 2:10pm - 2:30pm CEST
C2 ATM Studio Warsaw, Poland

2:15pm CEST

Storytelling in Audio Augmented Reality
Friday May 23, 2025 2:15pm - 3:15pm CEST
How can Audio Augmented Reality (AAR) serve as a storytelling medium? Sound designer Matias Harju shares insights from The Reign Union, an experimental interactive AAR story currently exhibited at WHS Union Theatre in Helsinki, Finland.

This workshop addresses the challenges and breakthroughs of creating an immersive, headphone-based 6DoF AAR experience. In The Reign Union, two simultaneous participants experience the same bio-fictional story from different points of audition. Narrative design considerations and approaches are discussed and demonstrated through video clips featuring binaural sound recorded from the experience. References to other AAR experiences around the world are included to provide a broader context. A central theme is how reality anchors the narrative, while virtual sounds reveal new perspectives and interpretations.

The session also briefly examines the development of an in-house 6DoF AAR prototype platform, used for The Reign Union story as well as other narrative research conducted by the author and his team. This has been a journey through various pose tracking, virtual acoustic, and authoring solutions, resulting in a scalable system potentially suited for complex indoor spaces.

Matias, author of the forthcoming book Audio Augmented Reality: Concepts, Technologies, and Narratives (Routledge, June 2025), invites attendees to discuss and discover the possibilities of AAR as a tool for storytelling and artistic expression.
Friday May 23, 2025 2:15pm - 3:15pm CEST
C3 ATM Studio Warsaw, Poland

2:30pm CEST

Investigating Listeners’ Emotional and Physiological Responses to Varying Apparent Width and Horizontal Position of a Single Sound Source
Friday May 23, 2025 2:30pm - 2:50pm CEST
This research aims to explore the impact of variations in apparent sound source width and position on emotional and physiological responses among listeners, with a particular focus on the domain of virtual reality applications. While sound is recognized as a potent elicitor of strong emotions, the specific role of spatial characteristics, such as apparent sound source width, has not been systematically analyzed. The authors’ previous study has indicated that the spatial distribution of sound can alter perceptions of scariness. In contrast, the current study explores whether adjustments in apparent sound source width can significantly affect emotional valence and arousal, as well as human physiological metrics. The objective of this study was to investigate the impact of a single sound source width and its horizontal position on emotional engagement, thereby providing valuable insights for advancements in immersive audio experiences. Our experiments involved conducting listening tests in a spatial sound laboratory, utilizing a circular setup of sixteen loudspeakers to present a range of audio stimuli drawn from five selected recordings. The stimuli were manipulated based on two key parameters: the apparent sound source width and the spatial positioning of the sound source (front, back, left, or right). Participants assessed their emotional reactions using the Self-Assessment Manikin (SAM) pictogram method. Physiological data, including electroencephalography, blood volume pulse, and electrodermal activity, were collected in real-time via wearable sensors consisting of an EEG headset and a finger-attached device.
Friday May 23, 2025 2:30pm - 2:50pm CEST
C1 ATM Studio Warsaw, Poland

2:30pm CEST

Sound Source Directivity Estimation in Spherical Fourier Domain from Sparse Measurements
Friday May 23, 2025 2:30pm - 2:50pm CEST
In recent years, applications such as virtual reality (VR) systems and room acoustics simulations have brought the modeling of sound source directivity into focus. An accurate simulation of directional responses of sound sources is essential in immersive audio applications.

Real sound sources have directional properties that differ from those of simple sources such as monopoles, which are frequently used for modeling more complex acoustic fields. For instance, the sound level of human speech as a sound source varies considerably depending on where the sound is recorded with respect to the talker’s head. The same is true for loudspeakers, which are commonly treated as linear time-invariant sources: when the sound is recorded behind the loudspeaker, differences of up to 20 dB SPL can be observed at some frequencies. The directional characteristics of sound sources become particularly pronounced at high frequencies. The propagation of real sound sources, such as human voices or musical instruments, differs from simple source models like monopoles, dipoles, and quadrupoles due to their physical structures.

The common approach to measuring directivity patterns of sound sources involves surrounding a sound source in an anechoic chamber with a high number of pressure microphones on a spherical grid and registering the sound power at these positions. Apart from the prohibitive hardware requirements, such measurement setups are mostly impractical and costly. Audio system manufacturers have developed various methods for measuring sound source directionality over the years. These methods are generally of high technical complexity.

This article proposes a new, reduced-complexity directivity measurement approach based on the spherical harmonic decomposition of the sound field. The method estimates the directional characteristics of sound sources using fewer measurement points with spherical microphone arrays. The spherical harmonic transform allows for the calculation of directivity using data collected from spherical microphone arrays instead of pressure sensors. The proposed method uses both the pressure component and spatial derivatives of the sound field and successfully determines directivity with sparse measurements.

An estimation model based on the spherical Fourier transform was developed, measurements were carried out to test this model, and preliminary results obtained from the estimation model are presented. Experiments conducted at the METU Spatial Audio Research Laboratory demonstrated the effectiveness of the proposed method. The directivity characteristics of Genelec 6010A loudspeaker are measured using eight 3rd-order spherical microphone arrays. The directivity functions obtained were highly consistent with the data provided by the loudspeaker manufacturer. The results, especially in low and mid-frequency bands, indicate the utility of the proposed method.
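A minimal sketch of the core idea, fitting spherical-harmonic coefficients to sparse far-field measurements by least squares, is given below; the order, sampling, and use of pressure values alone are assumptions (the paper's method additionally exploits spatial derivatives of the sound field captured by the arrays).

```python
# Least-squares spherical-harmonic fit of a directivity pattern.
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, az, pol):
    # scipy's sph_harm takes (m, n, azimuth, polar) in that argument order.
    cols = [sph_harm(m, n, az, pol)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

def fit_directivity(order, az, pol, pressures):
    # Needs at least (order + 1)**2 measurement directions.
    Y = sh_matrix(order, az, pol)
    coeffs, *_ = np.linalg.lstsq(Y, pressures.astype(complex), rcond=None)
    return coeffs

def evaluate_pattern(coeffs, order, az, pol):
    # Evaluate the fitted directivity at arbitrary directions.
    return sh_matrix(order, az, pol) @ coeffs
```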
Friday May 23, 2025 2:30pm - 2:50pm CEST
C2 ATM Studio Warsaw, Poland

2:50pm CEST

A study on reverberation in a virtual acoustic setting using the Lexicon 960L Reverb Processor
Friday May 23, 2025 2:50pm - 3:10pm CEST
This paper describes ongoing research on integrating algorithmic reverberation tools designed for audio post-production into virtual acoustics, focusing on Impulse Responses (IRs) captured from the legendary Lexicon 960L hardware reverberation unit. While previous research from the McGill University Virtual Acoustics Technology (VAT) Lab has utilized room impulse responses (RIRs) captured from various performance halls to create active acoustic environments in the recording studio, this study analyzes the perceived differences between two listening environments and the effect of the VATLab loudspeakers and room acoustics on IRs captured from 5.0 multichannel reverb presets. Three of these multichannel IRs have been chosen to simulate a Lexicon 960L “environment” in a physical space.

Objective measurements in McGill University’s Immersive Media Laboratory (IMLAB) Control Room and in VATLab, following the ISO 3382 standard, quantify the effect of the physical room and of the omnidirectional dodecahedral loudspeakers used for auralization. A subjective pilot study then investigates the perceived differences between the Lexicon IRs in VATLab and a control condition, the IMLAB Control Room. The results of an attribute rating test on perceived immersion, soundfield continuity, tone color, and overall listening experience between the two spaces help us better understand how reverberation algorithms designed for multichannel mixing/post-production translate to a virtual acoustics system.
In conclusion, we discuss the perceptual differences between the IMLAB Control Room and VATLab and results of objective measurements.
Speakers
Aybar Aydin

PhD Candidate, McGill University
Kathleen Zhang

McGill University
Richard King

Professor, McGill University
Richard King is an Educator, Researcher, and a Grammy Award-winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School…
Friday May 23, 2025 2:50pm - 3:10pm CEST
C1 ATM Studio Warsaw, Poland

2:50pm CEST

Perceptual evaluation of professional point and line sources for immersive audio applications
Friday May 23, 2025 2:50pm - 3:10pm CEST
Immersive sound reinforcement aims to create a balanced perception of sounds arriving from different directions, establishing an impression of envelopment over the audience area. Current perceptual research shows that coverage designs featuring nearly constant decay (0 dB per distance doubling) preserve the level balance among audio objects in the mix. In contrast, a -3 dB decay supports a more uniform sensation of envelopment, especially for off-center listening positions. For practical reasons, point-source loudspeakers remain widely used for immersive audio playback in mid-sized venues. However, point-source loudspeakers inherently decay by -6 dB per distance doubling, and using them can conflict with the design goals outlined above. In this paper, we investigate the perceived differences between point-source and line-source setups, each using eight surrounding loudspeakers mounted side by side, covering a 10 m x 7 m audience area. The perceptual qualities of object level balance, spatial definition, and envelopment were compared in a MUSHRA listening experiment, and acoustic measurements were carried out to capture room impulse responses and binaural room impulse responses (BRIRs) of the experimental setup. The BRIRs were used to check whether the results of the listening experiment were reproducible on headphones. Both the loudspeaker and headphone-based experiments delivered highly correlated results. Also, regression models devised from the acoustic measurements correlate highly with the perceptual results. The results confirm that elevated line sources, exhibiting a practically realizable decay of -2 dB per distance doubling, help preserve object-level balance, increase spatial definition, and provide a uniform envelopment experience throughout the audience area compared to point-source loudspeakers.
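For intuition, the decay laws at issue translate into level drops like these (a back-of-envelope sketch; the distances are arbitrary):

```python
# Level drop vs. a 1 m reference for the decay laws discussed above.
import numpy as np

def level_drop_db(distance_m, db_per_doubling):
    return db_per_doubling * np.log2(distance_m)

for d in (2.0, 4.0, 8.0):
    print(f"{d:4.0f} m:",
          f"point {level_drop_db(d, -6):6.1f} dB,",
          f"ideal line {level_drop_db(d, -3):6.1f} dB,",
          f"elevated line {level_drop_db(d, -2):6.1f} dB")
```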
Speakers
Franz Zotter

University of Music and Performing Arts Graz
Franz Zotter received an M.Sc. degree in electrical and audio engineering from the University of Technology (TUG) in 2004, a Ph.D. degree in 2009, and a venia docendi in 2023 from the University of Music and Performing Arts (KUG) in Graz, Austria. He joined the Institute of Electronic…
Philip Coleman

Senior Immersive Audio Research Engineer, L-Acoustics
I'm a research engineer in the L-ISA immersive audio team at L-Acoustics, based in Highgate, London. I'm working on the next generation of active acoustics and object-based spatial audio reproduction, to deliver the best possible shared experiences. Before joining L-Acoustics in September…
Friday May 23, 2025 2:50pm - 3:10pm CEST
C2 ATM Studio Warsaw, Poland

3:00pm CEST

Analysis and Model of Temporal Sound Attributes from Recorded Audio
Friday May 23, 2025 3:00pm - 3:20pm CEST
A computational framework is proposed for analyzing the temporal evolution of perceptual attributes of sound stimuli. As a paradigm, the perceptual attribute of envelopment, which is manifested in different audio reproduction formats, is employed. For this, listener temporal ratings of envelopment for mono, stereo, and 5.0-channel surround music samples serve as the ground truth for establishing a computational model that can accurately trace temporal changes from such recordings. Combining established and heuristic methodologies, different features of the audio signals were extracted at each segment where envelopment ratings were registered, termed long-term (LT) features. A memory LT computational stage is proposed to account for the temporal variations of the features through the duration of the signal, based on the exponentially weighted moving average of the respective LT features. These are utilized in a gradient tree boosting machine learning algorithm, leading to a Dynamic Model that accurately predicts the listener’s temporal envelopment ratings. Without the proposed memory LT feature function, a Static Model is also derived, which is shown to have lower performance for predicting such temporal envelopment variations.
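The memory stage can be sketched as a per-feature exponentially weighted moving average; the smoothing constant below is an illustrative assumption.

```python
# EWMA "memory" stage over long-term (LT) feature sequences.
import numpy as np

def memory_lt(features, alpha=0.2):
    """features: (n_segments, n_features) LT features; returns EWMA version."""
    out = np.empty_like(features, dtype=float)
    out[0] = features[0]
    for t in range(1, len(features)):
        # Current segment blended with the accumulated past.
        out[t] = alpha * features[t] + (1.0 - alpha) * out[t - 1]
    return out
```

Stacking the raw LT features with their EWMA counterparts, e.g. `np.hstack([lt, memory_lt(lt)])`, would then feed the gradient-boosted Dynamic Model described above.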
Speakers
Georgios Moiragias

Department of Electrical and Computer Engineering, University of Patras
I am a graduate of the Electrical and Computer Engineering Department of the University of Patras. Since 2020, I have been a PhD candidate in the same department under the supervision of Professor John Mourjopoulos. My research interests include analysis and modeling of perceptual and affective…
John Mourjopoulos

Professor emeritus, University of Patras
John Mourjopoulos is Professor Emeritus at the Department of Electrical and Computer Engineering, University of Patras, and a Fellow of the AES. As the head of the Audiogroup for nearly 30 years, he has authored and presented more than 200 journal and conference papers. His research…
Friday May 23, 2025 3:00pm - 3:20pm CEST
C1 ATM Studio Warsaw, Poland

3:10pm CEST

Detection of spectral component asynchrony: Applying psychoacoustic research to transient phenomena in music
Friday May 23, 2025 3:10pm - 3:30pm CEST
Numerous studies highlight the role of transient behavior in musical sounds and its impact on sound identification. This study compares these findings with established psychoacoustic measurements of detection thresholds for asynchrony in onset and offset transients, obtained using synthesized stimuli that allowed precise control of stimulus parameters. Results indicated that onset asynchrony can be detected at thresholds as low as 1 ms, even half a cycle of the component frequency. In contrast, offset asynchrony detection was found to be less precise, with thresholds ranging from 5 to 10 ms. Sensitivity improves when multiple harmonics are asynchronous. Additionally, component phase significantly influences onset asynchrony detection: at 1000 Hz and above, phase shifts raise thresholds from below 1 ms to around 50 ms, while having little effect on offset detection. Although these findings were based on controlled artificial stimuli, they can provide valuable insight into asynchrony in natural musical sounds. In many cases, detection thresholds are well below the variations observed in music, yet under certain conditions and frequencies, some temporal variations may not be perceptible.
Friday May 23, 2025 3:10pm - 3:30pm CEST
C1 ATM Studio Warsaw, Poland

3:15pm CEST

ECHO Project - Immersive Microphone Array Techniques for Orchestral Recording
Friday May 23, 2025 3:15pm - 4:45pm CEST
The ECHO Project (Exploring the Cinematic Hemisphere for Orchestra) is a collaborative research initiative that explores 3D microphone array techniques for orchestral recording, involving eight experts in immersive sound recording: Kellogg Boynton, Anthony Caruso, Hyunkook Lee, Morten Lindberg, Simon Ratcliffe, Katarzyna Sochaczewska, Mark Willsher, and Nick Wollage. Building on the 3D-MARCo initiative, this project aims to provide a platform for sound engineers, composers, researchers, and students to experiment with various immersive recording techniques. To this end, an open-access database of high-quality orchestral recordings was created from a recording session at AIR Studios, London, featuring Oscar-winning composer Volker Bertelmann and the London Contemporary Orchestra.

The ECHO database includes recordings of four pieces, captured using up to 143 microphone capsules per piece. This setup includes seven different microphone arrays designed by the experts, spot microphones, a dummy head, and a higher-order spherical microphone system. The database allows users to not only compare different techniques but also to experiment with mixing different microphones, helping them develop their own techniques. It also serves as a useful resource for research, teaching and learning in immersive audio.

This workshop will present the rationale behind each microphone array used in the project, detail the recording process, discuss the immersive approach to composition and recording methods, and present some of the recordings in 7.1.4.
Speakers
Hyunkook Lee

Professor, Applied Psychoacoustics Lab, University of Huddersfield
Katarzyna Sochaczewska

Researcher, AGH UST
Immersive Audio Producer, researching perception in spatial audio. I am driven by a passion for making sound experiences unforgettable. My work lies at the intersection of technology and creativity, where I explore how immersive sound and music can captivate…
Morten Lindberg

Producer and Engineer, 2L (Lindberg Lyd, Norway)
Recording Producer and Balance Engineer with 46 GRAMMY nominations, 38 of these in the craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album, and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award winner 2020.
Friday May 23, 2025 3:15pm - 4:45pm CEST
C4 ATM Studio Warsaw, Poland

3:20pm CEST

Honeybee sound generation using Machine learning techniques
Friday May 23, 2025 3:20pm - 3:40pm CEST
The honeybee is an insect known to almost all human beings around the world. The sounds produced by bees are a ubiquitous staple of the soundscape of the countryside and forest meadows, bringing an air of natural beauty to the perceived environment. Honeybee-produced sounds are also an important part of apitherapeutic experiences, where close-quarters exposure to honeybees proves beneficial to the mental and physical well-being of humans. This research investigates the generation of synthetic honeybee buzzing sounds using Conditional Generative Adversarial Networks (cGANs). Trained on a comprehensive dataset of real recordings collected both inside and outside the beehive during a long-term audio monitoring session, the models produce diverse and realistic audio samples. Two architectures were developed: an unconditional GAN for generating long, high-fidelity audio, and a conditional GAN that incorporates time-of-day information to generate shorter samples reflecting diurnal honeybee activity patterns. The generated audio exhibits both spectral and temporal properties similar to real recordings, as confirmed by statistical analysis performed during the experiment. This research has implications for scientific work in honeybee colony health monitoring and apitherapy research, as well as artistic endeavours such as sound design and immersive soundscape creation. The trained generator model is publicly available on the project’s website.
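A minimal sketch of a generator conditioned on time of day, in the spirit of the conditional model described, might look as follows; the layer sizes, sine/cosine hour encoding, and output length are assumptions, not the authors' architecture.

```python
# Sketch of a cGAN generator conditioned on (cyclic) time of day.
import torch
import torch.nn as nn

class BuzzGenerator(nn.Module):
    def __init__(self, latent_dim=128, out_samples=16384):
        super().__init__()
        # +2 inputs for the cyclic time-of-day condition (sin, cos).
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2, 512), nn.ReLU(),
            nn.Linear(512, 2048), nn.ReLU(),
            nn.Linear(2048, out_samples), nn.Tanh(),  # waveform in [-1, 1]
        )

    def forward(self, z, hour):
        phase = 2 * torch.pi * hour.unsqueeze(1) / 24.0
        cond = torch.cat([torch.sin(phase), torch.cos(phase)], dim=1)
        return self.net(torch.cat([z, cond], dim=1))

g = BuzzGenerator()
fake = g(torch.randn(4, 128), torch.tensor([6.0, 12.0, 18.0, 23.0]))
```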
Friday May 23, 2025 3:20pm - 3:40pm CEST
C1 ATM Studio Warsaw, Poland

3:40pm CEST

Moving Sound Source Localization and Tracking based on Envelope Estimation for Unknown Number of Sources
Friday May 23, 2025 3:40pm - 4:00pm CEST
Existing methods for moving sound source localization and tracking face significant challenges when dealing with an unknown number of sound sources, which substantially limits their practical applications. This paper proposes a moving sound source tracking method based on source signal envelopes that does not require prior knowledge of the number of sources. First, an encoder-decoder attractor (EDA) method is used to estimate the number of sources and obtain an attractor for each source, based on which the signal envelope of each source is estimated. This signal envelope is then used as a clue for tracking the target source. The proposed method has been validated through simulation experiments. Experimental results demonstrate that the proposed method can accurately estimate the number of sources and precisely track each source.
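The envelope cue referred to above is commonly obtained as the magnitude of the analytic signal; a generic sketch (not the paper's EDA network) follows, with the smoothing constant as an assumption.

```python
# Signal envelope via the analytic signal, smoothed by a moving average.
import numpy as np
from scipy.signal import hilbert

def signal_envelope(x, fs, smooth_ms=10.0):
    env = np.abs(hilbert(x))                     # analytic-signal magnitude
    n = max(1, int(fs * smooth_ms / 1000.0))
    return np.convolve(env, np.ones(n) / n, mode="same")  # moving average
```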
Friday May 23, 2025 3:40pm - 4:00pm CEST
C1 ATM Studio Warsaw, Poland

3:45pm CEST

A Curvilinear Transfer Function for Wide Dynamic Range Compression With Expansion
Friday May 23, 2025 3:45pm - 4:05pm CEST
Wide Dynamic Range Compression in hearing aids is becoming increasingly complex as the number of channels and adjustable parameters grows. At the same time, there is growing demand for customization and user self-adjustment of hearing aids, necessitating a balance between complexity and user accessibility. Compression in hearing aids is governed by the input-output transfer function, which relates input magnitude to output magnitude and is typically defined as a combination of linear piecewise segments resembling logarithmic behavior. This work presents an alternative to the conventional compression transfer function that consolidates multiple compression parameters and revisits expansion in hearing aids. The curvilinear transfer function is a continuous curve with logarithm-like behavior, governed by two parameters: gain and compression ratio. Experimental results show that curvilinear compression reduces the amplification of low-level noise, improves signal-to-noise ratio by up to 1.0 dB, improves sound quality as measured by the Hearing Aids Speech Quality Index by up to 6.7%, and provides comparable intelligibility as measured by the Hearing Aids Speech Perception Index, with simplified parameterization compared to conventional compression. The consolidated curvilinear transfer function is highly applicable to over-the-counter hearing aids and offers more capabilities for customization than current prominent over-the-counter and self-adjusted hearing aids.
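The paper's exact formula is not reproduced here; the sketch below merely illustrates one plausible continuous, logarithm-like input-output curve whose slope bends smoothly from linear below a knee to 1/ratio above it, with the knee and smoothness treated as fixed internal design constants so that only gain and ratio remain user-facing.

```python
# Hypothetical curvilinear (softplus-based) compression curve, not the
# paper's formula: slope 1 below the knee, slope 1/ratio above it.
import numpy as np

def curvilinear_out_db(in_db, gain_db=20.0, ratio=3.0,
                       knee_db=45.0, smooth_db=10.0):
    # Softplus bend: ~0 below the knee, ~linear growth above it.
    bend = smooth_db * np.log1p(np.exp((in_db - knee_db) / smooth_db))
    return in_db + gain_db - (1.0 - 1.0 / ratio) * bend

levels = np.arange(20, 101, 10)
print(np.round(curvilinear_out_db(levels), 1))
```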
Friday May 23, 2025 3:45pm - 4:05pm CEST
C2 ATM Studio Warsaw, Poland

4:00pm CEST

Room Geometry Inference Using Localization of the Sound Source and Its Early Reflections
Friday May 23, 2025 4:00pm - 4:20pm CEST
Traditional methods for inferring room geometry from sound signals are predominantly based on Room Impulse Responses (RIRs) or prior knowledge of the sound source location, which significantly restricts their applicability. This paper presents a method for estimating room geometry based on the localization of the direct sound source and its early reflections from First-Order Ambisonics (FOA) signals, without prior knowledge of the environment. First, the method simultaneously estimates the Direction of Arrival (DOA) of the direct source and the detected first-order reflected sources. Then, a cross-attention-based network is proposed that implicitly extracts features related to the Time Difference of Arrival (TDOA) between the direct source and the first-order reflected sources in order to estimate their distances. Finally, the room geometry is inferred from the localization results of the direct and first-order reflected sources. The effectiveness of the proposed method was validated through simulation experiments. The experimental results demonstrate that the proposed method achieves accurate localization and performs well in inferring room geometry.
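One geometric step implied by this pipeline can be made concrete: given the localized direct source and a first-order reflection (its image source), the reflecting wall is the perpendicular bisector plane of the segment joining them. A minimal sketch with illustrative positions:

```python
# Wall plane from a source and its image source (perpendicular bisector).
import numpy as np

def wall_from_image_source(src, img):
    src, img = np.asarray(src, float), np.asarray(img, float)
    normal = img - src
    normal /= np.linalg.norm(normal)        # unit normal of the wall plane
    midpoint = 0.5 * (src + img)            # a point on the wall
    d = normal @ midpoint                   # plane equation: normal . x = d
    return normal, d

n, d = wall_from_image_source([1.0, 2.0, 1.5], [1.0, -2.0, 1.5])
print("wall plane normal:", n, " offset:", d)   # a y = 0 wall here
```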
Friday May 23, 2025 4:00pm - 4:20pm CEST
C1 ATM Studio Warsaw, Poland

4:00pm CEST

Beyond Stereo: Using Binaural Audio to Bridge Legacy and Modern Sound Systems
Friday May 23, 2025 4:00pm - 5:30pm CEST
As immersive audio content becomes more prevalent across streaming and broadcast platforms, creators and engineers face the challenge of making spatial audio accessible to listeners using legacy codecs and traditional playback systems, particularly headphones. With multiple binaural encoding methods available, choosing the right approach for a given project can be complex.

This workshop is designed as an exploration for audio professionals to better understand the strengths and applications of various binaural encoding systems. By comparing different techniques and their effectiveness in real-world scenarios, attendees will gain insights into how binaural processing can serve as a bridge between legacy and modern formats, preserving spatial cues while maintaining compatibility with existing distribution channels.

As the first in a series of workshops, this session will help define key areas for real-world testing between this convention and the next. Attendee insights and discussions will directly influence which encoding methods are explored further, ensuring that the most effective solutions are identified for different content types and delivery platforms.

Participants will gain an understanding of processing methods and implementation strategies for various distribution platforms. By integrating these approaches, content creators can enhance accessibility and ensure that immersive audio reaches a wider audience, possibly encouraging consumers to explore how to enjoy immersive content using a variety of playback systems.
Speakers
Alex Kosiorek

Manager / Executive Producer / Sr. Engineer, Central Sound at Arizona PBS
Multi-Emmy Award Winning Senior Audio Engineer, Executive Producer, Media Executive, Surround, Immersive, and Acoustic Music Specialist. 30+ years of experience creating audio-media productions for broadcast and online distribution. Known for many “firsts” such as 1st audio fellow…
Friday May 23, 2025 4:00pm - 5:30pm CEST
C3 ATM Studio Warsaw, Poland

4:00pm CEST

Binamix - A Python Library for Generating Binaural Audio Datasets
Friday May 23, 2025 4:00pm - 5:30pm CEST
The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions for binaural audio dataset generation for testing and validation. Binamix is an open-source Python library designed to facilitate programmatic binaural mixing using the extensive SADIE II Database, which provides HRIR and BRIR data for 20 subjects. The Binamix library provides a flexible and repeatable framework for creating large-scale spatial audio datasets, making it an invaluable resource for codec evaluation, audio quality metric development, and machine learning model training. A range of pre-built example scripts, utility functions, and visualization plots further streamline the process of custom pipeline creation. This paper presents an overview of the library's capabilities, including binaural rendering, impulse response interpolation, and multi-track mixing for various speaker layouts. The tools utilize a modified Delaunay triangulation technique to achieve accurate HRIR/BRIR interpolation where desired angles are not present in the data. By supporting a wide range of parameters such as azimuth, elevation, subject IRs, speaker layouts, mixing controls, and more, the library enables researchers to create large binaural datasets for any downstream purpose. Binamix empowers researchers and developers to advance spatial audio applications with reproducible methodologies by offering an open-source solution for binaural rendering and dataset generation.
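To illustrate the interpolation idea (though not Binamix's actual API nor its "modified" triangulation), a plain Delaunay barycentric blend of HRIRs can be sketched as follows:

```python
# Delaunay barycentric HRIR interpolation in (azimuth, elevation).
import numpy as np
from scipy.spatial import Delaunay

def interpolate_hrir(directions_deg, hrirs, query_deg):
    """directions_deg: (N, 2) az/el pairs; hrirs: (N, taps); query_deg: (2,)."""
    tri = Delaunay(np.asarray(directions_deg, float))
    q = np.atleast_2d(query_deg).astype(float)
    simplex = int(tri.find_simplex(q)[0])
    if simplex < 0:
        raise ValueError("query direction outside the measured hull")
    verts = tri.simplices[simplex]
    T = tri.transform[simplex]              # affine map to barycentric coords
    b = T[:2] @ (q[0] - T[2])
    weights = np.append(b, 1.0 - b.sum())   # barycentric blending weights
    # Note: this plain version ignores azimuth wrap-around, one of the
    # issues a "modified" triangulation would need to handle.
    return weights @ np.asarray(hrirs, float)[verts]
```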
Speakers
Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such…
Friday May 23, 2025 4:00pm - 5:30pm CEST
Hall F ATM Studio Warsaw, Poland

4:00pm CEST

Neural 3D Audio Renderer for acoustic digital twin creation
Friday May 23, 2025 4:00pm - 5:30pm CEST
In this work, we introduce a Neural 3D Audio Renderer (N3DAR) - a conceptual solution for creating acoustic digital twins of arbitrary spaces. We propose a workflow that consists of several stages including:
1. Simulation of high-fidelity Spatial Room Impulse Responses (SRIR) based on the 3D model of a digitalized space,
2. Building an ML-based model of this space for interpolation and reconstruction of SRIRs,
3. Development of a real-time 3D audio renderer that allows the deployment of the digital twin of a space with accurate spatial audio effects consistent with the actual acoustic properties of this space.
The first stage consists of preparing the 3D model and running the SRIR simulations using a state-of-the-art wave-based method for arbitrary pairs of source-receiver positions. This stage provides the training data used in the second stage: training the SRIR reconstruction model. The training stage aims to learn a model of the acoustic properties of the digitalized space using the Acoustic Volume Rendering (AVR) approach. The last stage is the construction of a plugin with a dedicated 3D audio renderer, where rendering comprises reconstruction of the early part of the SRIR, estimation of the reverb part, and HOA-based binauralization (a minimal sketch of this render step is given below).
N3DAR allows the building of tailored audio rendering plugins that can be deployed along with visual 3D models of digitalized spaces, where users can freely navigate through the space with 6 degrees of freedom and experience high-fidelity binaural playback in real time.
We provide a detailed description of the challenges and considerations for each of the stages. We also conduct an extensive evaluation of the audio rendering capabilities with both objective metrics and subjective methods, using a dedicated evaluation platform.
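A skeleton of the stage-3 render step under these assumptions (early SRIR and reverb tail already produced by the trained models, binauralization handled by a separate component) might look like this:

```python
# HOA-domain rendering: convolve a dry source with the assembled SRIR.
import numpy as np
from scipy.signal import fftconvolve

def render_hoa(dry, early_srir, reverb_tail):
    """dry: (n,) mono source; early_srir, reverb_tail: (channels, taps)
    HOA-domain SRIR segments, with the tail following the early part."""
    srir = np.concatenate([early_srir, reverb_tail], axis=1)
    hoa = np.stack([fftconvolve(dry, ch) for ch in srir])
    return hoa  # to be passed to an HOA binauralizer for headphone output
```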
Friday May 23, 2025 4:00pm - 5:30pm CEST
Hall F ATM Studio Warsaw, Poland

4:00pm CEST

Performance Estimation Method for 3D Microphone Array based on the Modified Steering Vector in Spherical Harmonic Domain
Friday May 23, 2025 4:00pm - 5:30pm CEST
This paper presents an objective method for estimating the performance of 3D microphone arrays, which is also applicable to 2D arrays. The method incorporates the physical characteristics and relative positions of the microphones, merging these elements through a weighted summation to derive the arrays' directional patterns. These patterns are represented as a "Modified Steering Vector." Additionally, leveraging the spatial properties of spherical harmonics, we transform the array's directional pattern into the spherical harmonic domain. This transformation enables a quantitative analysis of the physical properties of each component, providing a comprehensive understanding of the array's performance. Overall, the proposed method offers a deeply insightful and versatile framework for evaluating the performance of both 2D and 3D microphone arrays by fully exploiting their inherent physical characteristics.
Friday May 23, 2025 4:00pm - 5:30pm CEST
Hall F ATM Studio Warsaw, Poland

4:00pm CEST

Reconstructing Sound Fields with Physics-Informed Neural Networks: Applications in Real-World Acoustic Environments
Friday May 23, 2025 4:00pm - 5:30pm CEST
The reconstruction of sound fields is a critical component in a range of applications, including spatial audio for augmented, virtual, and mixed reality (AR/VR/XR) environments, as well as for optimizing acoustics in physical spaces. Traditional approaches to sound field reconstruction predominantly rely on interpolation techniques, which estimate sound fields based on a limited number of spatial and temporal measurements. However, these methods often struggle with issues of accuracy and realism, particularly in complex and dynamic environments. Recent advancements in deep learning have provided promising alternatives, particularly with the introduction of Physics-Informed Neural Networks (PINNs), which integrate physical laws directly into the model training process. This study aims to explore the application of PINNs for sound field reconstruction, focusing on the challenge of predicting acoustic fields in unmeasured areas. The experimental setup involved the collection of impulse response data from the Promenadikeskus concert hall in Pori, Finland, using various source and receiver positions. The PINN framework is then utilized to simulate the hall’s acoustic behavior, with parameters incorporated to model sound propagation across different frequencies and source-receiver configurations. Despite challenges arising from computational load, pre-processing strategies were implemented to optimize the model's efficiency. The results demonstrate that PINNs can accurately reconstruct sound fields in complex acoustic environments, offering significant potential for real-time sound field control and immersive audio applications.
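The core PINN mechanism, a data-fit loss plus a wave-equation residual at collocation points, can be sketched as follows; the network size, loss weighting, and sampling are illustrative assumptions, not the study's configuration.

```python
# PINN sketch: p(x, y, z, t) fits measurements while satisfying the
# acoustic wave equation lap(p) - p_tt / c^2 = 0 at collocation points.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 128), nn.Tanh(),
                    nn.Linear(128, 128), nn.Tanh(),
                    nn.Linear(128, 1))
c = 343.0  # speed of sound, m/s

def wave_residual(xyzt):
    xyzt = xyzt.requires_grad_(True)
    p = net(xyzt)
    g = torch.autograd.grad(p.sum(), xyzt, create_graph=True)[0]
    # Second derivatives of p w.r.t. x, y, z, t via a second grad pass.
    second = [torch.autograd.grad(g[:, i].sum(), xyzt,
                                  create_graph=True)[0][:, i]
              for i in range(4)]
    laplacian = second[0] + second[1] + second[2]
    return laplacian - second[3] / c**2   # should vanish where p is physical

def loss(xyzt_data, p_data, xyzt_colloc):
    return (nn.functional.mse_loss(net(xyzt_data), p_data)
            + wave_residual(xyzt_colloc).pow(2).mean())
```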
Speakers
Rigas Kotsakis

Aristotle University of Thessaloniki
Nikolaos Vryzas

Aristotle University of Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production…
Lazaros Vrysis

Aristotle University of Thessaloniki
Friday May 23, 2025 4:00pm - 5:30pm CEST
Hall F ATM Studio Warsaw, Poland

4:00pm CEST

Recording and post-production of Dietrich Buxtehude baroque cantatas in stereo and Dolby Atmos using experimental 3D microphone array.
Friday May 23, 2025 4:00pm - 5:30pm CEST
3D recordings are an attractive solution for achieving an immersive effect. Dolby Atmos has recently become an increasingly popular format for distributing three-dimensional music recordings, although stereophony currently remains the main format for producing music recordings.

How can traditional microphone techniques be optimally extended when recording classical music, so that both stereo recordings and three-dimensional formats (e.g. Dolby Atmos) can be obtained in the post-production process? The author attempts to answer this question using the example of a recording of Dietrich Buxtehude's work "Membra Jesu Nostri", BuxWV 75. The cycle of seven cantatas, composed in 1680, is one of the most important and most popular compositions of the early Baroque era. The first Polish recording was made by Arte Dei Suonatori conducted by Bartłomiej Stankowiak, with soloists and choral parts performed by the choir Cantus Humanus.

The author will present his concept of a microphone set for 3D recordings. In addition to the detailed microphone setup, he will cover the post-production method, combining the stereo mix with a Dolby Atmos mix in a 7.2.4 speaker configuration. A workflow will be proposed to facilitate switching between the different formats.
Friday May 23, 2025 4:00pm - 5:30pm CEST
Hall F ATM Studio Warsaw, Poland

4:00pm CEST

Subjective Evaluation on Three-dimensional VBAP and Ambisonics in an Immersive Concert Setting
Friday May 23, 2025 4:00pm - 5:30pm CEST
This paper investigates the subjective evaluation of two prominent three-dimensional spatialization techniques—Vector Base Amplitude Panning (VBAP) and High-Order Ambisonics (HOA)—using IRCAM’s Spat in an immersive concert setting. The listening test was conducted in the New Hall at the Royal Danish Academy of Music, which features a 44-speaker immersive audio system. The musical stimuli included electronic compositions and modern orchestral recordings, providing a diverse range of temporal and spectral content. The participants comprised experienced Tonmeisters and non-experienced musicians, who were seated in off-center positions to simulate real-world audience conditions. This study provides an ecologically valid subjective evaluation methodology.
The results indicated that VBAP excelled in spatial clarity and sound quality, while HOA demonstrated superior envelopment. The perceptual differences between the two techniques were relatively minor, influenced by room acoustics and suboptimal listening positions. Furthermore, music genre had no significant impact on the evaluation outcomes.
The study highlights VBAP’s strength in precise localization and HOA's capability for creating immersive soundscapes, aiming to bridge the gap between ideal and real-world applications in immersive sound reproduction and perception. The findings suggest the need to balance trade-offs when selecting spatialization techniques for specific purposes, venues, and audience positions. Future research will focus on evaluating a wider range of spatialization methods in concert environments and optimizing them to improve the auditory experience for distributed audiences.
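For reference, the VBAP half of the comparison reduces to solving a small linear system per virtual source: with L holding the unit vectors of the active loudspeaker triplet as columns, the gains are g = L^(-1) p, then power-normalized. A minimal sketch (triplet selection assumed already done):

```python
# 3D Vector Base Amplitude Panning gains for one loudspeaker triplet.
import numpy as np

def vbap_gains(triplet_dirs, source_dir):
    """triplet_dirs: (3, 3) unit vectors of the triplet; source_dir: (3,)."""
    L = np.asarray(triplet_dirs, float).T          # columns = speaker dirs
    g = np.linalg.solve(L, np.asarray(source_dir, float))
    # Negative gains would indicate the source lies outside this triplet.
    return g / np.linalg.norm(g)                   # constant-power scaling

triplet = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(vbap_gains(triplet, np.array([1.0, 1.0, 1.0]) / np.sqrt(3)))
```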
Speakers
Jesper Andersen

Head of Tonmeister Programme, Det Kgl Danske Musikkonservatorium
As a Grammy-nominated producer, engineer and pianist, Jesper has recorded around 100 CDs and produced music for radio, TV, theatre, installations and performance. Jesper has also worked as a sound engineer/producer at the Danish Broadcasting Corporation. A recent album production is…
Stefania Serafin

Professor, Aalborg University Copenhagen
I am Professor in Sonic Interaction Design at Aalborg University in Copenhagen and leader of the Multisensory Experience Lab together with Rolf Nordahl. I am the President of the Sound and Music Computing association, Project Leader of the Nordic Sound and Music Computing network…
Friday May 23, 2025 4:00pm - 5:30pm CEST
Hall F ATM Studio Warsaw, Poland

4:00pm CEST

Visualization of the spatial behavior between channels in surround program
Friday May 23, 2025 4:00pm - 5:30pm CEST
Friday May 23, 2025 4:00pm - 5:30pm CEST
Hall F ATM Studio Warsaw, Poland

4:05pm CEST

Tiresias - An Open-Source Hearing Aid Development Board
Friday May 23, 2025 4:05pm - 4:25pm CEST
Hearing loss is a global public health issue due to its high prevalence and negative impact on various aspects of one’s life, including well-being and cognition. Despite their crucial role in auditory rehabilitation, hearing aids remain inaccessible to many due to their high costs, particularly in low- and middle-income countries. Existing open-source solutions often rely on high-power, bulky platforms rather than compact, low-power wearables suited for real-world applications. This work introduces Tiresias, an open-source hearing aid development board designed for real-time audio processing using low-cost electronics. Integrating key hearing aid functionalities into a compact six-layer printed circuit board (PCB), Tiresias features multichannel compression, digital filtering, beamforming, Bluetooth connectivity, and physiological data monitoring, fostering modularity and accessibility through publicly available hardware and firmware resources based on the Nordic nRF Connect and Zephyr real-time operating system (RTOS). By addressing technological and accessibility challenges, this work advances open-source hearing aid development, enabling research in hearing technologies while also supporting future refinements and real-world validation.
Friday May 23, 2025 4:05pm - 4:25pm CEST
C2 ATM Studio Warsaw, Poland

4:30pm CEST

Ask Us Anything About Starting Your Career
Friday May 23, 2025 4:30pm - 6:00pm CEST
Join a panel of professionals from a variety of fields in the industry as we discuss topics including how to enter the audio industry, how they each got started in their own careers and the paths their careers took, and give advice geared towards students and recent graduates. Bring your questions for the panelists – most of this workshop will be focused on the information YOU want to hear!
Speakers
Ian Corbett

Coordinator & Professor, Audio Engineering & Music Technology, Kansas City Kansas Community College
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates off-beat-open-hats LLC, providing live sound, recording, and audio production services to clients in the Kansas City area…
Friday May 23, 2025 4:30pm - 6:00pm CEST
Hall F ATM Studio Warsaw, Poland
  Audio in education

4:40pm CEST

Comparing Human and Machine Ensemble Width Estimation in Binaural Music Recordings under Simulated Anechoic Conditions
Friday May 23, 2025 4:40pm - 5:00pm CEST
In recent years, there has been an increasing interest in binaural technology due to its ability to create immersive spatial audio experiences, particularly in streaming services and virtual reality applications. While audio localization studies typically focus on individual sound sources, ensemble width (EW) is crucial for scene-based analysis, as wider ensembles enhance immersion. We define intended EW as the angular span between the outermost sound sources in an ensemble, controlled during binaural synthesis. This study presents a comparison between human perception of EW and its automatic estimation under simulated anechoic conditions. Fifty-nine participants, including untrained listeners and experts, took part in listening tests, assessing 20 binaural anechoic excerpts synthesized using 2 publicly available music recordings, 2 different HRTFs, and 5 distinct EWs (0° to 90°). The excerpts were played twice in random order via headphones through a web-based survey. Only a subset of ten listeners, of which nine were experts, passed post-screening tests, with a mean absolute error (MAE) of 74.62° (±38.12°), compared to an MAE of 5.92° (±0.14°) achieved by a pre-trained machine learning method using auditory modeling and gradient-boosted decision trees. This shows that while intended EW can be algorithmically extracted from synthesized recordings, it significantly differs from human perception. Participants reported insufficient externalization and front-back confusion (suggesting HRTF mismatch). The untrained listeners demonstrated response inconsistencies and a low degree of discriminability, which led to the rejection of most of them during post-screening. The findings may contribute to the development of perceptually aligned EW estimation models.
Speakers
Hyunkook Lee

Professor, Applied Psychoacoustics Lab, University of Huddersfield
Friday May 23, 2025 4:40pm - 5:00pm CEST
C1 ATM Studio Warsaw, Poland

5:00pm CEST

Data-driven estimation of traditional frame drum construction specifications
Friday May 23, 2025 5:00pm - 5:20pm CEST
This research provides a systematic approach for the analysis of geometrical and material characteristics of traditional frame drums using deep learning. A data-driven approach is used, integrating supervised and unsupervised feature extraction techniques to associate measurable audio features with perceptual attributes. The methodology involves training convolutional neural networks on Mel-scale spectrograms to estimate wood type (classification), diameter (regression), and depth (regression). A multi-labeled dataset containing recorded samples of frame drums of different specifications is used for model training and evaluation. Hierarchical classification is explored, incorporating playing techniques and environmental factors. Handcrafted features enhance interpretability, helping determine the impact of construction attributes on sound perception and ultimately aiding instrument design. Data augmentation techniques, including pitch alterations and additive noise, are introduced to expand the dataset and improve the generalization of the approach.
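A minimal sketch of the multi-task idea, a small CNN over mel spectrograms with one classification head and two regression heads, is given below; the shapes, layer sizes, and class count are illustrative assumptions.

```python
# Multi-task CNN over mel spectrograms: wood class + diameter/depth.
import torch
import torch.nn as nn

class DrumNet(nn.Module):
    def __init__(self, n_woods=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.wood = nn.Linear(32, n_woods)       # classification head
        self.diameter = nn.Linear(32, 1)         # regression head (cm)
        self.depth = nn.Linear(32, 1)            # regression head (cm)

    def forward(self, mel):                      # mel: (batch, 1, mels, frames)
        h = self.features(mel)
        return self.wood(h), self.diameter(h), self.depth(h)

wood_logits, diam, depth = DrumNet()(torch.randn(2, 1, 64, 128))
```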
Speakers
Nikolaos Vryzas

Aristotle University of Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production…
Friday May 23, 2025 5:00pm - 5:20pm CEST
C1 ATM Studio Warsaw, Poland

5:00pm CEST

Exploring Temporal Properties of Closely Delayed Signals in Immersive Music Production: Psychoacoustic and Spatial Perception Considerations
Friday May 23, 2025 5:00pm - Sunday May 25, 2025 6:00pm CEST
*Introduction
With the growing market of immersive audio, both new and exciting production possibilities are emerging, alongside the resurfacing of existing surround sound production techniques. As audio production continues to evolve, understanding the impact of temporal properties on spatial perception becomes increasingly critical. One of the most effective ways to create a sense of space and depth, as well as to enhance listener envelopment, is through precise manipulation of temporal characteristics of sound.

*Temporal Adjustments in Audio Production
In stereophonic recording techniques, spatialization is often achieved by carefully controlling both each microphone’s distance from the sound source and the distance between microphones, in conjunction with leveraging variations in microphone sensitivity through polar patterns and directional rejection.
These distance-based variations introduce time delays, which are fundamental to spatial localization and depth perception. Similarly, in post-production workflows, delaying and applying differentiated effects to signals serve as powerful tools for enhancing immersion and spatiality. The controlled use of delay, reflections, and micro-temporal variations plays a significant role in shaping perceived auditory space. These techniques are widely used both in music mixing and in sound design, where artificially introducing delays helps simulate the propagation of sound in physical spaces, creating a more authentic and immersive auditory experience.

*Psychoacoustic Phenomena and Spatial Perception
Closely delayed or slightly altered signals give rise to psychoacoustic effects that influence spatial perception rather than purely temporal perception.
For instance, the number, spectral characteristics, and temporal distribution of reflections can lead a listener to perceive an auditory environment akin to a concert hall, even in the absence of an actual reverberant space.
The well-known Haas effect (precedence effect) provides insights into how human perception prioritizes the first-arriving sound over subsequent delayed versions, influencing localization and clarity. Additionally, the concept of the temporal integration window (auditory signal fusion) describes how multiple signals originating from the same source are perceptually fused into a single event, affecting spatial coherence and envelopment.
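
A minimal numerical illustration of the fusion/echo boundary described above (NumPy; the signal and delay values are chosen purely for illustration):

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 220 * t) * np.exp(-3 * t)    # decaying 220 Hz tone

    def with_delayed_copy(x, fs, delay_ms, gain=0.7):
        """y[n] = x[n] + gain * x[n - d]: one discrete delayed copy."""
        d = int(fs * delay_ms / 1000)
        y = np.copy(x)
        y[d:] += gain * x[:-d]
        return y

    fused = with_delayed_copy(x, fs, 15)   # within the integration window: one event
    echo = with_delayed_copy(x, fs, 80)    # beyond it: heard as a discrete repeat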

*Workshop and Study Overview
This workshop presents and exemplifies findings from an ongoing semester-long study, which is currently being prepared as a submission to the Journal of the Audio Engineering Society. The study investigates whether sensation, timbral perception, and temporal integration windows are influenced when the delayed signal's spatial position is altered. By showcasing how spatial modifications of delayed signals affect auditory perception, the workshop aims to contribute insights to the field of immersive audio production.

*Conclusion
This research underscores the importance of temporal manipulation in immersive audio, bridging psychoacoustics with production techniques. By examining spatial perception through the lens of delay-based processing, the study offers new perspectives on designing more effective immersive sound experiences. The workshop will provide participants with theoretical insights and practical examples, encouraging further exploration of the intersection between temporal properties and spatial audio design.
Speakers
Can Murtezaoglu

Research Assistant, Istanbul Technical University
Immersive audio recording and mixing techniques, audio design for visual media
Friday May 23, 2025 5:00pm - Sunday May 25, 2025 6:00pm CEST
C4 ATM Studio Warsaw, Poland

5:15pm CEST

Key Technology Briefings 3
Friday May 23, 2025 5:15pm - 6:00pm CEST
Friday May 23, 2025 5:15pm - 6:00pm CEST
C1 ATM Studio Warsaw, Poland

5:20pm CEST

Automatic generation of music captions
Friday May 23, 2025 5:20pm - 5:40pm CEST
This paper discusses the process of generating natural language music descriptions, called captioning, using deep learning and large language models. A novel encoder architecture is trained to learn large-scale music representations and generate high-quality embeddings, which a pre-trained decoder then uses to generate captions. The captions used for training are from the state-of-the-art LP-MusicCaps dataset. A qualitative and subjective assessment of the quality of created captions is performed, showing the difference between various decoder models.
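
A hedged sketch of the encoder-to-decoder coupling such a system typically needs (PyTorch; the module name, dimensions, and prefix-token scheme are assumptions for illustration, not the paper's architecture):

    import torch
    import torch.nn as nn

    class CaptionBridge(nn.Module):
        """Projects an audio encoder embedding into a text decoder's input space."""
        def __init__(self, audio_dim=512, text_dim=768, n_prefix=8):
            super().__init__()
            self.proj = nn.Linear(audio_dim, n_prefix * text_dim)
            self.n_prefix, self.text_dim = n_prefix, text_dim

        def forward(self, audio_embedding):               # (batch, audio_dim)
            prefix = self.proj(audio_embedding)
            # The prefix tokens would be prepended to caption tokens and fed
            # to a pre-trained language-model decoder that generates the text.
            return prefix.view(-1, self.n_prefix, self.text_dim)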
Friday May 23, 2025 5:20pm - 5:40pm CEST
C1 ATM Studio Warsaw, Poland
 
Saturday, May 24
 

9:00am CEST

Strategies for Obtaining True Quasi-Anechoic Loudspeaker Response Measurements
Saturday May 24, 2025 9:00am - 9:20am CEST
Simple truncation of the reflections in the impulse response of loudspeakers measured in normal rooms will increasingly falsify the response below about 500 Hz for typical situations. Well-known experience and guidance from loudspeaker models allow the determination of the lowest frequency for which truncation suffices. This paper proposes two additional strategies for achieving much improved low-frequency responses that are complementary to the easily-obtained high-frequency response: (a) a previously published nearfield measurement which can be diffractively transformed to a farfield response with appropriate calculations, here presented with greatly simplified computations, and (b) a measurement setup that admits only a single floor reflection which can be iteratively corrected at low frequencies. Theory and examples of each method are presented.
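
The roughly 500 Hz figure follows from the reflection-free window length: a window of T seconds resolves frequencies only down to about 1/T. A minimal sketch of the truncation step (NumPy; the window shape and lengths are illustrative):

    import numpy as np

    def truncated_response(h, fs, t_reflection_s):
        """Window the IR before the first reflection; return the magnitude response."""
        n = int(fs * t_reflection_s)
        w = np.ones(n)
        w[-n // 8:] = np.hanning(n // 4)[n // 8:]   # short fade-out before the reflection
        H = np.fft.rfft(h[:n] * w, 4096)
        f = np.fft.rfftfreq(4096, 1 / fs)
        return f, 20 * np.log10(np.abs(H) + 1e-12)

    # A first reflection 2 ms after the direct sound limits validity to ~1/0.002 = 500 Hz.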
Saturday May 24, 2025 9:00am - 9:20am CEST
C1 ATM Studio Warsaw, Poland

9:00am CEST

A new one-third-octave-band noise criteria
Saturday May 24, 2025 9:00am - 9:20am CEST
A new one-third-octave-band noise criteria (NC) rating method is presented. One-third-octave-band NC curves from NC 70 to NC 0 are derived from the existing octave-band curves, adjusted for bandwidth, fit to continuous functions, and redistributed progressively over this space. This synthesis is described in detail. The diffuse field hearing threshold at low frequencies is also derived. Several NC curves at high frequencies are shown to be below threshold (inaudible). NC ratings are calculated using both the new one-third-octave-band and the legacy octave-band methods for a number of different room noise spectra. The resulting values were found to be similar for both methods. NC ratings using the new method are particularly applicable to very low noise level critical listening environments such as recording studios, scoring stages, and cinema screening rooms, but are shown to also be applicable to higher noise level environments. The proposed method better tracks the audibility of noise at low levels as well as the audibility of tonal noise components, while the legacy method as originally conceived generally emphasizes speech interference.
Saturday May 24, 2025 9:00am - 9:20am CEST
C2 ATM Studio Warsaw, Poland

9:00am CEST

Creating and distributing immersive audio: from IRCAM Spat to Acoustic Objects
Saturday May 24, 2025 9:00am - 10:00am CEST
In this session, we propose a path for the evolution of immersive audio technology towards accelerating commercial deployment and enabling rich user-end personalization, in any linear or interactive entertainment or business application. We review an example of a perceptually based immersive audio creation platform, IRCAM Spat, which enables plausible, aesthetically motivated immersive music creation and performance, with optional dependency on physical modeling of an acoustic environment. We advocate to alleviate ecosystem fragmentation by showing: (a) how a universal device-agnostic immersive audio rendering model can support the creation and distribution of both physics-driven interactive audio experiences and artistically motivated immersive audio content; (b) how object-based immersive linear audio content formats can be extended, via the notion of Acoustic Objects, to support end-user interaction, reverberant object substitution, or 6-DoF navigation.
Speakers
Jean-Marc Jot

Founder and Principal, Virtuel Works LLC
Spatial audio and music technology expert and innovator. Virtuel Works provides audio technology strategy, IP creation and licensing services to help accelerate the development of audio and music spatial computing technology and interoperability solutions.
Thibaut Carpentier

STMS Lab - IRCAM, SU, CNRS, Ministère de la Culture
Thibaut Carpentier studied acoustics at the École centrale and signal processing at Télécom ParisTech, before joining the CNRS as a research engineer. Since 2009, he has been a member of the Acoustic and Cognitive Spaces team in the STMS Lab (Sciences and Technologies of Music... Read More →
Saturday May 24, 2025 9:00am - 10:00am CEST
C4 ATM Studio Warsaw, Poland

9:00am CEST

Key Technology Briefing 4
Saturday May 24, 2025 9:00am - 10:30am CEST
Saturday May 24, 2025 9:00am - 10:30am CEST
C3 ATM Studio Warsaw, Poland

9:00am CEST

Tutorial Workshop: The Gentle Art of Dithering
Saturday May 24, 2025 9:00am - 10:45am CEST
This tutorial is for everyone working on the design or production of digital audio and should benefit beginners and experts. We aim to bring this topic to life with several interesting audio demonstrations, and up to date with new insights and some surprising results that may reshape pre-conceptions of high resolution.
In a recent paper, we stressed that transparency (high-resolution audio fidelity) depends on the preservation of micro-sounds – those small details that are easily lost to quantization errors, but which can be perfectly preserved by using the right dither.
It is often asked: ‘Why should I add noise to my recording?’ or, ‘How can adding noise make things clearer?’ This tutorial gives a tour through these questions and presents a call to action: dither should not be looked on as added noise, but as an essential lubricant that preserves naturalness.
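
A minimal sketch of the core idea (NumPy; the parameters are illustrative): quantizing a low-level tone without dither produces signal-correlated distortion, while adding TPDF dither of two LSB peak-to-peak before rounding converts the error into benign, signal-independent noise.

    import numpy as np

    def quantize(x, bits=16, dither=True):
        lsb = 2.0 ** (1 - bits)                     # step size for a +/-1 full scale
        if dither:
            # TPDF dither: sum of two independent uniform sources, +/-1 LSB
            x = x + (np.random.rand(x.size) - np.random.rand(x.size)) * lsb
        return lsb * np.round(x / lsb)

    fs = 48000
    t = np.arange(fs) / fs
    x = 1e-4 * np.sin(2 * np.pi * 1000 * t)         # tone near the 16-bit LSB
    plain = quantize(x, dither=False)               # gappy, distorted
    dithered = quantize(x, dither=True)             # tone preserved beneath noise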

Tutorial topics include: fundamentals of dithering; analysis using histograms and synchronous averaging; what happens if undithered quantizers are cascaded?; ‘washboard distortion’; noise-shaping; additive and subtractive dither; time-domain effects; inside A/D and D/A converters; the perilous world of modern signal chains (including studio workflow and DSP in fixed and floating-point processors) and, finally, audibility analysis.
Saturday May 24, 2025 9:00am - 10:45am CEST
Hall F ATM Studio Warsaw, Poland

9:00am CEST

Extension of Reflection-Free Region for Loudspeaker Measurements
Saturday May 24, 2025 9:00am - 10:45am CEST
If loudspeaker measurements are carried out elevated over a flat, very reflective surface with no nearby obstacles, the recovered impulse response will contain the direct response and one clean delayed reflection. Many loudspeakers are omnidirectional at low frequencies, having a clear acoustic centre, and this reflection will have a low-frequency behaviour that is essentially the same as its direct response, except the amplitude will be down by a 1/r factor. We derive a simple algorithm that iteratively allows this reflection to be cancelled, so that the response of the loudspeaker will be valid to lower frequencies than before, complementing the usual high-frequency response obtained from simple time-truncation of the impulse response. The method is explained, discussed, and illustrated with a two-way system measured over a flat, sealed driveway surface.
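
A hedged sketch of the underlying model (NumPy; this illustrates the general idea, not necessarily the paper's exact iteration): if the measurement is h_meas[n] = h[n] + a·h[n − d], with the reflection gain a and delay d known from the geometry, the direct response can be recovered recursively, sample by sample.

    import numpy as np

    def cancel_single_reflection(h_meas, a, d):
        """Invert h_meas[n] = h[n] + a * h[n - d] sample by sample."""
        h = np.copy(h_meas)
        for n in range(d, len(h)):
            h[n] -= a * h[n - d]     # remove the reflection's contribution
        return h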
Saturday May 24, 2025 9:00am - 10:45am CEST
Hall F ATM Studio Warsaw, Poland

9:00am CEST

Impact of Voice-Coil Temperature on Electroacoustic Parameters for Optimized Loudspeaker Enclosure Design in Small-Signal Response
Saturday May 24, 2025 9:00am - 10:45am CEST
The study of electroacoustic parameters in relation to loudspeaker temperature has predominantly focused on large-signal conditions (i.e., high-power audio signals), with limited attention to their behavior under small-signal conditions at equivalent thermal states. This research addresses this gap by investigating the influence of voice-coil temperature on electroacoustic parameters during small-signal operation. The frequency response of the electrical input impedance and the radiated acoustic pressure were measured across different voice-coil temperatures. The results revealed temperature-dependent shifts across all parameters, including the natural frequency in free air (fₛ), mechanical quality factor (Qₘₛ), electrical resistance (Rₑ), electrical inductance (Lₑ), and equivalent compliance volume (Vₐₛ), among others. Specifically, Rₑ and Lₑ increased linearly with temperature, while fₛ decreased and Vₐₛ increased following power-law functions. These changes suggest that thermal effects influence both electrical and mechanical subsystems, potentially amplified by the viscoelastic “creep” effect inherent to loudspeaker suspensions. Finally, simulations of sealed and bandpass enclosures demonstrated noticeable shifts in acoustic performance under thermal variations, emphasizing the importance of considering temperature effects in enclosure design.
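
The reported linear rise of Rₑ is consistent with the standard temperature coefficient of a copper voice coil (an assumption; the paper should be consulted for its measured coefficients):

    Rₑ(T) ≈ Rₑ(T₀) · [1 + α_Cu · (T − T₀)],   α_Cu ≈ 3.9 × 10⁻³ K⁻¹

For example, a 40 K rise in voice-coil temperature increases Rₑ by roughly 16%.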
Saturday May 24, 2025 9:00am - 10:45am CEST
Hall F ATM Studio Warsaw, Poland

9:00am CEST

Material Characterization and Variability in Loudspeaker Membranes for Acoustic Modeling
Saturday May 24, 2025 9:00am - 10:45am CEST
Finite Element Method (FEM) simulations are vital in the design of loudspeakers, offering a more efficient alternative to traditional trial-and-error approaches. Precise material characterization, however, is essential in ensuring that theoretical models align closely with measurements. Variations in material properties, particularly those of a loudspeaker’s membrane, can significantly influence loudspeaker performance. This work aims to establish a methodology for evaluating the variability of loudspeaker membrane materials, specifically cones and surrounds, to better understand each material’s repeatability among samples, and overall to improve the precision and reliability of loudspeaker simulations.


The study first conducts an in-depth analysis of membrane materials, focusing on their Young’s modulus and density, by utilizing both empirical and simulated data. Subsequently, complete loudspeakers were built and investigated, utilizing the membranes studied. A FEM simulation framework is presented, and observations are made into discrepancies between measured and simulated loudspeaker responses at specific frequencies and their relation to material modeling.

The results demonstrated significant alignment between simulations and real-life performance, offering interesting insights into the impact of small changes in material properties on the acoustic response of a loudspeaker. One significant finding was the frequency dependence of the Young’s modulus of the fiberglass used for a cone. Further validation can be achieved by expanding the dataset of the materials measured, exploring more materials, and testing under varying conditions such as temperature and humidity. Such insights enable more accurate modeling of loudspeakers and lay the groundwork for exploring novel materials with enhanced acoustic properties, guiding the development of high-performance loudspeakers.
Speakers
Chiara Corsini

R&D engineer, FAITAL [ALPS ALPINE]
Chiara joined Faital S.p.A. in 2018, working as a FEM analyst in the R&D Department. Her research activities are focused on thermal phenomena associated with loudspeaker functioning, and the mechanical behavior of the speaker moving parts. To this goal, she uses FEM and lumped parameter... Read More →
Luca Villa

FAITAL [ALPS ALPINE]
Romolo Toppi

FAITAL [ALPS ALPINE]
Saturday May 24, 2025 9:00am - 10:45am CEST
Hall F ATM Studio Warsaw, Poland

9:00am CEST

Shape Optimization of Waveguides for Improving the Directivity of Soft Dome Tweeters
Saturday May 24, 2025 9:00am - 10:45am CEST
Saturday May 24, 2025 9:00am - 10:45am CEST
Hall F ATM Studio Warsaw, Poland

9:00am CEST

Supervised Machine Learning for Quality Assurance in Loudspeakers: Time Distortion Analysis
Saturday May 24, 2025 9:00am - 10:45am CEST
Measuring a speaker’s ability to respond to an instantaneous pulse of energy will result in distortion at its output. Factors such as speaker geometry, material properties, equipment error, and the conditions of the environment will create artifacts within the captured data. This paper explores the extraction of time-domain features from these responses, and the training of a predictive model to allow for classification and rapid quality assurance.
Saturday May 24, 2025 9:00am - 10:45am CEST
Hall F ATM Studio Warsaw, Poland

9:20am CEST

IMPro -- Method for Integrated Microphone Pressure Frequency Response Measurement Using a Probe Microphone
Saturday May 24, 2025 9:20am - 9:40am CEST
We propose a practical method for the measurement of the pressure sensitivity frequency response of a microphone that has been integrated into product mechanics. The method uses a probe microphone to determine the sound pressure entering the inlet of the integrated microphone. We show that the measurements can be performed in a normal office environment as well as in anechoic conditions. The method is validated with measurements of a rigid spherical microphone prototype having analytically defined scattering characteristics. Our results indicate that the proposed method, called IMPro, can effectively measure the pressure sensitivity frequency response of microphones in commercial products, quite independently of the measurement environment.
Saturday May 24, 2025 9:20am - 9:40am CEST
C1 ATM Studio Warsaw, Poland

9:20am CEST

Mixed-Phase Equalization of Slot-loaded Impulse Responses
Saturday May 24, 2025 9:20am - 9:40am CEST
This paper introduces a new algorithm for multiposition mixed-phase equalization of slot-loaded loudspeaker responses obtained in the horizontal and vertical plane, using finite impulse response (FIR) filters. The algorithm selects a “prototype response” that yields a filter that best optimizes a time-domain-based objective metric for equalization for a given direction. The objective metric includes a weighted linear combination of pre-ring energy, early and late reflection energy, and decay rate (characterizing impulse response shortening) during filter synthesis. The results show that the presented mixed-phase multiposition filtering algorithm performs a good equalization along all horizontal directions and for most positions in the vertical direction. Beyond the multiposition filtering capabilities, the algorithm and the metric are suitable for designing mixed-phase filters with low delays, an essential constraint for real-time processing.
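
The abstract names the ingredients of the objective metric; a plausible form (an assumption for illustration, with the paper defining the exact terms and weights) is a weighted sum minimized per direction during filter synthesis:

    J = w_pre·E_pre + w_early·E_early + w_late·E_late + w_decay·D

where E_pre is the pre-ring energy, E_early and E_late are the early and late reflection energies, and D characterizes the decay rate (impulse response shortening).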
Speakers
Sunil Bharitkar

Samsung Research America
Saturday May 24, 2025 9:20am - 9:40am CEST
C2 ATM Studio Warsaw, Poland

9:40am CEST

Non-invasive sound field sensing in enclosures using acousto-optics
Saturday May 24, 2025 9:40am - 10:00am CEST
It is challenging to characterize sound across space, especially in small enclosed volumes, using conventional microphone arrays. This study explores acousto-optic sensing methods to record the sound field throughout an enclosure, including regions close to a source and boundaries. The method uses a laser vibrometer to sense modulations of the refractive index in air caused by the propagating sound pressure waves. Compared to microphone arrays, the sound field can be measured non-invasively and at high resolution, which is particularly attractive at high frequencies, in enclosures of limited size, or under unfavorable mounting conditions for fixtures. We compensate for vibrations that contaminate and conceal the acousto-optic measurements and employ an image source model to also reconstruct early parts of the impulse response. The results demonstrate that acousto-optic measurements can enable the analysis of the sound field in enclosed spaces non-invasively and with high resolution.
Saturday May 24, 2025 9:40am - 10:00am CEST
C1 ATM Studio Warsaw, Poland

9:40am CEST

Analog Pseudo Leslie Effect with High Grade of Repeatability
Saturday May 24, 2025 9:40am - 10:00am CEST
This paper describes the design of an analog stomp box capable of reproducing the effect observed when a loudspeaker is rotated during operation, the so-called Leslie effect. When the loudspeaker is rotating, two physical effects can be observed. The first is a variation of the amplitude, because the speaker is sometimes aimed at the observer and then, after 180 degrees of rotation, aimed away from the observer; a Tremolo circuit was designed to recreate this variation in amplitude. The second is the Doppler effect, which was obtained with a circuit designed to vary the phase of the signal (Vibrato); the phase variation is perceived by the ear as a frequency variation. Cascading these two circuits yields the Pseudo Leslie effect. The Vibrato and Tremolo circuits receive their control signal from a Low Frequency Oscillator (LFO), which sets the effect rate. To achieve a high degree of repeatability, which is not simple in analog circuits employing photocouplers, the photocoupler devices were replaced with VCAs. Photocouplers show great variation in their optical characteristics, so it is hard to obtain the same result in large-scale production, whereas with VCAs this becomes easily achievable. The THAT2180 IC is a VCCS, a Voltage-Controlled Current Source with exponential gain control and low signal distortion. The term Pseudo is used because, in the true Leslie effect, the rotation of the loudspeaker gives a 90° lag between the frequency and amplitude variations; this lag has not been implemented, but the sonic result leaves nothing to be desired.
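
A minimal digital sketch of the same tremolo-plus-vibrato cascade (NumPy; the rates and depths are illustrative, and the 90° AM/FM lag of a true Leslie is, as in the paper, not implemented):

    import numpy as np

    fs = 48000
    t = np.arange(2 * fs) / fs
    x = np.sin(2 * np.pi * 440 * t)                  # dry input
    lfo = np.sin(2 * np.pi * 6.0 * t)                # shared ~6 Hz LFO

    trem = x * (1.0 + 0.5 * lfo) / 1.5               # tremolo: amplitude modulation

    delay = (0.002 + 0.001 * lfo) * fs               # vibrato: 1..3 ms modulated delay
    idx = np.clip(np.arange(len(x)) - delay, 0, len(x) - 1)
    pseudo_leslie = np.interp(idx, np.arange(len(x)), trem)   # fractional-delay read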
Saturday May 24, 2025 9:40am - 10:00am CEST
C2 ATM Studio Warsaw, Poland

10:00am CEST

The Search for a Universal Microphone
Saturday May 24, 2025 10:00am - 10:20am CEST
Recording engineers and producers choose different microphones for different sound sources. It is intriguing that, in the 1950s and 1960s, the variety of available microphones was relatively limited compared to what we have available today. Yet, recordings from that era remain exemplary even now. The microphones used at the time were primarily vacuum tube models.
Through discussions at AES Conventions on improving phantom power supplies and my own experimentation with tube microphones, I began to realize that the defining attribute of their sound might not stem solely from the tubes themselves. Instead, the type of power supply appeared to play a crucial role in shaping the final sound quality.
This hypothesis was confirmed with the introduction of high-voltage DPA 4003 and 4004 microphones, compared to their phantom-powered counterparts, the 4006 and 4007. In direct comparisons, the microphones with external, more current-efficient power supplies consistently delivered superior sound.
Having worked extensively with numerous AKG C12 and C24 microphones, I identified two pairs, one of C12s and one of C24s, with identical frequency characteristics. For one C12, we designed an entirely new, pure Class A transistor-based circuit with an external power supply.
Reflecting on my 50-plus years as a sound engineer and producer, I sought to determine which microphones were not only the best, but also the most versatile. My analysis led to four key solutions extending beyond the microphones themselves. Since I had already developed an ideal Class A equalizer, I applied the same technology to create four analog equalizers designed to fine-tune the prototype microphone’s frequency characteristics at the power supply level.
Saturday May 24, 2025 10:00am - 10:20am CEST
C1 ATM Studio Warsaw, Poland

10:00am CEST

Computational Complexity Analysis of the K-Method for Nonlinear Circuit Modeling
Saturday May 24, 2025 10:00am - 10:20am CEST
In today's music industry and among musicians, instead of using analog hardware effects to alter sound, digital counterparts are increasingly being used, often in the form of software plugins. The circuits of musical devices often contain nonlinear components (diodes, vacuum tubes, etc.), which complicates their digital modeling. One of the approaches to address this is the use of state-space methods, such as the Euler or Runge-Kutta methods. To guarantee stability, implicit state-space methods should be used; however, they require the numerical solution of an implicit equation, leading to large computational complexity. Alternatively, the K-method can be used, which avoids the need for numerical methods if the system meets certain conditions, thus significantly decreasing the computational complexity. Although the K-method was invented almost three decades ago, the authors are not aware of a thorough computational complexity analysis of the method in comparison to the more common implicit state-space approaches, such as the backward Euler method. This paper introduces these two methods, explores their advantages, and compares their computational load as a function of model size by using a scalable circuit example.
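
For orientation, a minimal sketch of the implicit (backward Euler) side of the comparison (NumPy; a simple diode-clipper state equation is assumed purely as an example circuit): each output sample requires a Newton iteration, which is exactly the per-sample cost the K-method avoids by precomputing the nonlinear mapping.

    import numpy as np

    R, C, Is, Vt = 2.2e3, 10e-9, 1e-12, 25.85e-3     # example component values
    h = 1.0 / 48000

    def f(v, u):                                      # dv/dt of a diode clipper
        return (u - v) / (R * C) - (2 * Is / C) * np.sinh(v / Vt)

    def df_dv(v):
        return -1.0 / (R * C) - (2 * Is / (C * Vt)) * np.cosh(v / Vt)

    def backward_euler(u_signal):
        v, out = 0.0, np.empty_like(u_signal)
        for n, u in enumerate(u_signal):
            v_new = v
            for _ in range(20):                       # Newton solve of the implicit step
                g = v_new - v - h * f(v_new, u)
                v_new -= g / (1.0 - h * df_dv(v_new))
                if abs(g) < 1e-9:
                    break
            v = v_new
            out[n] = v
        return out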
Saturday May 24, 2025 10:00am - 10:20am CEST
C2 ATM Studio Warsaw, Poland

10:30am CEST

Student Recording Competition 4
Saturday May 24, 2025 10:30am - 11:30am CEST
Saturday May 24, 2025 10:30am - 11:30am CEST
C4 ATM Studio Warsaw, Poland

10:40am CEST

Immersive recordings in virtual acoustics: differences and similarities between a concert hall and its virtual counterpart
Saturday May 24, 2025 10:40am - 11:00am CEST
Virtual acoustic systems can artificially alter a recording studio's reverberation in real time using spatial room impulse responses captured in different spaces. By recreating another space's acoustic perception, these systems influence various aspects of a musician's performance. Traditional methods involve recording a dry performance and adding reverb in post-production, which may not align with the musician's artistic intent. In contrast, virtual acoustic systems allow simultaneous recording of both artificial reverb and the musician's interaction using standard recording techniques—just as it would occur in the actual space. This study analyzes immersive recordings of nearly identical musical performances captured both in a real concert hall and in McGill University's Immersive Media Lab (Imlab), which features new dedicated virtual acoustics software, and highlights the similarities and differences between the performances recorded in the real space and its virtual counterpart.
Speakers
Gianluca Grazioli

Montreal, Canada, McGill University
Richard King

Professor, McGill University
Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School... Read More →
Saturday May 24, 2025 10:40am - 11:00am CEST
C1 ATM Studio Warsaw, Poland
  Acoustics

10:40am CEST

A simplified RLS algorithm for adaptive Kautz filters
Saturday May 24, 2025 10:40am - 11:00am CEST
Modeling or compensating a given transfer function is a common task in the field of audio. To comply with the characteristics of hearing, logarithmic frequency resolution filters have been developed, including the Kautz filter, which has orthogonal tap outputs. When the system to be modeled is time-varying, the modeling filter should be tuned to follow the changes in the transfer function. The Least Mean Squares (LMS) and Recursive Least Squares (RLS) algorithms are well-known methods for adaptive filtering, where the latter has faster convergence rate with lower remaining error, at the expense of high computational demand. In this paper we propose a simplification to the RLS algorithm, which builds on the orthogonality of the tap outputs of Kautz filters, resulting in a significant reduction in computational complexity.
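
A hedged sketch of how the orthogonality pays off (NumPy; an illustration of the principle, not the paper's exact derivation): with mutually orthogonal tap outputs, the inverse correlation matrix stays near-diagonal, so the O(M²) RLS update collapses to M independent scalar recursions.

    import numpy as np

    def diagonal_rls_step(w, p, x, d, lam=0.999):
        """w: weights, p: per-tap inverse powers, x: Kautz tap outputs, d: desired."""
        e = d - w @ x                          # a-priori error
        k = (p * x) / (lam + p * x * x)        # per-tap gains (elementwise)
        w = w + k * e
        p = (p - k * x * p) / lam
        return w, p, e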
Saturday May 24, 2025 10:40am - 11:00am CEST
C2 ATM Studio Warsaw, Poland

10:45am CEST

Audio Post in the AI Future
Saturday May 24, 2025 10:45am - 12:15pm CEST
This panel discussion gathers professionals with a broad range of experience across audio post production for film, television and visual media. During the session, the panel will consider questions around how AI technology could be leveraged to solve common problems and pain-points across audio post, and offer opportunities to encourage human creativity, not supplant it.
Speakers
Bradford Swanson

VP Partnerships, Nomono
Bradford is the VP of Partnerships at Nomono. Previously, he worked as a product manager at iZotope, and toured for 12 years as a musician, production manager, and FOH engineer. He has also served on the faculty at Tufts University, UMass Lowell, and Episcopal High School. He holds... Read More →
Saturday May 24, 2025 10:45am - 12:15pm CEST
C3 ATM Studio Warsaw, Poland

11:00am CEST

Analysis of the acoustic impulse response of an auditorium
Saturday May 24, 2025 11:00am - 11:20am CEST
The acoustic behaviour of an auditorium is analysed after measurements performed according to the ISO 3382-1 standard. The all-pole analysis of the measured impulse responses confirms the hypothesis that all responses have a common component that can be attributed to room characteristics. Results from a subsequent non-parametric analysis allow conjecturing that the overall response of the acoustic channel between two points may be decomposed into three components: one related to source position, another related to the room, and the last one depending on the position of the receiver.
Saturday May 24, 2025 11:00am - 11:20am CEST
C1 ATM Studio Warsaw, Poland
  Acoustics

11:00am CEST

An Artificial Reverberator Informed by Room Geometry and Visual Appearance
Saturday May 24, 2025 11:00am - 11:20am CEST
Without relying on audio data as a reference, artificial reverberation models often struggle to accurately simulate the acoustics of real rooms. To address this, we propose a hybrid reverberator derived from a room’s physical properties. Room geometry is extracted via Light Detection and Ranging mapping, enabling the calculation of acoustic reflection paths via the Image Source Method. Frequency-dependent absorption is found by classifying room surface materials with a multi-modal Large Language Model and referencing a database of absorption coefficients. The extracted information is used to parametrise a hybrid reverberator, divided into two components: early reflections, using a tapped delay line, and late reverberation, using a Scattering Feedback Delay Network. Our listening test results show that participants often rate the proposed system as the most natural simulation of a small hallway room. Additionally, we compare the reverberation metrics of the hybrid reverberator and similar state-of-the-art models to those of the small hallway.
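
A minimal sketch of the early-reflection half described above (NumPy; the delays and gains are placeholders standing in for Image Source Method output):

    import numpy as np

    def tapped_delay_line(x, fs, taps):
        """taps: (delay_seconds, gain) pairs, e.g. computed by the ISM."""
        y = np.copy(x)
        for delay_s, gain in taps:
            d = int(round(delay_s * fs))
            if 0 < d < len(x):
                y[d:] += gain * x[:len(x) - d]
        return y

    ism_taps = [(0.007, 0.5), (0.011, 0.4), (0.015, 0.3)]   # placeholder reflections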
Speakers
Joshua Reiss

Professor, Queen Mary University of London
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured... Read More →
Saturday May 24, 2025 11:00am - 11:20am CEST
C2 ATM Studio Warsaw, Poland

11:00am CEST

Loudness of movies for Broadcasting
Saturday May 24, 2025 11:00am - 12:00pm CEST
Broadcasting movies in linear TV or via streaming presents a considerable challenge, especially for highly dynamic content like action films. Normalising such content to the paradigm of "Programme Loudness" may result in dialogue levels much lower than the loudness reference level (-23 LUFS in Europe). On the other hand, normalising to the dialogue level may lead to overly loud sound effects. The EBU Loudness group PLOUD has addressed this issue with the publication of R 128 s4, the fourth supplement to the core recommendation R 128. In order to better understand the challenge, an extensive analysis of 44 dubbed movies (mainly Hollywood mainstream films) has been conducted. These analysed films had already been dynamically treated for broadcast delivery by experienced sound engineers. The background of the latest document of the PLOUD group will be presented and the main parameter, LDR (Loudness-to-Dialogue Ratio), will be introduced. A systematic approach to when and how to proceed with dynamic treatment will be included.
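
R 128 s4 itself is the normative reference, but read plainly the parameter's name suggests a simple difference measure (an assumption for illustration only):

    LDR = programme loudness − dialogue loudness   [LU]

For example, a film measuring −23 LUFS overall whose dialogue sits at −29 LUFS would have an LDR of 6 LU, indicating how far the loudest content rises above speech.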
Speakers
Florian Camerer

Senior Sound Engineer, ORF
Saturday May 24, 2025 11:00am - 12:00pm CEST
Hall F ATM Studio Warsaw, Poland

11:00am CEST

Students Project Expo
Saturday May 24, 2025 11:00am - 1:00pm CEST
Saturday May 24, 2025 11:00am - 1:00pm CEST
Hall F ATM Studio Warsaw, Poland

11:20am CEST

Sparsity-based analysis of sound field diffuseness in rooms
Saturday May 24, 2025 11:20am - 11:40am CEST
Sound fields in enclosures comprise a combination of directional and diffuse components. The directional components include the direct path from the source and the early specular reflections. The diffuse part starts with the first early reflection and builds up gradually over time. An ideal diffuse field is achieved when incoherent reflections begin to arrive randomly from all directions. More specifically, a diffuse field is characterized by having uniform energy density (i.e., independence from measurement position) and an isotropic distribution (i.e. random directions of incidence), which results in zero net energy flow (i.e. the net time-averaged intensity is zero). Despite this broad definition, real diffuse sound fields typically exhibit directional characteristics owing to the geometry and the non-uniform absorptive properties of rooms.

Several models and data-driven metrics based on the definition of a diffuse field have been proposed to assess diffuseness. A widely used metric is the mixing time, which indicates the transition of the sound field from directional to diffuse and is known to depend, among other factors, on the room geometry.

The concept of mixing time is closely linked to the normalized echo density profile (NEDP), a measure first used to estimate the mixing time in actual rooms (Abel and Huang, 2006), and later to assess the quality of artificial reverberators in terms of their capacity to produce a dense reverberant tail (De Sena et al., 2015). The NEDP is calculated over room impulse responses measured with a pressure probe, evaluating how much the RIR deviates from a normal distribution. Another similar temporal/statistical measure, kurtosis, has been used to similar effect (Jeong, 2016). However, neither the NEDP nor kurtosis provides insights into the directional attributes of diffuse fields. While both approaches rely on statistical reasoning rather than identifying individual reflections, another temporal approach uses matching pursuit to identify individual reflections (Defrance et al., 2009).

Another set of approaches focuses on the net energy flow aspect of the diffuse field, providing an energetic analysis framework either in the time domain (Del Galdo et al., 2012) or in the time-frequency domain (Ahonen and Pulkki, 2009). These approaches rely on calculating the time-averaged active intensity, either using intensity probes or first- and higher-order Ambisonics microphones, where a pseudo-intensity-based diffuseness is computed (Götz et al., 2015). The coherence of spherical harmonic decompositions of the sound field has also been used to estimate diffuseness (Epain and Jin, 2016). Beamforming methods have likewise been applied to assess the directional properties of sound fields and to illustrate how real diffuse fields deviate from the ideal (Gover et al., 2004).

We propose a spatio-spectro-temporal (SST) sound field analysis approach based on a sparse plane-wave decomposition of sound fields captured using a higher-order Ambisonics microphone. The proposed approach has the advantage of analyzing the progression of the sound field’s diffuseness in both temporal and spatial dimensions. Several derivative metrics are introduced to assess temporal, spectro-temporal, and spatio-temporal characteristics of the diffuse field, including sparsity, diversity, and isotropy. We define the room sparsity profile (RSP), room sparsity relief (RSR), and room sparsity profile diversity (RSPD) as temporal, spectro-temporal, and spatio-temporal measures of diffuse fields, respectively. The relationship of this new approach to existing diffuseness measures is discussed and supported by experimental comparisons using 4th- and 6th-order acoustic impulse responses, demonstrating the dependence of the new derivative measures on measurement position. We conclude by considering the limitations and applicability of the proposed approach.
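
For reference, a minimal sketch of the normalized echo density profile that the proposed measures are compared against (NumPy; plain rectangular windows are used here, whereas Abel and Huang apply a weighting window):

    import numpy as np
    from math import erfc, sqrt

    def echo_density_profile(rir, fs, win_ms=20):
        n = int(fs * win_ms / 1000)
        norm = erfc(1 / sqrt(2))                 # ~0.3173, the Gaussian expectation
        profile = np.zeros(len(rir) - n)
        for i in range(len(profile)):
            w = rir[i:i + n]
            profile[i] = np.mean(np.abs(w) > np.std(w)) / norm
        return profile                            # approaches 1 once fully mixed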
Saturday May 24, 2025 11:20am - 11:40am CEST
C1 ATM Studio Warsaw, Poland
  Acoustics

11:20am CEST

Direct convolution of high-speed 1 bit signal and finite impulse response
Saturday May 24, 2025 11:20am - 11:40am CEST
Various AD conversion methods exist, and high-speed 1-bit methods have been proposed, using a high sampling frequency and 1-bit quantization. ΔΣ modulation is mainly used; owing to its characteristics, these signals can accurately preserve the spectrum of the analog signal and move quantization noise into higher frequency bands, which allows for a high signal-to-noise ratio in the audible range. However, when performing signal processing tasks such as addition and multiplication on high-speed 1-bit signals, it is generally necessary to convert them into multi-bit signals for arithmetic operations. In this paper, we propose a direct processing method for high-speed 1-bit signals that realizes convolution without converting them into multi-bit signals. In this method, 1-bit data are reordered so that the operations are achieved without arithmetic ones. The proposed method was verified through simulations using low-pass FIR filters. Frequency-domain analysis showed that the proposed method achieved performance equivalent to conventional multi-bit convolution, successfully performing the desired filtering. We thus present a novel approach to directly processing high-speed 1-bit signals and suggest potential applications in audio and signal processing.
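
A hedged sketch of the general principle (NumPy; an illustration, not the paper's reordering scheme): with samples constrained to ±1, FIR convolution reduces to adding or subtracting filter taps, with no general multiplications.

    import numpy as np

    def one_bit_fir(bits, taps):
        """bits: array of +1/-1 samples; taps: FIR coefficients."""
        n = len(bits)
        y = np.zeros(n)
        for k, b in enumerate(taps):
            seg = bits[: n - k]
            y[k:] += np.where(seg > 0, b, -b)   # add or subtract the tap value
        return y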
Saturday May 24, 2025 11:20am - 11:40am CEST
C2 ATM Studio Warsaw, Poland

11:40am CEST

Evaluating room acoustic parameters using ambisonic technology: a case study of a medium-sized recording studio
Saturday May 24, 2025 11:40am - 12:00pm CEST
Ambisonic technology has recently gained popularity in room acoustic measurements due to its ability to capture both general and spatial characteristics of a sound field using a single microphone. On the other hand, conventional measurement techniques conducted in accordance with the ISO 3382-1 standard require multiple transducers, which results in a more time-consuming procedure. This study presents a case study on the use of ambisonic technology to evaluate the room acoustic parameters of a medium-sized recording studio.
Two ambisonic microphones, a first-order Sennheiser Ambeo and a third-order Zylia ZM1-3E, were used to record spatial impulse responses in 30 combinations of sound source and receiver positions in the recording studio. Key acoustic parameters, including Reverberation Time (T30), Early Decay Time (EDT) and Clarity (C80), were calculated using spatial decomposition methods. The Interaural Cross-Correlation Coefficient (IACC) was derived from binaural impulse responses obtained using the MagLS binauralization method. The results were compared with conventional omnidirectional and binaural microphone measurements to assess the accuracy and advantages of ambisonic technology. The findings show that T30, EDT, C50 and IACC values measured with the use of ambisonic microphones are consistent with those obtained from conventional measurements.
This study demonstrates the effectiveness of ambisonic technology in room acoustic measurements by capturing a comprehensive set of parameters with a single microphone. Additionally, it enables the estimation of reflection vectors, offering further insights into spatial acoustics.
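
For reference, the clarity index used here is the standard ISO 3382-1 early-to-late energy ratio (80 ms for music, 50 ms for speech):

    C₈₀ = 10 · log₁₀( ∫₀^80ms p²(t) dt / ∫_80ms^∞ p²(t) dt )   [dB]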
Saturday May 24, 2025 11:40am - 12:00pm CEST
C1 ATM Studio Warsaw, Poland
  Acoustics

11:40am CEST

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
Saturday May 24, 2025 11:40am - 12:00pm CEST
Speech denoising is a prominent and widely utilized task, appearing in many common use-cases. Although there are very powerful published machine learning methods, most of those are too complex for deployment in everyday and/or low-resource computational environments, like hand-held devices, smart glasses, hearing aids, automotive platforms, etc. Knowledge distillation (KD) is a prominent way of alleviating this complexity mismatch, by transferring the learned knowledge from a pre-trained complex model, the teacher, to another less complex one, the student. KD is implemented by using minimization criteria (e.g. loss functions) between learned information of the teacher and the corresponding information of the student. Existing KD methods for speech denoising hamper the KD by bounding the learning of the student to the distribution learned by the teacher. Our work focuses on a method that tries to alleviate this issue by exploiting properties of the cosine similarity used as the KD loss function. We use a publicly available dataset and a typical architecture for speech denoising (e.g. UNet) that is tuned for low-resource environments, and conduct repeated experiments with different architectural variations between the teacher and the student, reporting mean and standard deviation of metrics for our method and for another, state-of-the-art method used as a baseline. Our results show that with our method we can make smaller speech denoising models, suitable for deployment in small devices/embedded systems, that perform better than when trained conventionally or with other KD methods.
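
A minimal sketch of the loss construction described above (PyTorch; the function names and the weighting are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    def kd_cosine_loss(student_latent, teacher_latent):
        # 1 - cosine similarity penalizes direction only, not magnitude,
        # leaving the student free to choose its own latent scale.
        return (1 - F.cosine_similarity(student_latent, teacher_latent, dim=-1)).mean()

    def total_loss(student_out, target, student_latent, teacher_latent, alpha=0.5):
        task = F.l1_loss(student_out, target)          # denoising objective
        kd = kd_cosine_loss(student_latent.flatten(1), teacher_latent.flatten(1))
        return task + alpha * kd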
Saturday May 24, 2025 11:40am - 12:00pm CEST
C2 ATM Studio Warsaw, Poland

11:45am CEST

The Next Generation of Immersive Capture and Reproduction: Sessions from McGill University’s Virtual Acoustic Laboratory
Saturday May 24, 2025 11:45am - 12:45pm CEST
In this workshop, we present the next generation of Immersive audio capture and reproduction through virtual acoustics. The aural room, whether real or generated, brings together the listener and the sound source in a way that fulfills both the listener’s perceptual needs—like increasing the impression of orientation, presence, and envelopment—and creates aesthetic experiences by elaborating on the timbre and phrasing of the music.
Members of the Immersive Audio Lab (IMLAB) at McGill University will discuss recent forays in creating and capturing aural spaces, using technology ranging from virtual acoustics to Higher Order Ambisonics (HOA) microphones. Descriptions of capture methods, including microphone techniques and experiments will be accompanied by 7.1.4 audio playback demos.
From our studio sessions, we will showcase updates to our Virtual Acoustics Technology (VAT) system, which uses active acoustics in conjunction with 15 omnidirectional and 32 bidirectional speakers to transport musicians into simulated environments. Workshop elements will include a new methodology for creating dynamically changing interactive environments for musicians and listeners, ways to create focus and “mix” sound sources within the virtual room, experimental capture techniques for active acoustic environments, and real-time electronics spatialization in the tracking room via the VAT system.
On location, lab members have been experimenting with hybridized HOA capture systems for large-scale musical scenes. We will showcase multi-point HOA recording techniques to best capture direct sound and room reverberance, and excerpts that compare HOA to traditional channel-based capture systems.
Speakers
Kathleen Zhang

McGill University
Aybar Aydin

PhD Candidate, McGill University
Michail Oikonomidis

Doctoral student, McGill University
Michael Ikonomidis (Michail Oikonomidis) is an accomplished audio engineer and PhD student in Sound Recording at McGill University, specializing in immersive audio, high-channel-count orchestral recordings and scoring sessions. With a diverse background in music production, live sound... Read More →
Richard King

Professor, McGill University
Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School... Read More →
Saturday May 24, 2025 11:45am - 12:45pm CEST
C4 ATM Studio Warsaw, Poland

12:15pm CEST

Workshop: How to Build a World-Class Brand in 24 Hours
Saturday May 24, 2025 12:15pm - 1:15pm CEST
In this dynamic, hackathon-style session, participants will rapidly develop a world-class brand strategy for their company using cutting-edge AI tools and collaborative exercises. Attendees will leave with an actionable blueprint they can implement immediately in their businesses or projects.

Format: 90-minute session
Key Takeaways:
Master the essentials of brand strategy and its impact on content creation and sales
Engage in hands-on exercises to develop a brand strategy in real time
Learn how AI tools can accelerate brand positioning
Speakers
Saturday May 24, 2025 12:15pm - 1:15pm CEST
C1/2 ATM Studio Warsaw, Poland

12:25pm CEST

Simulated Free-field Measurements
Saturday May 24, 2025 12:25pm - 1:45pm CEST
Time-selective techniques that enable measurements of the free-field response of a loudspeaker to be performed without the need for an anechoic chamber are presented. The low-frequency-resolution-dependent room size limitations of both time-selective measurements and anechoic chambers are discussed. Techniques combining signal processing and appropriate test methods are presented, enabling measurements of the complex free-field response of a loudspeaker to be performed throughout the entire audio frequency range without an anechoic chamber. Measurement techniques for both near-field and time-selective far-field measurements are detailed. The results in both the time and frequency domain are available, and ancillary functions derived from these results are easily calculated automatically. A review of the current state of the art is also presented.
Saturday May 24, 2025 12:25pm - 1:45pm CEST
C2 ATM Studio Warsaw, Poland

12:30pm CEST

What was it about the Dolby Noise Reduction System that made it successful?
Saturday May 24, 2025 12:30pm - 1:30pm CEST

Love it or hate it, the Dolby noise reduction system had a significant impact on sound recording practice. Even nowadays, in our digital audio workstation world, Dolby noise reduction units are used as effects processors.
However, when the system first came out in the 1960s, there were other noise reduction systems, but the Dolby “Model A” noise reduction system, and its successors, still became dominant. What was it about the Dolby system that made it so successful?
This tutorial will look in some detail into the inner workings of the Dolby A Noise reduction system to see how this came about.
Dolby made some key technical decisions in his design that worked with the technology of the day, to provide noise reduction that did minimal harm to the audio signal and minimised any audible effects of the noise reduction processing. We will examine these key decisions and show how they fitted with the technology and electronic components of the time.
The tutorial will start with a basic introduction to complementary noise reduction systems and their pros and cons. We will then go on to examine the Dolby system in more detail, including looking at some of the circuitry.
In particular, we will discuss:
1. The principle of least treatment.
2. Side chain processing.
3. Psychoacoustic elements.
4. What Dolby could have done better.
Although the talk will concentrate on the Model 301 processor, if time permits, we will look at the differences between it, and the later Cat 22 version.
The tutorial will be accessible to everyone; you will not need to be an electronic engineer to understand the principles behind this seminal piece of audio engineering history.
Speakers
Jamie Angus-Whiteoak

Emeritus Professor/Consultant, University of Salford/JASA Consultancy
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University. Her interest in audio was crystallized at age 11 when she visited the WOR studios in NYC on a school trip in 1967. After this she was hooked, and spent much of her free time studying audio, radio... Read More →
Saturday May 24, 2025 12:30pm - 1:30pm CEST
C3 ATM Studio Warsaw, Poland

1:30pm CEST

Key Technology Briefing 5
Saturday May 24, 2025 1:30pm - 2:45pm CEST
Saturday May 24, 2025 1:30pm - 2:45pm CEST
C1 ATM Studio Warsaw, Poland

1:45pm CEST

Be A Leader!
Saturday May 24, 2025 1:45pm - 2:24pm CEST
Have you ever wondered how AES works? Let's meet up and talk about the benefits of volunteering and the path to leadership in AES! You could be our next Chair, Vice President, or even AES President!
Speakers
Leslie Gaston-Bird

President, Audio Engineering Society
Dr. Leslie Gaston-Bird (AMPS, MPSE) is President of the Audio Engineering Society and author of the books "Women in Audio", part of the AES Presents series and published by Focal Press (Routledge); and Math for Audio Majors (A-R Editions). She is a voting member of the Recording Academy... Read More →
Saturday May 24, 2025 1:45pm - 2:24pm CEST
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

A century of dynamic loudspeakers
Saturday May 24, 2025 1:45pm - 2:24pm CEST
This tutorial is based on a Journal of the Audio Engineering Society review paper being submitted.

2025 marks the centennial of the commercial introduction of the modern dynamic direct radiating loudspeaker, Radiola 104, and the publication of Kellogg and Rice’s paper describing its design. The tutorial outlines the developments leading to the first dynamic loudspeakers and their subsequent evolution. The presentation focuses on direct radiating loudspeakers, although the parallel development of horn technology is acknowledged.

The roots of the dynamic loudspeaker trace back to the moving coil linear actuator patented by Werner Siemens in 1877. The first audio-related application was Sir Oliver Lodge’s 1896 mechanical telephone signal amplifier, or “repeater.” The first moving coil loudspeaker was the Magnavox by Peter Jensen in 1915, but its diaphragm assembly resembled earlier electromagnetic loudspeakers. The Blatthaller loudspeakers by Schottky and Gerlach in the 1920s are another example of a different early use of the dynamic concept.

It is interesting to look at the success factors of the dynamic loudspeaker: creating a market for quality sound reproduction and practically replacing the earlier electromagnetic designs by the end of the 1920s. The first dynamic loudspeakers were heavy, expensive, and inefficient, but their sound quality could not be matched by any other technology available then. The direct radiating dynamic loudspeaker is also one of the most scalable technologies in engineering, both in terms of size and production volume. The dynamic loudspeaker is also quite friendly in terms of operating voltage and current and, importantly, the sound can be adjusted through enclosure design.

The breadth of the applications of dynamic loudspeakers would not have been possible without the developments in magnet materials. Early dynamic loudspeakers used electromagnets for air gap flux, requiring constant high power (e.g., the Radiola 104’s field coil consumed 8 W, while peak audio power was about 1 W). Some manufacturers attempted steel permanent magnets, but these were bulky. A major breakthrough came with AlNiCo (Aluminum-Nickel-Cobalt) magnets, first developed in Japan in the 1930s and commercialized in the U.S. during World War II. AlNiCo enabled smaller, lighter, and more efficient designs. However, a cobalt supply crisis in 1970 led to the widespread adoption of ferrite (ceramic) magnets, which were heavier but cost-effective. The next advancement, especially in small drivers, was rare earth magnets, introduced in the early 1980s. However, a neodymium supply crisis in the 2000s led to a partial return to ferrite magnets.

One of the focus points of the industry’s attention has been the cone and surround materials for the loudspeaker. Already the first units employed a relatively lossy cardboard-type material. Although plastic and foam materials were attempted in loudspeakers from the 1950s onwards, plastic cones for larger loudspeakers were successfully launched only in the late 1970s. Metal cones, honeycomb diaphragms, and the use of coatings to improve stiffness have all brought more variety to the loudspeaker market, enabled by the significant improvement of numerical loudspeaker modelling and measurement methods, which also started their practical use during the 1970s.

A detail that was somewhat different in the first loudspeakers compared to modern designs was the centering mechanism. The Radiola centering mechanism was complex, and soon simpler flat supports (giving the name “spider”) were developed. The modern concentrically corrugated centering system was developed in the early 1930s by Walter Vollman at the German Gravor loudspeaker company, and this design has remained the standard solution with little variation.

The limitations of the high-frequency reproduction of the early drivers led to improvements in driver design. The high-frequency performance of cone drivers was improved by introducing lossy or compliant areas that attempted to restrict the radiation of high frequencies to the apex part of the cone, and by adding a double cone. The introduction of FM radio and improved records led to the need for loudspeakers with more extended treble reproduction. The first separate tweeter units were horn loudspeakers, and the first direct radiating tweeters were scaled-down cone drivers, but the late 1950s saw the introduction of modern tweeters in which the voice coil sits outside the radiating diaphragm.

The latest paradigm shift in dynamic loudspeakers is the microspeaker, ubiquitous in portable devices. By manufacturing numbers, microspeakers are the largest class of dynamic loudspeakers, presenting unique structural, engineering, and manufacturing challenges. Their rapid evolution from the 1980s onwards includes the introduction of rare earth magnets, diaphragm forming improvements, and a departure from the cylindrical form factor of traditional loudspeakers. The next phase in loudspeaker miniaturization is emerging, with the first MEMS-based dynamic microspeakers now entering the market.
Speakers
Juha Backman

AAC Technologies
Saturday May 24, 2025 1:45pm - 2:24pm CEST
C3 ATM Studio Warsaw, Poland
 

