Type: Spatial Audio
Thursday, May 22
 

9:30am CEST

Correlation between middle and top layer loudspeaker signals and the listening range in 3D audio reproduction
Thursday May 22, 2025 9:30am - 9:50am CEST
In auditory spatial perception, horizontal sound-image localization and the sense of spaciousness rely on interaural level and time differences as cues; in particular, the degree of correlation between the left and right signals is thought to contribute to horizontal spaciousness [Hidaka1995, Zotter2013]. Vertical image spread (VIS) requires spectral cues, and the change in VIS caused by the correlation between vertically separated signals depends on the frequency response [Gribben2018]. This paper investigates, through two experiments, how different correlation values between the top- and middle-layer loudspeaker signals of a 3D audio reproduction system affect listening impressions. Experiments using pink noise with different inter-layer correlation values show that the lower the vertical correlation, the wider the listening range within which the impression does not change relative to the central listening position. Experiments using impulse responses measured with microphones set up in an actual concert hall revealed a tendency to perceive spaciousness at off-center listening positions when cardioid microphones spaced apart from the middle layer were used for the top layer. The polar pattern and height of the microphones may have lowered the vertical correlation values, thereby widening the listening range of consistent spatial impression beyond the central listening position (i.e., the “sweet spot”).
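The stimuli here depend on signal pairs with a controlled inter-layer correlation coefficient. A minimal sketch of one standard way to synthesize such a pair, mixing two independent noises (white noise for simplicity, where the experiment used pink noise):

```python
import numpy as np

def correlated_pair(n_samples, rho, rng):
    """Return two zero-mean noise signals whose correlation coefficient is ~rho."""
    a = rng.standard_normal(n_samples)          # e.g. middle-layer signal
    n = rng.standard_normal(n_samples)          # independent noise
    b = rho * a + np.sqrt(1.0 - rho**2) * n     # e.g. top-layer signal
    return a, b

rng = np.random.default_rng(0)
mid, top = correlated_pair(200_000, rho=0.3, rng=rng)
measured = np.corrcoef(mid, top)[0, 1]          # close to 0.3 for long signals
```

The same mixing rule applies per frequency band when a band-dependent correlation is needed.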
Speakers

Toru Kamekawa

Professor, Tokyo University of the Arts
Toru Kamekawa: After graduating from the Kyushu Institute of Design in 1983, he joined the Japan Broadcasting Corporation (NHK) as a sound engineer. During that period, he gained his experience as a recording engineer, mostly in surround sound programs for HDTV. In 2002, he joined... Read More →
C2 ATM Studio Warsaw, Poland

9:50am CEST

Plane wave creation in non-spherical loudspeaker arrays using radius formulation by the Lamé function
Thursday May 22, 2025 9:50am - 10:10am CEST
This paper proposes a method for plane-wave field creation with spherical harmonics using a non-spherical loudspeaker array. Sound field control approaches divide into physical-acoustic and psychoacoustic models. Psychoacoustic models allow flexibility in loudspeaker placement, but the reproduced sound differs from the intended auditory impression because phantom sources are constructed. Physical-acoustic models were developed from the wave equation under strictly placed circular or spherical arrays, and from higher-order Ambisonics (HOA) based on spherical harmonics, which is exact only around a single point. We therefore seek a method that physically creates the actual waveform while allowing flexibility in the shape of the loudspeaker array. In this paper, we focus on the Lamé function, whose order changes the shape of the spatial figure it describes, and propose formulating the distance between the array center and each loudspeaker using this function in a polar expression. Simulation experiments show that, within the inscribed region, the proposed method creates the same plane-wave waveform as a spherical array when a high-order Lamé function, which approaches a rectangular shape, is used.
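The radius formulation can be illustrated with the polar form of the Lamé curve (superellipse): the distance from the array center to the loudspeaker at angle θ. A minimal sketch, with illustrative parameter names (order 2 recovers a circular array; high orders approach a rectangle):

```python
import numpy as np

def lame_radius(theta, p, a=1.0, b=1.0):
    """Center-to-loudspeaker distance on the Lame curve |x/a|^p + |y/b|^p = 1."""
    return (np.abs(np.cos(theta) / a) ** p + np.abs(np.sin(theta) / b) ** p) ** (-1.0 / p)

angles = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)
r_circle = lame_radius(angles, p=2)    # order 2: circular array, radius 1
r_square = lame_radius(angles, p=10)   # high order: nearly rectangular array
```

Placing loudspeakers at `(r(theta) * cos(theta), r(theta) * sin(theta))` then yields the non-spherical layout that the formulation parametrizes.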
Speakers

Tomohiro Sakaguchi

Doctoral student, Waseda University
C2 ATM Studio Warsaw, Poland

10:10am CEST

Recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching algorithm
Thursday May 22, 2025 10:10am - 10:30am CEST
This paper presents a recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching (BACC-PM) algorithm, designed to efficiently optimize sound zone systems in the time domain. Traditional frequency-domain algorithms, while computationally less demanding, often result in non-causal filters with increased pre-ringing, making time-domain approaches preferable for certain applications. However, time-domain solutions typically suffer from high computational costs as a result of the inversion of large convolution matrices.
To address these challenges, this study introduces a method based on gradient descent and conjugate gradient descent techniques. By exploiting recursive calculations, the proposed approach significantly reduces computational time compared to direct inversion.
Theoretical foundations, simulation setups, and performance metrics are detailed, showcasing the efficiency of the algorithm in achieving high acoustic contrast and low reproduction errors with reduced computational effort. Simulations in a controlled environment demonstrate the advantages of the method.
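The recursion at the heart of such a solver can be illustrated with a generic conjugate-gradient solution of regularized normal equations. This is a stand-in least-squares problem, not the actual BACC-PM cost function; the matrix sizes and regularization are illustrative:

```python
import numpy as np

def conjugate_gradient(A, b, n_iter=200, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A by the CG recursion."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(n_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
H = rng.standard_normal((120, 60))   # stand-in for a (tall) convolution matrix
d = rng.standard_normal(120)         # stand-in for target pressure signals
A = H.T @ H + 1e-3 * np.eye(60)      # regularized normal equations
b = H.T @ d
w_cg = conjugate_gradient(A, b)      # recursive solution
w_direct = np.linalg.solve(A, b)     # direct inversion, for comparison
```

The appeal of the recursion is that each iteration only needs matrix-vector products, which for convolution matrices can be computed as filtering operations instead of forming and inverting the full matrix.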
Speakers

Manuel Melon

Professor, LAUM / LE MANS Université
C2 ATM Studio Warsaw, Poland

10:30am CEST

GSound-SIR: A Spatial Impulse Response Ray-Tracing and High-order Ambisonic Auralization Python Toolkit
Thursday May 22, 2025 10:30am - 10:50am CEST
Accurate and efficient simulation of room impulse responses is crucial for spatial audio applications. However, existing acoustic ray-tracing tools often operate as black boxes and only output impulse responses (IRs), providing limited access to intermediate data or spatial fidelity. To address these limitations, this paper presents GSound-SIR, a novel Python-based toolkit for room acoustics simulation. The contributions of this paper are as follows. First, GSound-SIR provides direct access to up to millions of raw ray data points from simulations, enabling in-depth analysis of sound propagation paths that was not possible with previous solutions. Second, we introduce a tool that converts acoustic rays into high-order Ambisonic impulse responses, capturing spatial audio cues with greater fidelity than standard techniques. Third, to enhance efficiency, the toolkit implements an energy-based filtering algorithm that can export only the top-X or top-X-% of rays. Fourth, we propose storing the simulation results in the Parquet format, facilitating fast data I/O and seamless integration with data analysis workflows. Together, these features make GSound-SIR an advanced, efficient, and modern foundation for room acoustics research, providing researchers and developers with a powerful new tool for spatial audio exploration.
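The energy-based top-X-% filtering can be sketched as keeping the smallest set of rays that retains a given fraction of the total energy. This is an illustration of the idea, not the GSound-SIR API; the array of per-ray energies is hypothetical:

```python
import numpy as np

def top_energy_rays(energies, keep_fraction=0.95):
    """Indices of the smallest set of rays carrying `keep_fraction` of total energy."""
    order = np.argsort(energies)[::-1]                       # strongest rays first
    cum = np.cumsum(energies[order])                         # running energy total
    n_keep = int(np.searchsorted(cum, keep_fraction * cum[-1])) + 1
    return order[:n_keep]

e = np.array([8.0, 1.0, 0.5, 0.3, 0.2])   # per-ray energies (total 10.0)
idx = top_energy_rays(e, keep_fraction=0.9)   # rays 0 and 1 carry 90%
```

Exporting only these indices (e.g. to a Parquet file via pyarrow/pandas) keeps the dataset small while preserving nearly all of the acoustic energy.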
C2 ATM Studio Warsaw, Poland

11:00am CEST

Ambisonic Spatial Decomposition Method with salient / diffuse separation
Thursday May 22, 2025 11:00am - 11:20am CEST
This paper proposes a new algorithm for enhancing the spatial resolution of measured first-order Ambisonics room impulse responses (FOA RIRs). It applies a separation of the RIR into a salient stream (direct sound and reflections) and a diffuse stream to treat them differently: The salient stream is enhanced using the Ambisonic Spatial Decomposition Method (ASDM) with a single direction of arrival (DOA) per sample of the RIR, while the diffuse stream is enhanced by 4-directional (4D-)ASDM with 4 DOAs at the same time. Listening experiments comparing the new Salient/Diffuse S/D-ASDM to ASDM, 4D-ASDM, and the original FOA RIR reveal the best results for the new algorithm in both spatial clarity and absence of artifacts, especially for its variant, which keeps the DOA constant within each salient event block.
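A common estimator for the single per-sample DOA used by the salient stream is the pseudo-intensity vector of the FOA signal. A minimal sketch of that step only, leaving out the salient/diffuse separation and smoothing described in the paper:

```python
import numpy as np

def doa_per_sample(w, x, y, z):
    """Per-sample DOA estimates from a B-format RIR via the
    pseudo-intensity vector I = w * [x, y, z], normalized to unit length."""
    intensity = w[:, None] * np.stack([x, y, z], axis=1)
    norms = np.linalg.norm(intensity, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # avoid dividing silent samples by zero
    return intensity / norms

# One sample of a plane wave arriving from the +x direction:
w = np.array([1.0]); x = np.array([1.0]); y = np.array([0.0]); z = np.array([0.0])
doa = doa_per_sample(w, x, y, z)         # unit vector toward +x
```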
Speakers

Lukas Gölles

University of Music and Performing Arts Graz - Institute of Electronic Music and Acoustics
C2 ATM Studio Warsaw, Poland

11:20am CEST

Towards a standard listener-independent HRTF to facilitate long-term adaptation
Thursday May 22, 2025 11:20am - 11:40am CEST
Head-related transfer functions (HRTFs) are used in auditory applications for spatializing virtual sound sources. Listener-specific HRTFs, which aim at mimicking the filtering of the head, torso and pinnae of a specific listener, improve the perceived quality of virtual sound compared to using non-individualized HRTFs. However, using listener-specific HRTFs may not be accessible for everyone. Here, we propose as an alternative to take advantage of the adaptation abilities of human listeners to a new set of HRTFs. We claim that agreeing upon a single listener-independent set of HRTFs has beneficial effects for long-term adaptation compared to using several, potentially severely different HRTFs. Thus, the Non-individual Ear MOdel (NEMO) initiative is a first step towards a standardized listener-independent set of HRTFs to be used across applications as an alternative to individualization. A prototype, NEMObeta, is presented to explicitly encourage external feedback from the spatial audio community, and to agree on a complete list of requirements for the future HRTF selection.
Speakers

Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical and audio engineering in Graz, at both the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research... Read More →

Nils Meyer-Kahlen

Aalto University
C2 ATM Studio Warsaw, Poland

11:40am CEST

Real-Time Auralization Pipeline for First-Person Vocal Interaction in Audio-Visual Virtual Environments
Thursday May 22, 2025 11:40am - 12:00pm CEST
Multimodal research and applications are becoming more commonplace as Virtual Reality (VR) technology integrates different sensory feedback, enabling the recreation of real spaces in an audio-visual context. Within VR experiences, numerous applications rely on the user’s voice as a key element of interaction, including music performances and public speaking applications. Self-perception of our voice plays a crucial role in vocal production. When singing or speaking, our voice interacts with the acoustic properties of the environment, shaping the adjustment of vocal parameters in response to the perceived characteristics of the space.

This technical report presents a real-time auralization pipeline that leverages three-dimensional Spatial Impulse Responses (SIRs) for multimodal research applications in VR requiring first-person vocal interaction. It describes the impulse response creation and rendering workflow, the audio-visual integration, and addresses latency and computational considerations. The system enables users to explore acoustic spaces from various positions and orientations within a predefined area, supporting three and five Degrees of Freedom (3DoF and 5DoF) in audio-visual multimodal perception for both research and creative applications in VR.

The design of this pipeline arises from the limitations of existing audio tools and spatializers, particularly regarding signal latency, and the lack of SIRs captured from a first-person perspective and in multiple adjacent distributions to enable translational rendering. By addressing these gaps, the system enables real-time auralization of self-generated vocal feedback.
Speakers

Enda Bates

Assistant Prof., Trinity College Dublin
I'm interested in spatial audio, spatial music, and psychoacoustics. I'm the deputy director of the Music & Media Technologies M.Phil. programme in Trinity College Dublin, and a researcher with the ADAPT centre. At this convention I'm presenting a paper on an Ambisonic Decoder Test... Read More →
C2 ATM Studio Warsaw, Poland

12:00pm CEST

On the Design of Binaural Rendering Library for IAMF Immersive Audio Container
Thursday May 22, 2025 12:00pm - 12:20pm CEST
Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm and the binaural filter design process, as well as the real-time implementation of the renderer in the form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of a spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications.
Speakers

Gavin Kearney

Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January... Read More →

Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such... Read More →
C2 ATM Studio Warsaw, Poland
 
Friday, May 23
 

10:40am CEST

Acoustic Objects: bridging immersive audio creation and distribution systems
Friday May 23, 2025 10:40am - 11:00am CEST
In recent years, professional and consumer audio and music technology has advanced in several areas, including sensory immersion, electronic transmission, content formats, and creation tools. The production and consumption of immersive media experiences increasingly rely on a global network of interconnected frameworks. These experiences, once confined to separate content markets like music, movies, video games, and virtual reality, are now becoming interoperable, ubiquitous, and adaptable to individual preferences, conditions, and languages. This article explores this evolution, focusing on flexible immersive audio creation and reproduction. We examine the development of object-based immersive audio technology and its role in unifying broadcast content with embodied experiences. We introduce the concept of Acoustic Objects, proposing a universal spatial audio scene representation model for creating and distributing versatile, navigable sound in music, multimedia, and virtual or extended reality applications.
Speakers

Jean-Marc Jot

Founder and Principal, Virtuel Works LLC
Spatial audio and music technology expert and innovator. Virtuel Works provides audio technology strategy, IP creation and licensing services to help accelerate the development of audio and music spatial computing technology and interoperability solutions.

Thibaut Carpentier

STMS Lab - IRCAM, SU, CNRS, Ministère de la Culture
Thibaut Carpentier studied acoustics at the École centrale and signal processing at Télécom ParisTech, before joining the CNRS as a research engineer. Since 2009, he has been a member of the Acoustic and Cognitive Spaces team in the STMS Lab (Sciences and Technologies of Music... Read More →
C2 ATM Studio Warsaw, Poland

11:00am CEST

Immersive Music Production Workflows: An Ethnographic Study of Current Practices
Friday May 23, 2025 11:00am - 11:20am CEST
This study presents an ethnographic analysis of current immersive music production workflows, examining industry trends, tools, and methodologies. Through interviews and participant observations with professionals across various sectors, the research identifies common patterns, effective strategies, and persistent obstacles in immersive audio production. Key findings highlight the ongoing struggle for standardized workflows, the financial and technological barriers faced by independent artists, and the critical role of collaboration between engineers and creatives. Despite the growing adoption of immersive formats, workflows still follow stereo conventions, treating spatialization as an afterthought and complicating the translation of mixes across playback systems. Additionally, the study explores the evolving influence of object-based and bed-based mixing techniques, monitoring inconsistencies across playback systems, and the need for improved accessibility to immersive production education. By synthesizing qualitative insights, this paper contributes to the broader discourse on immersive music production, offering recommendations for future research and industry-wide best practices to ensure the sustainable integration of spatial audio technologies.
Speakers

Marcela Rada

Audio Engineer
Marcela is a talented and accomplished audio engineer who has experience both in the studio and in the classroom, teaching university-level students the skills of becoming professional audio engineers and music producers. She has worked across music genres recording, editing, mixing... Read More →

Russell Mason

Institute of Sound Recording, University of Surrey

Enzo De Sena

Senior Lecturer, University of Surrey
Enzo De Sena is a Senior Lecturer at the Institute of Sound Recording at the University of Surrey. He received the M.Sc. degree (cum laude) in Telecommunication engineering from the Università degli Studi di Napoli “Federico II,” Italy, in 2009 and the PhD degree in Electronic Engineering from King’s College London, UK, in 2013. Between 2013 and 2016 he was a postdoctoral researcher at KU Leuven... Read More →
C2 ATM Studio Warsaw, Poland

11:20am CEST

Spherical harmonic beamforming based Ambisonics encoding and upscaling method for smartphone microphone array
Friday May 23, 2025 11:20am - 11:40am CEST
With the rapid development of virtual reality (VR) and augmented reality (AR), spatial audio recording and reproduction have gained increasing research interest. Higher Order Ambisonics (HOA) stands out for its adaptability to various playback devices and its ability to integrate head orientation. However, current HOA recordings often rely on bulky spherical microphone arrays (SMA), and portable devices like smartphones are limited by array configuration and number of microphones. We propose a method for HOA encoding using a smartphone microphone array (SPMA). By designing beamformers for each order of spherical harmonic functions based on the array manifold, the method enables HOA encoding and up-scaling. Validation on a real SPMA and its simulated free-field counterpart in noisy and reverberant conditions showed that the method successfully encodes and up-scales HOA up to the fourth order with just four irregularly arranged microphones.
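The core idea of designing beamformers from the array manifold to recover spherical-harmonic signals can be illustrated in a free-field, first-order setting with four ideal cardioid capsules. This is a simplification of the smartphone case described above: the tetrahedral capsule layout and the ACN/SN3D conventions here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Unit orientations of four ideal cardioid capsules (tetrahedral layout).
U = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3.0)

# Array manifold: cardioid pickup of the first-order SH components
# (ACN order W, Y, Z, X; SN3D normalization). Cardioid gain = 0.5 + 0.5*(u . d).
A = 0.5 * np.column_stack([np.ones(4), U[:, 1], U[:, 2], U[:, 0]])
E = np.linalg.pinv(A)            # encoding matrix: mic signals -> FOA signals

d = np.array([1.0, 0.0, 0.0])    # plane wave arriving from +x
mic = 0.5 + 0.5 * (U @ d)        # capsule amplitudes for that plane wave
foa = E @ mic                    # recovered [W, Y, Z, X] components
```

The same pseudo-inverse construction generalizes to higher orders and irregular layouts by extending the manifold matrix, which is where the measured smartphone array response would enter.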
C2 ATM Studio Warsaw, Poland

1:30pm CEST

On the effect of photogrammetric reconstruction and pinna deformation methods on individual head-related transfer functions
Friday May 23, 2025 1:30pm - 1:50pm CEST
Individual head-related transfer functions (HRTFs) are instrumental in rendering plausible spatial audio playback over headphones as well as in understanding auditory perception. Nowadays, the numerical calculation of individual HRTFs is achievable even without high-performance computers. However, the main obstacle is the acquisition of a mesh of the pinnae with a submillimeter accuracy. One approach to this problem is the photogrammetric reconstruction (PR), which estimates a 3D shape from 2D input, e.g., photos. Albeit easy to use, this approach comes with a trade-off in the resulting mesh quality, which subsequently has a substantial impact on the HRTF's quality. In this study, we investigated the effect of PR on HRTF quality as compared to HRTFs calculated from a reference mesh acquired with a high-quality structured-light scanner. Additionally, we applied two pinna deformation methods, which registered a non-individual high-quality pinna to the individual low-quality PR pinna by means of geometric distances. We investigated the potential of these methods to improve the quality of the PR-based pinna meshes. Our evaluation involved the geometrical, acoustical, and psychoacoustical domains including a sound-localization experiment with 9 participants. Our results show that neither PR nor PR-improvement methods were able to provide individual HRTFs of sufficient quality, indicating that without extensive pre- or post-processing, PR provides too little individual detail in the HRTF-relevant pinna regions.
Speakers

Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical and audio engineering in Graz, at both the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research... Read More →

Piotr Majdak

Austrian Academy of Sciences
C2 ATM Studio Warsaw, Poland

1:50pm CEST

Mesh2PPM - Automatic Parametrization of the BezierPPM: Entire Pinna
Friday May 23, 2025 1:50pm - 2:10pm CEST
An individual human pinna geometry can be used to achieve plausible personalized audio reproduction. However, an accurate acquisition of the pinna geometry typically requires specialized equipment and often involves time-consuming post-processing to remove potential artifacts. To obtain an artifact-free but individualized mesh, a parametric pinna model based on cubic Bézier curves (BezierPPM) can be used to represent an individual pinna. However, its parameters need to be manually tuned to the acquired listener’s geometry. For increased scalability, we propose Mesh2PPM, a framework for the automatic estimation of BezierPPM parameters from an individual pinna. Mesh2PPM relies on a deep neural network (DNN) trained on a dataset of synthetic multi-view images rendered from BezierPPM instances. For the evaluation, unseen BezierPPM instances were presented to Mesh2PPM, which inferred the BezierPPM parameters. We subsequently assessed the geometric errors between the meshes obtained from the BezierPPM parametrized with the inferred parameters and the actual pinna meshes. We investigated the effects of the camera-grid type, jittered camera positions, and additional depth information in the images on the estimation quality. While depth information had no effect, the camera-grid type and jittered camera positions both had effects. A 3×3 camera grid provided the best estimation quality, yielding Pompeiu-Hausdorff distances of 2.05 ± 0.4 mm and 1.4 ± 0.3 mm with and without jittered camera positions, respectively, and root-mean-square (RMS) distances of 0.92 ± 0.12 mm and 0.52 ± 0.07 mm. These results motivate further improvements of the proposed framework, ultimately making it applicable for automatic estimation of pinna geometries obtained from actual listeners.
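The two geometric error metrics reported here, the Pompeiu-Hausdorff distance and the RMS nearest-neighbour distance, can be computed for two vertex clouds roughly as follows (a sketch with toy coordinates, not the paper's evaluation code):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def mesh_errors(est_pts, ref_pts):
    """Pompeiu-Hausdorff and RMS nearest-neighbour distances between vertex clouds."""
    hausdorff = max(directed_hausdorff(est_pts, ref_pts)[0],
                    directed_hausdorff(ref_pts, est_pts)[0])
    d_nn, _ = cKDTree(ref_pts).query(est_pts)   # estimate -> reference distances
    rms = np.sqrt(np.mean(d_nn ** 2))
    return hausdorff, rms

ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
est = ref + np.array([0.0, 0.0, 0.1])           # estimate shifted 0.1 along z
h, rms = mesh_errors(est, ref)                  # both equal 0.1 for a pure shift
```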
C2 ATM Studio Warsaw, Poland

2:10pm CEST

Towards a Headphone Target Curve for Spatial Audio
Friday May 23, 2025 2:10pm - 2:30pm CEST
In order to reproduce audio over headphones as intended, it is essential to have well-defined and consistent references for how headphones should sound. With stereo reproduction in mind, the field has established a de-facto reference target curve, the Harman Target Curve, to which headphone transfer functions are commonly compared. This contribution questions whether the same target curve is suitable for the reproduction of spatial audio. First, the origins of the Harman Curve are revisited; it is motivated by the frequency response of loudspeaker playback in a specific listening room. The necessary measurement procedures are described in detail. Then, the paper discusses the applicability of existing targets to spatial audio. In spatial audio, it is possible to embed convincing spatial room information directly into the production, calling into question the motivation for incorporating a listening room in the headphone target. The paper concludes with a listening experiment that compares the preference for different target curves for both spatial audio and stereo.
Speakers

Alexander Mülleder

Graz University of Technology

Nils Meyer-Kahlen

Aalto University
C2 ATM Studio Warsaw, Poland

2:30pm CEST

Sound Source Directivity Estimation in Spherical Fourier Domain from Sparse Measurements
Friday May 23, 2025 2:30pm - 2:50pm CEST
In recent years, applications such as virtual reality (VR) systems and room acoustics simulations have brought the modeling of sound source directivity into focus. An accurate simulation of directional responses of sound sources is essential in immersive audio applications.

Real sound sources have directional properties that differ from those of simple sources such as monopoles, which are frequently used to model more complex acoustic fields. For instance, the sound level of human speech varies considerably depending on where the sound is recorded with respect to the talker’s head. The same is true for loudspeakers, which are usually treated as linear, time-invariant sources: when the sound is recorded behind the loudspeaker, differences of up to 20 dB can be observed at some frequencies. The directional characteristics of sound sources become particularly pronounced at high frequencies. The radiation of real sound sources, such as human voices or musical instruments, differs from simple source models like monopoles, dipoles, and quadrupoles due to their physical structures.

The common approach to measuring directivity patterns of sound sources involves surrounding a sound source in an anechoic chamber with a high number of pressure microphones on a spherical grid and registering the sound power at these positions. Apart from the prohibitive hardware requirements, such measurement setups are mostly impractical and costly. Audio system manufacturers have developed various methods for measuring sound source directionality over the years. These methods are generally of high technical complexity.

This article proposes a new, reduced-complexity directivity measurement approach based on the spherical harmonic decomposition of the sound field. The method estimates the directional characteristics of sound sources using fewer measurement points with spherical microphone arrays. The spherical harmonic transform allows for the calculation of directivity using data collected from spherical microphone arrays instead of pressure sensors. The proposed method uses both the pressure component and spatial derivatives of the sound field and successfully determines directivity with sparse measurements.

An estimation model based on the spherical Fourier transform was developed, measurements were carried out to test this model, and preliminary results obtained from the estimation model are presented. Experiments conducted at the METU Spatial Audio Research Laboratory demonstrated the effectiveness of the proposed method. The directivity characteristics of a Genelec 6010A loudspeaker were measured using eight 3rd-order spherical microphone arrays. The directivity functions obtained were highly consistent with the data provided by the loudspeaker manufacturer. The results, especially in the low and mid-frequency bands, indicate the utility of the proposed method.
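The principle of estimating a directivity pattern from sparse measurements by fitting a harmonic basis in the least-squares sense can be shown in a circular-harmonic (2D) simplification of the spherical case, with synthetic data rather than the METU measurements:

```python
import numpy as np

# Sparse directivity measurements of a hypothetical source at 5 azimuths.
theta = np.radians([0, 45, 120, 200, 300])
g_meas = 0.6 + 0.4 * np.cos(theta)      # synthetic cardioid-like ground truth

# Least-squares fit of a first-order (monopole + dipole) model:
# g(theta) = c0 + c1 * cos(theta)
B = np.column_stack([np.ones_like(theta), np.cos(theta)])
c, *_ = np.linalg.lstsq(B, g_meas, rcond=None)

# The fitted model predicts the response at unmeasured angles:
g_back = c[0] + c[1] * np.cos(np.pi)    # response directly behind the source
```

In the paper's full method, the basis is the spherical harmonics and the observations include spatial derivatives of the field, but the sparse least-squares fit follows the same pattern.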
C2 ATM Studio Warsaw, Poland

2:50pm CEST

Perceptual evaluation of professional point and line sources for immersive audio applications
Friday May 23, 2025 2:50pm - 3:10pm CEST
Immersive sound reinforcement aims to create a balanced perception of sounds arriving from different directions, establishing an impression of envelopment over the audience area. Current perceptual research shows that coverage designs featuring nearly constant decay (0 dB per distance doubling) preserve the level balance among audio objects in the mix. In contrast, a -3 dB decay supports a more uniform sensation of envelopment, especially for off-center listening positions. For practical reasons, point-source loudspeakers remain widely used for immersive audio playback in mid-sized venues. However, point-source loudspeakers inherently decay by -6 dB per distance doubling, and using them can conflict with the design goals outlined above. In this paper, we investigate the perceived differences between point-source and line-source setups using eight surrounding loudspeakers side-by-side covering a 10 m × 7 m audience area. The perceptual qualities of object level balance, spatial definition, and envelopment were compared in a MUSHRA listening experiment, and acoustic measurements were carried out to capture room impulse responses and binaural room impulse responses (BRIRs) of the experimental setup. The BRIRs were used to check whether the results of the listening experiment were reproducible on headphones. Both the loudspeaker and headphone-based experiments delivered highly correlated results. Also, regression models devised based on the acoustic measurements are highly correlated to the perceptual results. The results confirm that elevated line sources, exhibiting a practically realizable decay of -2 dB per distance doubling, help preserve object-level balance, increase spatial definition, and provide a uniform envelopment experience throughout the audience area compared to point-source loudspeakers.
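The decay figures quoted above follow directly from the spreading laws: pressure falls as 1/r for a point source (spherical spreading) and as 1/sqrt(r) for an ideal line source (cylindrical spreading). A quick check of the per-doubling level change:

```python
import numpy as np

def decay_per_doubling(exponent):
    """Level change in dB when distance doubles, for pressure ~ 1/r**exponent."""
    return 20.0 * np.log10(2.0 ** -exponent)

point = decay_per_doubling(1.0)   # spherical spreading: about -6.02 dB
line = decay_per_doubling(0.5)    # cylindrical spreading: about -3.01 dB
```

The practically realizable -2 dB figure for finite, elevated line sources sits between the ideal cylindrical value and the constant-decay (0 dB) design target.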
Speakers

Franz Zotter

University of Music and Performing Arts Graz
Franz Zotter received an M.Sc. degree in electrical and audio engineering from the University of Technology (TUG) in 2004, a Ph.D. degree in 2009 and a venia docendi in 2023 from the University of Music and Performing Arts (KUG) in Graz, Austria. He joined the Institute of Electronic... Read More →

Philip Coleman

Senior Immersive Audio Research Engineer, L-Acoustics
I'm a research engineer in the L-ISA immersive audio team at L-Acoustics, based in Highgate, London. I'm working on the next generation of active acoustics and object-based spatial audio reproduction, to deliver the best possible shared experiences.Before joining L-Acoustics in September... Read More →
C2 ATM Studio Warsaw, Poland

4:00pm CEST

Binamix - A Python Library for Generating Binaural Audio Datasets
Friday May 23, 2025 4:00pm - 6:00pm CEST
The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions for binaural audio dataset generation for testing and validation. Binamix is an open-source Python library designed to facilitate programmatic binaural mixing using the extensive SADIE II Database, which provides HRIR and BRIR data for 20 subjects. The Binamix library provides a flexible and repeatable framework for creating large-scale spatial audio datasets, making it an invaluable resource for codec evaluation, audio quality metric development, and machine learning model training. A range of pre-built example scripts, utility functions, and visualization plots further streamline the process of custom pipeline creation. This paper presents an overview of the library's capabilities, including binaural rendering, impulse response interpolation, and multi-track mixing for various speaker layouts. The tools utilize a modified Delaunay triangulation technique to achieve accurate HRIR/BRIR interpolation where desired angles are not present in the data. By supporting a wide range of parameters such as azimuth, elevation, subject IRs, speaker layouts, mixing controls, and more, the library enables researchers to create large binaural datasets for any downstream purpose. Binamix empowers researchers and developers to advance spatial audio applications with reproducible methodologies by offering an open-source solution for binaural rendering and dataset generation.
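The abstract describes interpolation of IRs at unmeasured angles via a modified Delaunay triangulation. A plain barycentric interpolation over a Delaunay triangulation can be sketched as follows; the (azimuth, elevation) grid and the two-tap "HRIRs" are hypothetical, and this is not the actual Binamix API:

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical measured grid: (azimuth, elevation) in degrees, one 2-tap IR each.
points = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0], [30.0, 30.0]])
hrirs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]])

tri = Delaunay(points)

def interpolate_hrir(az, el):
    """Barycentric blend of the three measured IRs enclosing the target direction."""
    p = np.array([az, el])
    simplex = int(tri.find_simplex(p[None, :])[0])
    if simplex < 0:
        raise ValueError("direction outside the measured grid")
    T = tri.transform[simplex]
    bary2 = T[:2] @ (p - T[2])                     # first two barycentric weights
    weights = np.append(bary2, 1.0 - bary2.sum())  # third weight closes the sum to 1
    return weights @ hrirs[tri.simplices[simplex]]

h = interpolate_hrir(10.0, 10.0)
```

At a measured direction the weights collapse onto one vertex, so the interpolation reproduces the stored IR exactly.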
Speakers
avatar for Jan Skoglund

Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such... Read More →
Hall F ATM Studio Warsaw, Poland


Neural 3D Audio Renderer for acoustic digital twin creation
Friday May 23, 2025 4:00pm - 6:00pm CEST
In this work, we introduce a Neural 3D Audio Renderer (N3DAR), a conceptual solution for creating acoustic digital twins of arbitrary spaces. We propose a workflow consisting of three stages:
1. Simulation of high-fidelity Spatial Room Impulse Responses (SRIR) based on the 3D model of a digitalized space,
2. Building an ML-based model of this space for interpolation and reconstruction of SRIRs,
3. Development of a real-time 3D audio renderer that allows the deployment of the digital twin of a space with accurate spatial audio effects consistent with the actual acoustic properties of this space.
The first stage consists of preparing the 3D model and running the SRIR simulations using a state-of-the-art wave-based method for arbitrary pairs of source-receiver positions. This stage provides the training data used in the second stage, in which the SRIR reconstruction model is trained. The training stage aims to learn a model of the acoustic properties of the digitalized space using the Acoustic Volume Rendering (AVR) approach. The last stage is the construction of a plugin with a dedicated 3D audio renderer, where rendering comprises reconstruction of the early part of the SRIR, estimation of the reverberant part, and HOA-based binauralization.
N3DAR allows the building of tailored audio rendering plugins that can be deployed along with visual 3D models of digitalized spaces, where users can freely navigate through the space with 6 degrees of freedom and experience high-fidelity binaural playback in real time.
We provide a detailed description of the challenges and considerations for each stage. We also conduct an extensive evaluation of the audio rendering capabilities with both objective metrics and subjective methods, using a dedicated evaluation platform.
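The early/late split in the rendering stage can be illustrated with a minimal mono sketch: convolve the dry signal separately with the early reflections and with the (delayed) late tail, then sum. The sample rate, mixing time, and exponential-decay toy RIR below are invented for illustration; the actual renderer operates on SRIRs with HOA binauralization.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000                                   # toy sample rate
n = np.arange(1600)                         # 0.2 s toy room impulse response
rir = rng.standard_normal(n.size) * np.exp(-n / (0.02 * fs))
mixing_time = int(0.02 * fs)                # hypothetical early/late split (20 ms)

early, late = rir[:mixing_time], rir[mixing_time:]
dry = rng.standard_normal(4000)

# Render early reflections and the late tail separately; the tail is
# re-inserted at its original delay before summation.
y = np.zeros(dry.size + rir.size - 1)
y_early = np.convolve(dry, early)
y_late = np.convolve(dry, late)
y[: y_early.size] += y_early
y[mixing_time : mixing_time + y_late.size] += y_late

# By linearity, the split render equals one convolution with the full RIR.
matches_full = np.allclose(y, np.convolve(dry, rir))
```

In a real renderer the early part would be reconstructed per source-receiver pair while the tail is estimated parametrically, but the recombination step is the same.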
Hall F ATM Studio Warsaw, Poland


Performance Estimation Method for 3D Microphone Array based on the Modified Steering Vector in Spherical Harmonic Domain
Friday May 23, 2025 4:00pm - 6:00pm CEST
This paper presents an objective method for estimating the performance of 3D microphone arrays, which is also applicable to 2D arrays. The method incorporates the physical characteristics and relative positions of the microphones, merging these elements through a weighted summation to derive the arrays' directional patterns. These patterns are represented as a "Modified Steering Vector." Additionally, leveraging the spatial properties of spherical harmonics, we transform the array's directional pattern into the spherical harmonic domain. This transformation enables a quantitative analysis of the physical properties of each component, providing a comprehensive understanding of the array's performance. Overall, the proposed method offers a deeply insightful and versatile framework for evaluating the performance of both 2D and 3D microphone arrays by fully exploiting their inherent physical characteristics.
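The steering-vector idea behind such directional-pattern analysis can be sketched for the simplest case, a delay-and-sum line of omnidirectional capsules. The geometry, frequency, and uniform weights below are illustrative assumptions; the paper's method generalizes this to arbitrary 2D/3D arrays and transforms the resulting pattern into the spherical harmonic domain.

```python
import numpy as np

c, f = 343.0, 1000.0                 # speed of sound (m/s), analysis frequency (Hz)
k = 2 * np.pi * f / c                # wavenumber
x = np.arange(4) * 0.1               # 4 omni capsules on the x-axis, 10 cm apart
w = np.ones(4) / 4                   # uniform "delay-and-sum" weights

def steering_vector(theta):
    """Plane-wave phase factors at each capsule for arrival angle theta."""
    return np.exp(1j * k * x * np.cos(theta))

# Directional pattern D(theta) = |w^H v(theta)| sampled over the half circle;
# with uniform weights the main lobe sits at broadside (theta = 90 degrees).
thetas = np.radians(np.arange(181))
pattern = np.abs(np.array([w.conj() @ steering_vector(t) for t in thetas]))
```

Replacing the uniform weights with phase-shifted ones steers the main lobe, which is exactly the weighted summation the modified steering vector formalizes.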
Hall F ATM Studio Warsaw, Poland


Reconstructing Sound Fields with Physics-Informed Neural Networks: Applications in Real-World Acoustic Environments
Friday May 23, 2025 4:00pm - 6:00pm CEST
The reconstruction of sound fields is a critical component in a range of applications, including spatial audio for augmented, virtual, and mixed reality (AR/VR/XR) environments, as well as for optimizing acoustics in physical spaces. Traditional approaches to sound field reconstruction predominantly rely on interpolation techniques, which estimate sound fields based on a limited number of spatial and temporal measurements. However, these methods often struggle with issues of accuracy and realism, particularly in complex and dynamic environments. Recent advancements in deep learning have provided promising alternatives, particularly with the introduction of Physics-Informed Neural Networks (PINNs), which integrate physical laws directly into the model training process. This study aims to explore the application of PINNs for sound field reconstruction, focusing on the challenge of predicting acoustic fields in unmeasured areas. The experimental setup involved the collection of impulse response data from the Promenadikeskus concert hall in Pori, Finland, using various source and receiver positions. The PINN framework is then utilized to simulate the hall’s acoustic behavior, with parameters incorporated to model sound propagation across different frequencies and source-receiver configurations. Despite challenges arising from computational load, pre-processing strategies were implemented to optimize the model's efficiency. The results demonstrate that PINNs can accurately reconstruct sound fields in complex acoustic environments, offering significant potential for real-time sound field control and immersive audio applications.
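The physics-informed idea, reduced to one dimension: the training loss penalizes the acoustic wave-equation residual p_tt − c²·p_xx at collocation points. The numpy sketch below uses finite differences in place of the network's automatic differentiation; the grid sizes and the travelling-wave test solution are illustrative, not the paper's setup.

```python
import numpy as np

c = 343.0  # speed of sound (m/s)

def wave_residual(p, dx, dt):
    """Residual p_tt - c^2 * p_xx on the interior of a (time, space) grid.

    A PINN minimizes the mean square of this residual as its physics loss;
    finite differences stand in here for automatic differentiation.
    """
    p_tt = (p[2:, 1:-1] - 2 * p[1:-1, 1:-1] + p[:-2, 1:-1]) / dt**2
    p_xx = (p[1:-1, 2:] - 2 * p[1:-1, 1:-1] + p[1:-1, :-2]) / dx**2
    return p_tt - c**2 * p_xx

# A travelling wave p(x, t) = sin(k(x - c t)) solves the wave equation, so
# its residual (and hence the physics loss) vanishes up to roundoff.
dx = 0.01
dt = dx / c
k = 2 * np.pi
x = np.arange(0.0, 1.0, dx)
t = np.arange(100) * dt
p = np.sin(k * (x[None, :] - c * t[:, None]))
physics_loss = np.mean(wave_residual(p, dx, dt) ** 2)
```

In an actual PINN this residual term is combined with a data term that fits the measured impulse responses, so the network interpolates between measurement positions while staying consistent with the wave equation.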
Speakers
RK

Rigas Kotsakis

Aristotle University of Thessaloniki
IT

Iordanis Thoidis

Aristotle University of Thessaloniki
avatar for Nikolaos Vryzas

Nikolaos Vryzas

Aristotle University Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production... Read More →
LV

Lazaros Vrysis

Aristotle University of Thessaloniki
Hall F ATM Studio Warsaw, Poland


Recording and post-production of Dietrich Buxtehude baroque cantatas in stereo and Dolby Atmos using experimental 3D microphone array.
Friday May 23, 2025 4:00pm - 6:00pm CEST
3D recordings are an attractive solution for achieving an immersive effect, and Dolby Atmos has recently become an increasingly popular format for distributing three-dimensional music recordings, although stereophony currently remains the main format for music production.

How can traditional microphone techniques for recording classical music be optimally extended to obtain both stereo recordings and three-dimensional formats (e.g. Dolby Atmos) in post-production? The author tries to answer this question using the example of a recording of Dietrich Buxtehude's work "Membra Jesu Nostri", BuxWV 75. The cycle of seven cantatas, composed in 1680, is one of the most important and most popular compositions of the early Baroque era. The first Polish recording was made by Arte Dei Suonatori conducted by Bartłomiej Stankowiak, accompanied by soloists and choral parts performed by the choir Cantus Humanus.

The author will present his concept of a microphone set for 3D recordings. In addition to the detailed microphone setup, it will cover the post-production method, combining the stereo mix with a Dolby Atmos mix in a 7.2.4 speaker configuration. A workflow will be proposed to facilitate switching between the different formats.
Hall F ATM Studio Warsaw, Poland


Subjective Evaluation on Three-dimensional VBAP and Ambisonics in an Immersive Concert Setting
Friday May 23, 2025 4:00pm - 6:00pm CEST
This paper investigates the subjective evaluation of two prominent three-dimensional spatialization techniques—Vector Base Amplitude Panning (VBAP) and High-Order Ambisonics (HOA)—using IRCAM’s Spat in an immersive concert setting. The listening test was conducted in the New Hall at the Royal Danish Academy of Music, which features a 44-speaker immersive audio system. The musical stimuli included electronic compositions and modern orchestral recordings, providing a diverse range of temporal and spectral content. The participants comprised experienced Tonmeisters and non-experienced musicians, who were seated in off-center positions to simulate real-world audience conditions. This study provides an ecologically valid subjective evaluation methodology.
The results indicated that VBAP excelled in spatial clarity and sound quality, while HOA demonstrated superior envelopment. The perceptual differences between the two techniques were relatively minor, influenced by room acoustics and suboptimal listening positions. Furthermore, music genre had no significant impact on the evaluation outcomes.
The study highlights VBAP’s strength in precise localization and HOA's capability for creating immersive soundscapes, aiming to bridge the gap between ideal and real-world applications in immersive sound reproduction and perception. The findings suggest the need to balance trade-offs when selecting spatialization techniques for specific purposes, venues, and audience positions. Future research will focus on evaluating a wider range of spatialization methods in concert environments and optimizing them to improve the auditory experience for distributed audiences.
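For reference, the 2D core of VBAP is pairwise amplitude panning: the source direction is expressed in the vector base formed by the two adjacent loudspeakers, and the resulting gains are power-normalized. The loudspeaker and source angles below are arbitrary examples, not the paper's 44-speaker layout.

```python
import numpy as np

def vbap_gains_2d(source_az, spk_az):
    """Gains for a source between two loudspeakers (angles in degrees).

    Solves g L = s, where the rows of L are the loudspeaker unit vectors
    and s is the source direction, then normalizes for constant power
    (Pulkki's vector base formulation).
    """
    def unit(az_deg):
        a = np.radians(az_deg)
        return np.array([np.cos(a), np.sin(a)])

    L = np.stack([unit(spk_az[0]), unit(spk_az[1])])
    g = unit(source_az) @ np.linalg.inv(L)
    return g / np.linalg.norm(g)

# Source at +15 degrees between speakers at -30 and +30 degrees:
g = vbap_gains_2d(15.0, (-30.0, 30.0))
```

A source panned toward the +30° speaker receives the larger of the two gains, and a centered source gets equal gains, which is the precise-localization behavior the listening test attributes to VBAP.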
Speakers
avatar for Jesper Andersen

Jesper Andersen

Head of Tonmeister Programme, Det Kgl Danske Musikkonservatorium
As a Grammy-nominated producer, engineer and pianist, Jesper has recorded around 100 CDs and produced music for radio, TV, theatre, installations and performance. Jesper has also worked as a sound engineer/producer at the Danish Broadcasting Corporation. A recent album-production is... Read More →
avatar for Stefania Serafin

Stefania Serafin

Professor, Aalborg University Copenhagen
I am Professor in Sonic interaction design at Aalborg University in Copenhagen and leader of the Multisensory Experience Lab together with Rolf Nordahl. I am the President of the Sound and Music Computing association, Project Leader of the Nordic Sound and Music Computing netwo... Read More →
Hall F ATM Studio Warsaw, Poland


Visualization of the spatial behavior between channels in surround program
Friday May 23, 2025 4:00pm - 6:00pm CEST
Hall F ATM Studio Warsaw, Poland
 

