Venue: C2
Thursday, May 22
 

9:30am CEST

Correlation between middle and top layer loudspeaker signals and the listening range in 3D audio reproduction
Thursday May 22, 2025 9:30am - 9:50am CEST
In auditory spatial perception, horizontal sound-image localization and the sense of spaciousness are based on interaural level and time differences as cues, and the degree of correlation between the left and right signals is thought to contribute in particular to the sense of horizontal spaciousness [Hidaka1995, Zotter2013]. For vertical image spread (VIS), spectral cues are necessary, and the change in VIS due to the degree of correlation between the vertical and horizontal signals depends on the frequency response [Gribben2018]. This paper investigates, through two experiments, how different correlation values between the top- and middle-layer loudspeaker signals of a 3D audio reproduction system influence listening impressions. Experiments using pink noise with different correlation values between the top and middle layers show that the lower the vertical correlation, the wider the listening range within which the impression does not change from that at the central listening position. Experiments using impulse responses measured with microphones set up in an actual concert hall revealed a tendency to perceive a sense of spaciousness at off-center listening positions when cardioid microphones spaced apart from the middle layer were used for the top layer. The polar pattern and height of the microphones may have lowered the correlation values in the vertical direction, thus widening the listening range of consistent spatial impression beyond the central listening position (i.e., the “sweet spot”).
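For reference, the degree of correlation between two layer signals is commonly quantified by the correlation coefficient; a minimal definition (the study may use a band-limited or lag-maximized variant) is

\[
\Phi = \frac{\sum_n x_\mathrm{mid}[n]\, x_\mathrm{top}[n]}{\sqrt{\sum_n x_\mathrm{mid}^2[n] \sum_n x_\mathrm{top}^2[n]}},
\]

where \(\Phi = 1\) for identical middle- and top-layer signals and \(\Phi = 0\) for fully decorrelated ones.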
Speakers
Toru Kamekawa

Professor, Tokyo University of the Arts
Toru Kamekawa: After graduating from the Kyushu Institute of Design in 1983, he joined the Japan Broadcasting Corporation (NHK) as a sound engineer. During that period, he gained experience as a recording engineer, mostly in surround sound programs for HDTV. In 2002, he joined...
C2 ATM Studio Warsaw, Poland

9:50am CEST

Plane wave creation in non-spherical loudspeaker arrays using radius formulation by the Lamé function
Thursday May 22, 2025 9:50am - 10:10am CEST
This paper proposes a method for plane-wave field creation with spherical harmonics using a non-spherical loudspeaker array. In sound field control, there are physical-acoustic models and psychoacoustic models. The former allow freedom in the location of each loudspeaker, but the reproduced sound differs from the intended auditory impression because phantom sources are constructed. The latter were developed from the wave equation under strictly located circular or spherical array conditions, and with high-order Ambisonics (HOA) based on spherical harmonics, which is exact only at a single point. We therefore seek a method that physically creates actual waveforms while providing flexibility in the shape of the loudspeaker array. In this paper, we focus on the Lamé function, whose order changes the shape of the spatial figure, and propose formulating the distance between the center and each loudspeaker using the function in a polar expression. Simulation experiments show that, within the inscribed region, the proposed method creates the same plane waveform as a spherical array when a high-order Lamé function, whose shape is close to rectangular, is used.
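To sketch the radius formulation (notation assumed here, not taken from the paper): the Lamé curve \(|x/a|^p + |y/b|^p = 1\) gives, in polar form, the center-to-loudspeaker distance

\[
r(\theta) = \left( \left|\frac{\cos\theta}{a}\right|^{p} + \left|\frac{\sin\theta}{b}\right|^{p} \right)^{-1/p},
\]

which is an ellipse for \(p = 2\) and approaches a rectangle as \(p \to \infty\), so a single order parameter sweeps the array shape between these extremes.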
Speakers
Tomohiro Sakaguchi

Doctoral student, Waseda University
C2 ATM Studio Warsaw, Poland

10:10am CEST

Recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching algorithm
Thursday May 22, 2025 10:10am - 10:30am CEST
This paper presents a recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching (BACC-PM) algorithm, designed to optimize sound zone systems efficiently in the time domain. Traditional frequency-domain algorithms, while computationally less demanding, often result in non-causal filters with increased pre-ringing, making time-domain approaches preferable for certain applications. However, time-domain solutions typically suffer from high computational costs as a result of the inversion of large convolution matrices.
To address these challenges, this study introduces a method based on gradient descent and conjugate gradient descent techniques. By exploiting recursive calculations, the proposed approach significantly reduces computational time compared to direct inversion.
Theoretical foundations, simulation setups, and performance metrics are detailed, showcasing the efficiency of the algorithm in achieving high acoustic contrast and low reproduction errors with reduced computational effort. Simulations in a controlled environment demonstrate the advantages of the method.
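The paper's exact recursion is not reproduced here, but the following minimal sketch illustrates the conjugate-gradient idea it builds on: minimizing a regularized time-domain pressure-matching cost ||Gw - d||^2 + lam*||w||^2 over FIR coefficients w using only matrix-vector products, so the large matrix G^T G is never formed or inverted (the acoustic-contrast term of BACC-PM is omitted for brevity):

```python
import numpy as np

def cg_pressure_matching(G, d, lam=1e-3, n_iter=100, tol=1e-8):
    """Conjugate gradient on the normal equations
    (G^T G + lam*I) w = G^T d, avoiding explicit inversion.
    G: (M, N) convolution matrix of the plant responses,
    d: (M,) target pressure signal."""
    A = lambda w: G.T @ (G @ w) + lam * w    # matrix-vector products only
    b = G.T @ d
    w = np.zeros(G.shape[1])
    r = b - A(w)                             # residual
    p = r.copy()                             # search direction
    rs = r @ r
    for _ in range(n_iter):
        Ap = A(p)
        alpha = rs / (p @ Ap)
        w += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p            # standard linear-CG direction update
        rs = rs_new
    return w

# toy usage: random plant, delayed-impulse target
rng = np.random.default_rng(0)
G = rng.standard_normal((512, 128))
d = np.zeros(512); d[64] = 1.0
w = cg_pressure_matching(G, d)
```

Each iteration costs one multiplication by G and one by G^T, which is what makes a recursive solution cheap compared to direct inversion of the convolution matrix.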
Speakers
Manuel Melon

Professor, LAUM / LE MANS Université
C2 ATM Studio Warsaw, Poland

10:30am CEST

GSound-SIR: A Spatial Impulse Response Ray-Tracing and High-order Ambisonic Auralization Python Toolkit
Thursday May 22, 2025 10:30am - 10:50am CEST
Accurate and efficient simulation of room impulse responses is crucial for spatial audio applications. However, existing acoustic ray-tracing tools often operate as black boxes and only output impulse responses (IRs), providing limited access to intermediate data or spatial fidelity. To address these limitations, this paper presents GSound-SIR, a novel Python-based toolkit for room acoustics simulation. The contributions of this paper are as follows. First, GSound-SIR provides direct access to up to millions of raw ray data points from simulations, enabling in-depth analysis of sound propagation paths that was not possible with previous solutions. Second, we introduce a tool that converts acoustic rays into high-order Ambisonic impulse responses, capturing spatial audio cues with greater fidelity than standard techniques. Third, to enhance efficiency, the toolkit implements an energy-based filtering algorithm and can export only the top-X or top-X% rays. Fourth, we propose storing the simulation results in the Parquet format, facilitating fast data I/O and seamless integration with data-analysis workflows. Together, these features make GSound-SIR an advanced, efficient, and modern foundation for room acoustics research, providing researchers and developers with a powerful new tool for spatial audio exploration.
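As an illustration of the energy-based export idea (a sketch with hypothetical column names, not the actual GSound-SIR API; pandas writes Parquet via pyarrow or fastparquet):

```python
import numpy as np
import pandas as pd

def export_top_energy_rays(rays: pd.DataFrame, top_percent: float, path: str):
    """Keep the most energetic rays that together carry `top_percent`
    of the total energy, then write them to a Parquet file.
    Assumes an 'energy' column; other columns (delay, azimuth, ...)
    are carried along unchanged."""
    df = rays.sort_values("energy", ascending=False)
    cum = df["energy"].cumsum() / df["energy"].sum()
    kept = df[cum <= top_percent / 100.0]
    kept.to_parquet(path, index=False)   # fast columnar I/O
    return kept

# toy usage with synthetic ray data
rng = np.random.default_rng(1)
rays = pd.DataFrame({
    "delay_s": rng.uniform(0, 0.3, 10_000),
    "azimuth": rng.uniform(-np.pi, np.pi, 10_000),
    "energy": rng.exponential(1.0, 10_000),
})
export_top_energy_rays(rays, top_percent=90.0, path="rays_top90.parquet")
```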
C2 ATM Studio Warsaw, Poland

11:00am CEST

Ambisonic Spatial Decomposition Method with salient / diffuse separation
Thursday May 22, 2025 11:00am - 11:20am CEST
This paper proposes a new algorithm for enhancing the spatial resolution of measured first-order Ambisonics room impulse responses (FOA RIRs). It applies a separation of the RIR into a salient stream (direct sound and reflections) and a diffuse stream to treat them differently: The salient stream is enhanced using the Ambisonic Spatial Decomposition Method (ASDM) with a single direction of arrival (DOA) per sample of the RIR, while the diffuse stream is enhanced by 4-directional (4D-)ASDM with 4 DOAs at the same time. Listening experiments comparing the new Salient/Diffuse S/D-ASDM to ASDM, 4D-ASDM, and the original FOA RIR reveal the best results for the new algorithm in both spatial clarity and absence of artifacts, especially for its variant, which keeps the DOA constant within each salient event block.
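As a sketch of one ingredient of SDM-type methods (not the full S/D-ASDM pipeline, which additionally separates salient and diffuse streams and re-encodes to higher orders), a single DOA per RIR sample can be estimated from an FOA RIR with the pseudo-intensity vector:

```python
import numpy as np

def pseudo_intensity_doa(foa, win=16):
    """Per-sample DOA from a B-format RIR via the pseudo-intensity vector.
    foa: (N, 4) array in ACN/SN3D channel order (W, Y, Z, X).
    Returns (N, 3) unit DOA vectors in (x, y, z)."""
    w = foa[:, 0]
    xyz = foa[:, [3, 1, 2]]                  # reorder ACN (Y, Z, X) -> (X, Y, Z)
    intensity = w[:, None] * xyz             # instantaneous pseudo-intensity
    kernel = np.ones(win) / win              # short moving average for stability
    smoothed = np.stack([np.convolve(intensity[:, i], kernel, mode="same")
                         for i in range(3)], axis=1)
    norm = np.linalg.norm(smoothed, axis=1, keepdims=True)
    return smoothed / np.maximum(norm, 1e-12)
```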
Speakers
Lukas Gölles

University of Music and Performing Arts Graz - Institute of Electronic Music and Acoustics
C2 ATM Studio Warsaw, Poland

11:20am CEST

Towards a standard listener-independent HRTF to facilitate long-term adaptation
Thursday May 22, 2025 11:20am - 11:40am CEST
Head-related transfer functions (HRTFs) are used in auditory applications for spatializing virtual sound sources. Listener-specific HRTFs, which aim at mimicking the filtering of the head, torso and pinnae of a specific listener, improve the perceived quality of virtual sound compared to using non-individualized HRTFs. However, listener-specific HRTFs may not be accessible to everyone. Here, we propose as an alternative to take advantage of the adaptation abilities of human listeners to a new set of HRTFs. We claim that agreeing upon a single listener-independent set of HRTFs has beneficial effects for long-term adaptation compared to using several, potentially severely different HRTFs. Thus, the Non-individual Ear MOdel (NEMO) initiative is a first step towards a standardized listener-independent set of HRTFs to be used across applications as an alternative to individualization. A prototype, NEMObeta, is presented to explicitly encourage external feedback from the spatial audio community, and to agree on a complete list of requirements for the future HRTF selection.
Speakers
Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical and audio engineering in Graz, both at the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research...
Nils Meyer-Kahlen

Aalto University
C2 ATM Studio Warsaw, Poland

11:40am CEST

Real-Time Auralization Pipeline for First-Person Vocal Interaction in Audio-Visual Virtual Environments
Thursday May 22, 2025 11:40am - 12:00pm CEST
Multimodal research and applications are becoming more commonplace as Virtual Reality (VR) technology integrates different sensory feedback, enabling the recreation of real spaces in an audio-visual context. Within VR experiences, numerous applications rely on the user’s voice as a key element of interaction, including music performances and public speaking applications. Self-perception of our voice plays a crucial role in vocal production. When singing or speaking, our voice interacts with the acoustic properties of the environment, shaping the adjustment of vocal parameters in response to the perceived characteristics of the space.

This technical report presents a real-time auralization pipeline that leverages three-dimensional Spatial Impulse Responses (SIRs) for multimodal research applications in VR requiring first-person vocal interaction. It describes the impulse response creation and rendering workflow and the audio-visual integration, and addresses latency and computational considerations. The system enables users to explore acoustic spaces from various positions and orientations within a predefined area, supporting three and five Degrees of Freedom (3DoF and 5DoF) in audio-visual multimodal perception for both research and creative applications in VR.

The design of this pipeline arises from the limitations of existing audio tools and spatializers, particularly regarding signal latency, and the lack of SIRs captured from a first-person perspective and in multiple adjacent distributions to enable translational rendering. By addressing these gaps, the system enables real-time auralization of self-generated vocal feedback.
Speakers
Enda Bates

Assistant Prof., Trinity College Dublin
I'm interested in spatial audio, spatial music, and psychoacoustics. I'm the deputy director of the Music & Media Technologies M.Phil. programme in Trinity College Dublin, and a researcher with the ADAPT centre. At this convention I'm presenting a paper on an Ambisonic Decoder Test...
C2 ATM Studio Warsaw, Poland

12:00pm CEST

On the Design of Binaural Rendering Library for IAMF Immersive Audio Container
Thursday May 22, 2025 12:00pm - 12:20pm CEST
Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm and the binaural filter design process, as well as the real-time implementation of the renderer in the form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of a spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications.
Speakers
Gavin Kearney

Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January...
Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...
C2 ATM Studio Warsaw, Poland

2:45pm CEST

Tutorial: Capturing Your Prosumers
Thursday May 22, 2025 2:45pm - 3:45pm CEST
This session breaks down how top brands like Samsung, Apple, and Slack engage professional and semi-professional buyers. Attendees will gain concrete strategies and psychological insights they can use to boost customer retention and revenue.

Format: 1-Hour Session
Key Takeaways:
- Understand the psychology behind purchasing decisions of prosumers, drawing on our access to insights from over 300 million global buyers
- Explore proven strategies to increase engagement and revenue
- Gain actionable frameworks for immediate implementation
C2 ATM Studio Warsaw, Poland

4:00pm CEST

Key Technology Briefings
Thursday May 22, 2025 4:00pm - 6:00pm CEST
C2 ATM Studio Warsaw, Poland
 
Friday, May 23
 

9:15am CEST

Investigating Individual, Loudness-Dependent Equalization Preferences in Different Driving Sound Conditions
Friday May 23, 2025 9:15am - 9:35am CEST
In automotive audio playback systems, dynamically increasing driving sounds are typically taken into account by applying a generic, i.e., non-individualized, increase in overall level and low-frequency amplification to compensate for the increased masking. This study investigated the degree of individuality in preferred noise-dependent level and equalizer settings. A user study with 18 normal-hearing participants was conducted in which individually preferred level-dependent and frequency-dependent amplification parameters were determined using a music-based procedure in quiet and in nine different driving noise conditions. The comparison of self-adjusted parameters suggested that, on average, participants adjusted higher overall levels and more low-frequency amplification in noise than in quiet. However, preferred self-adjusted levels differed markedly between participants for the same listening conditions but were quite similar in a re-test session for each participant, indicating that individual preferences were stable and could be reproducibly measured with the employed personalization scheme. Furthermore, the impact of driving noise on individually preferred settings revealed strong interindividual differences, indicating that listeners can differ widely with respect to their individual optimum of how equalizer and level settings should be dynamically adapted to changes in driving conditions.
Speakers
Jan Rennies

Head of Group Personalized Hearing Systems, Fraunhofer Institute for Digital Media Technology IDMT
I am heading a group at Fraunhofer IDMT dedicated to developing new solutions for better communication, hearing, and hearing health in various applications together with partners from industry and academia. I am particularly interested in networking and exploring opportunities for...
C2 ATM Studio Warsaw, Poland

9:35am CEST

Subjective test of loudspeaker virtualization
Friday May 23, 2025 9:35am - 9:55am CEST
In this contribution we present subjective tests of loudspeaker virtualization, a technique enabling the application of specific target behaviors to the physical loudspeaker system. In this work, loudspeaker virtualization is applied to virtualize a closed box car audio subwoofer to replicate the performance of a larger vented enclosure. The tests are designed to determine if any reduction in sound quality is detected by a panel of listeners when a virtualized loudspeaker is used.
C2 ATM Studio Warsaw, Poland

9:55am CEST

Objective measurements for basic sound quality and special audio features in cars
Friday May 23, 2025 9:55am - 10:15am CEST
Car audio systems aim to provide information, entertainment, and acoustic comfort to drivers and passengers. In addition to basic audio functions for broadcasting and playing chimes, warning sounds, and music, there are special audio features such as vehicle noise compensation, spatial sound effects, individual sound zones, and active noise control. In this paper, commonly used objective measurement methods for basic sound quality and special features in cars are reviewed and discussed. All objective measurements are proposed to use the 6-unit microphone array specified in the White Paper for In-car Acoustic Measurements released by the AES Technical Committee on Automotive Audio in 2023, and the main parameters to be measured are frequency responses and sound pressure levels in the car when specially designed test signals are played back. General measurement frameworks and procedures for basic sound quality and each feature are presented. The advantages and weaknesses of using these parameters to characterize the basic sound quality and special features of a car audio system are discussed, and challenges and future directions are explored.
Speakers
Xiaojun Qiu

Huawei
Dr. Xiaojun Qiu is currently a Chief Scientist in Audio and Acoustics at Huawei. Before he joined Huawei in late 2020, he had been a professor in several universities for nearly 20 years. He is a Fellow of the Audio Engineering Society and a Fellow of the International Institute of Acoustics...
C2 ATM Studio Warsaw, Poland

10:40am CEST

Acoustic Objects: bridging immersive audio creation and distribution systems
Friday May 23, 2025 10:40am - 11:00am CEST
In recent years, professional and consumer audio and music technology has advanced in several areas, including sensory immersion, electronic transmission, content formats, and creation tools. The production and consumption of immersive media experiences increasingly rely on a global network of interconnected frameworks. These experiences, once confined to separate content markets like music, movies, video games, and virtual reality, are now becoming interoperable, ubiquitous, and adaptable to individual preferences, conditions, and languages. This article explores this evolution, focusing on flexible immersive audio creation and reproduction. We examine the development of object-based immersive audio technology and its role in unifying broadcast content with embodied experiences. We introduce the concept of Acoustic Objects, proposing a universal spatial audio scene representation model for creating and distributing versatile, navigable sound in music, multimedia, and virtual or extended reality applications.
Speakers
Jean-Marc Jot

Founder and Principal, Virtuel Works LLC
Spatial audio and music technology expert and innovator. Virtuel Works provides audio technology strategy, IP creation and licensing services to help accelerate the development of audio and music spatial computing technology and interoperability solutions.
Thibaut Carpentier

STMS Lab - IRCAM, SU, CNRS, Ministère de la Culture
Thibaut Carpentier studied acoustics at the École centrale and signal processing at Télécom ParisTech, before joining the CNRS as a research engineer. Since 2009, he has been a member of the Acoustic and Cognitive Spaces team in the STMS Lab (Sciences and Technologies of Music...
C2 ATM Studio Warsaw, Poland

11:00am CEST

Immersive Music Production Workflows: An Ethnographic Study of Current Practices
Friday May 23, 2025 11:00am - 11:20am CEST
This study presents an ethnographic analysis of current immersive music production workflows, examining industry trends, tools, and methodologies. Through interviews and participant observations with professionals across various sectors, the research identifies common patterns, effective strategies, and persistent obstacles in immersive audio production. Key findings highlight the ongoing struggle for standardized workflows, the financial and technological barriers faced by independent artists, and the critical role of collaboration between engineers and creatives. Despite the growing adoption of immersive formats, workflows still follow stereo conventions, treating spatialization as an afterthought and complicating the translation of mixes across playback systems. Additionally, the study explores the evolving influence of object-based and bed-based mixing techniques, monitoring inconsistencies across playback systems, and the need for improved accessibility to immersive production education. By synthesizing qualitative insights, this paper contributes to the broader discourse on immersive music production, offering recommendations for future research and industry-wide best practices to ensure the sustainable integration of spatial audio technologies.
Speakers
Marcela Rada

Audio Engineer
Marcela is a talented and accomplished audio engineer who has experience both in the studio and in the classroom, teaching university-level students the skills of becoming professional audio engineers and music producers. She has worked across music genres recording, editing, mixing...
Russell Mason

Institute of Sound Recording, University of Surrey
Enzo De Sena

Senior Lecturer, University of Surrey
Enzo De Sena is a Senior Lecturer at the Institute of Sound Recording at the University of Surrey. He received the M.Sc. degree (cum laude) in Telecommunication engineering from the Università degli Studi di Napoli “Federico II,” Italy, in 2009 and the PhD degree in Electronic Engineering from King’s College London, UK, in 2013. Between 2013 and 2016 he was a postdoctoral researcher at KU Leuven...
C2 ATM Studio Warsaw, Poland

11:20am CEST

Spherical harmonic beamforming based Ambisonics encoding and upscaling method for smartphone microphone array
Friday May 23, 2025 11:20am - 11:40am CEST
With the rapid development of virtual reality (VR) and augmented reality (AR), spatial audio recording and reproduction have gained increasing research interest. Higher Order Ambisonics (HOA) stands out for its adaptability to various playback devices and its ability to integrate head orientation. However, current HOA recordings often rely on bulky spherical microphone arrays (SMA), and portable devices like smartphones are limited by array configuration and number of microphones. We propose a method for HOA encoding using a smartphone microphone array (SPMA). By designing beamformers for each order of spherical harmonic functions based on the array manifold, the method enables HOA encoding and up-scaling. Validation on a real SPMA and its simulated free-field counterpart in noisy and reverberant conditions showed that the method successfully encodes and up-scales HOA up to the fourth order with just four irregularly arranged microphones.
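A free-field sketch of the encoding step follows (an assumed simplification: it ignores the scattering of the device body, which the paper's array-manifold-based beamformers account for, and with only four microphones the order-4 system is strongly underdetermined, which is exactly why dedicated per-order beamformer design is needed):

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azi, zen):
    """Complex SH matrix, shape (Q, (order+1)^2), ACN ordering.
    azi: azimuth in [0, 2*pi); zen: zenith (polar) angle in [0, pi]."""
    cols = [sph_harm(m, n, azi, zen)       # scipy: sph_harm(m, n, azimuth, polar)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

def encode_hoa(pressures, azi, zen, order, reg=1e-2):
    """Regularized least-squares HOA encoding from Q irregular mics.
    pressures: (Q, T) mic signals; azi/zen: (Q,) mic directions."""
    Y = sh_matrix(order, azi, zen)          # (Q, K)
    K = Y.shape[1]
    # Tikhonov-regularized pseudo-inverse: (Y^H Y + reg*I)^-1 Y^H
    E = np.linalg.solve(Y.conj().T @ Y + reg * np.eye(K), Y.conj().T)
    return E @ pressures                    # (K, T) Ambisonic signals
```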
C2 ATM Studio Warsaw, Poland

12:15pm CEST

The Future Of Spatial Audio For Consumers
Friday May 23, 2025 12:15pm - 1:15pm CEST
As spatial audio shifts from a premium feature to a mainstream expectation, significant challenges remain in delivering a uniform experience across devices, formats, and playback systems. This panel brings together industry and academic experts to explore the key technologies driving the future of immersive audio for consumers. We’ll discuss the core technological advancements, software, hardware, and ecosystem innovations necessary to enable more seamless and consistent spatial audio experiences. Additionally, we will examine the challenges of delivering perceptually accurate spatial audio across diverse playback environments and identify the most critical areas of focus for industry and academia to accelerate broader consumer adoption of spatial audio.
Speakers
Jacob Hollebon

Principal Research Engineer, Audioscenic
I am a researcher specialising in 3D spatial audio reproduction and beamforming using loudspeaker arrays. In my current role at Audioscenic I am helping commercialize innovative listener-adaptive loudspeaker arrays for 3D audio and multizone reproduction. Previously I developed a new...
Marcos Simón

CTO, Audioscenic
Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...
Hyunkook Lee

Professor, Applied Psychoacoustics Lab, University of Huddersfield
C2 ATM Studio Warsaw, Poland

1:30pm CEST

On the effect of photogrammetric reconstruction and pinna deformation methods on individual head-related transfer functions
Friday May 23, 2025 1:30pm - 1:50pm CEST
Individual head-related transfer functions (HRTFs) are instrumental in rendering plausible spatial audio playback over headphones as well as in understanding auditory perception. Nowadays, the numerical calculation of individual HRTFs is achievable even without high-performance computers. However, the main obstacle is the acquisition of a mesh of the pinnae with a submillimeter accuracy. One approach to this problem is the photogrammetric reconstruction (PR), which estimates a 3D shape from 2D input, e.g., photos. Albeit easy to use, this approach comes with a trade-off in the resulting mesh quality, which subsequently has a substantial impact on the HRTF's quality. In this study, we investigated the effect of PR on HRTF quality as compared to HRTFs calculated from a reference mesh acquired with a high-quality structured-light scanner. Additionally, we applied two pinna deformation methods, which registered a non-individual high-quality pinna to the individual low-quality PR pinna by means of geometric distances. We investigated the potential of these methods to improve the quality of the PR-based pinna meshes. Our evaluation involved the geometrical, acoustical, and psychoacoustical domains including a sound-localization experiment with 9 participants. Our results show that neither PR nor PR-improvement methods were able to provide individual HRTFs of sufficient quality, indicating that without extensive pre- or post-processing, PR provides too little individual detail in the HRTF-relevant pinna regions.
Speakers
Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical and audio engineering in Graz, both at the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research...
Piotr Majdak

Austrian Academy of Sciences
C2 ATM Studio Warsaw, Poland

1:50pm CEST

Mesh2PPM - Automatic Parametrization of the BezierPPM: Entire Pinna
Friday May 23, 2025 1:50pm - 2:10pm CEST
An individual human pinna geometry can be used to achieve plausible personalized audio reproduction. However, an accurate acquisition of the pinna geometry typically requires the use of specialized equipment and often involves time-consuming post-processing to remove potential artifacts. To obtain an artifact-free but individualized mesh, a parametric pinna model based on cubic Bézier curves (BezierPPM) can be used to represent an individual pinna. However, the parameters need to be manually tuned to the acquired listener’s geometry. For increased scalability, we propose Mesh2PPM, a framework for automatic estimation of BezierPPM parameters from an individual pinna. Mesh2PPM relies on a deep neural network (DNN) trained on a dataset of synthetic multi-view images rendered from BezierPPM instances. For the evaluation, unseen BezierPPM instances were presented to Mesh2PPM, which inferred the BezierPPM parameters. We subsequently assessed the geometric errors between the meshes obtained from the BezierPPM parametrized with the inferred parameters and the actual pinna meshes. We investigated the effects of the camera-grid type, jittered camera positions, and additional depth information in the images on the estimation quality. While depth information had no effect, the camera-grid type and the jittered camera positions both had effects. A camera grid of 3×3 provided the best estimation quality, yielding Pompeiu-Hausdorff distances of 2.05 ± 0.4 mm and 1.4 ± 0.3 mm with and without jittered camera positions, respectively, and root-mean-square (RMS) distances of 0.92 ± 0.12 mm and 0.52 ± 0.07 mm. These results motivate further improvements of the proposed framework to be ultimately applicable for automatic estimation of pinna geometries obtained from actual listeners.
C2 ATM Studio Warsaw, Poland

2:10pm CEST

Towards a Headphone Target Curve for Spatial Audio
Friday May 23, 2025 2:10pm - 2:30pm CEST
In order to reproduce audio over headphones as intended, it is essential to have well-defined and consistent references of how headphones should sound. With the aim of stereo reproduction in mind, the field has established a de-facto reference target curve, the Harman Target Curve, to which headphone transfer functions are commonly compared. This contribution questions whether the same target curve is suitable for the reproduction of spatial audio. First, the origins of the Harman Curve are revisited; it is motivated by the frequency response of loudspeaker playback in a specific listening room. The necessary measurement procedures are described in detail. Then, the paper discusses the applicability of existing targets to spatial audio. Therein, it is possible to embed convincing spatial room information directly into the production, thereby calling into question the motivation for incorporating a listening room in the headphone target. The paper concludes with a listening experiment that compares the preference of different target curves for both spatial audio and stereo.
Speakers
Alexander Mülleder

Graz University of Technology
Nils Meyer-Kahlen

Aalto University
C2 ATM Studio Warsaw, Poland

2:30pm CEST

Sound Source Directivity Estimation in Spherical Fourier Domain from Sparse Measurements
Friday May 23, 2025 2:30pm - 2:50pm CEST
In recent years, applications such as virtual reality (VR) systems and room acoustics simulations have brought the modeling of sound source directivity into focus. An accurate simulation of directional responses of sound sources is essential in immersive audio applications.

Real sound sources have directional properties that differ from those of simple sources such as monopoles, which are frequently used for modeling more complex acoustic fields. For instance, the sound level of human speech varies considerably depending on where the sound is recorded with respect to the talker’s head. The same is true for loudspeakers, which are considered linear time-invariant sources. When the sound is recorded behind the loudspeaker, it is normal to observe differences of up to 20 dB at some frequencies. The directional characteristics of sound sources become particularly pronounced at high frequencies. The propagation of real sound sources, such as human voices or musical instruments, differs from simple source models like monopoles, dipoles, and quadrupoles due to their physical structures.

The common approach to measuring directivity patterns of sound sources involves surrounding a sound source in an anechoic chamber with a high number of pressure microphones on a spherical grid and registering the sound power at these positions. Apart from the prohibitive hardware requirements, such measurement setups are mostly impractical and costly. Audio system manufacturers have developed various methods for measuring sound source directionality over the years. These methods are generally of high technical complexity.

This article proposes a new, reduced-complexity directivity measurement approach based on the spherical harmonic decomposition of the sound field. The method estimates the directional characteristics of sound sources using fewer measurement points with spherical microphone arrays. The spherical harmonic transform allows for the calculation of directivity using data collected from spherical microphone arrays instead of pressure sensors. The proposed method uses both the pressure component and spatial derivatives of the sound field and successfully determines directivity with sparse measurements.

An estimation model based on the spherical Fourier transform was developed, measurements were carried out to test this model, and preliminary results obtained from the estimation model are presented. Experiments conducted at the METU Spatial Audio Research Laboratory demonstrated the effectiveness of the proposed method. The directivity characteristics of Genelec 6010A loudspeaker are measured using eight 3rd-order spherical microphone arrays. The directivity functions obtained were highly consistent with the data provided by the loudspeaker manufacturer. The results, especially in low and mid-frequency bands, indicate the utility of the proposed method.
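The underlying representation is standard spherical Fourier analysis (the estimator's details are omitted here): the directivity at each frequency is expanded as

\[
D(\theta, \phi, \omega) = \sum_{n=0}^{N} \sum_{m=-n}^{n} d_{nm}(\omega)\, Y_n^m(\theta, \phi),
\]

so estimating the coefficients \(d_{nm}(\omega)\) from sparse spherical-microphone-array measurements determines the directivity everywhere on the sphere up to order \(N\).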
C2 ATM Studio Warsaw, Poland

2:50pm CEST

Perceptual evaluation of professional point and line sources for immersive audio applications
Friday May 23, 2025 2:50pm - 3:10pm CEST
Immersive sound reinforcement aims to create a balanced perception of sounds arriving from different directions, establishing an impression of envelopment over the audience area. Current perceptual research shows that coverage designs featuring nearly constant decay (0 dB per distance doubling) preserve the level balance among audio objects in the mix. In contrast, a -3 dB decay supports a more uniform sensation of envelopment, especially for off-center listening positions. For practical reasons, point-source loudspeakers remain widely used for immersive audio playback in mid-sized venues. However, point-source loudspeakers inherently decay by -6 dB per distance doubling, and using them can conflict with the design goals outlined above. In this paper, we investigate the perceived differences between point-source and line-source setups using eight surrounding loudspeakers side-by-side covering a 10 m × 7 m audience area. The perceptual qualities of object level balance, spatial definition, and envelopment were compared in a MUSHRA listening experiment, and acoustic measurements were carried out to capture room impulse responses and binaural room impulse responses (BRIRs) of the experimental setup. The BRIRs were used to check whether the results of the listening experiment were reproducible on headphones. Both the loudspeaker and headphone-based experiments delivered highly correlated results. Also, regression models devised based on the acoustic measurements are highly correlated to the perceptual results. The results confirm that elevated line sources, exhibiting a practically realizable decay of -2 dB per distance doubling, help preserve object-level balance, increase spatial definition, and provide a uniform envelopment experience throughout the audience area compared to point-source loudspeakers.
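The quoted figures all follow the usual per-distance-doubling decay law: relative to a reference level \(L_0\) at distance \(r_0\),

\[
L(r) = L_0 - \gamma \log_2\!\left(\frac{r}{r_0}\right),
\]

with \(\gamma = 6\) dB for an ideal point source and about 3 dB for an ideal infinite line source; the constant-decay and -2 dB designs discussed above correspond to \(\gamma = 0\) and \(\gamma = 2\) dB.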
Speakers
Franz Zotter

University of Music and Performing Arts Graz
Franz Zotter received an M.Sc. degree in electrical and audio engineering from the University of Technology (TUG) in 2004, a Ph.D. degree in 2009 and a venia docendi in 2023 from the University of Music and Performing Arts (KUG) in Graz, Austria. He joined the Institute of Electronic...
Philip Coleman

Senior Immersive Audio Research Engineer, L-Acoustics
I'm a research engineer in the L-ISA immersive audio team at L-Acoustics, based in Highgate, London. I'm working on the next generation of active acoustics and object-based spatial audio reproduction, to deliver the best possible shared experiences. Before joining L-Acoustics in September...
C2 ATM Studio Warsaw, Poland

3:45pm CEST

A Curvilinear Transfer Function for Wide Dynamic Range Compression With Expansion
Friday May 23, 2025 3:45pm - 4:05pm CEST
Wide dynamic range compression in hearing aids is becoming increasingly complex as the number of channels and adjustable parameters grows. At the same time, there is growing demand for customization and user self-adjustment of hearing aids, necessitating a balance between complexity and user accessibility. Compression in hearing aids is governed by the input-output transfer function, which relates input magnitude to output magnitude and is typically defined as a combination of piecewise-linear segments resembling logarithmic behavior. This work presents an alternative to the conventional compression transfer function that consolidates multiple compression parameters and revisits expansion in hearing aids. The curvilinear transfer function is a continuous curve with logarithm-like behavior, governed by two parameters: gain and compression ratio. Experimental results show that curvilinear compression reduces the amplification of low-level noise, improves signal-to-noise ratio by up to 1.0 dB, improves sound quality as measured by the Hearing Aid Speech Quality Index by up to 6.7%, and provides comparable intelligibility as measured by the Hearing Aid Speech Perception Index, with simplified parameterization compared to conventional compression. The consolidated curvilinear transfer function is highly applicable to over-the-counter hearing aids and offers more capabilities for customization than current prominent over-the-counter and self-adjusted hearing aids.
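The paper's exact curve is not given in the abstract; the sketch below shows one plausible logarithm-like input-output function governed by a gain and a compression ratio (the threshold and knee constants are hypothetical illustration choices, and a real design would also fold in the expansion region):

```python
import numpy as np

def curvilinear_io(level_in_db, gain_db=20.0, ratio=3.0,
                   thresh_db=45.0, knee_db=20.0):
    """Smooth, logarithm-like input/output curve: slope ~1 well below
    threshold, bending continuously toward slope 1/ratio above it.
    A hypothetical stand-in for the paper's curvilinear transfer function."""
    x = np.asarray(level_in_db, dtype=float)
    # softplus gives an infinitely smooth transition instead of piecewise knees
    excess = knee_db * np.log1p(np.exp((x - thresh_db) / knee_db))
    return x + gain_db - (1.0 - 1.0 / ratio) * excess

# usage: plot-ready curve over a 0-100 dB SPL input range
levels = np.linspace(0, 100, 101)
out = curvilinear_io(levels)
```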
C2 ATM Studio Warsaw, Poland

4:05pm CEST

Tiresias - An Open-Source Hearing Aid Development Board
Friday May 23, 2025 4:05pm - 4:25pm CEST
Hearing loss is a global public health issue due to its high prevalence and negative impact on various aspects of one’s life, including well-being and cognition. Despite their crucial role in auditory rehabilitation, hearing aids remain inaccessible to many due to their high costs, particularly in low- and middle-income countries. Existing open-source solutions often rely on high-power, bulky platforms rather than compact, low-power wearables suited for real-world applications. This work introduces Tiresias, an open-source hearing aid development board designed for real-time audio processing using low-cost electronics. Integrating key hearing aid functionalities into a compact six-layer printed circuit board (PCB), Tiresias features multichannel compression, digital filtering, beamforming, Bluetooth connectivity, and physiological data monitoring, fostering modularity and accessibility through publicly available hardware and firmware resources based on the Nordic nRF Connect and Zephyr real-time operating system (RTOS). By addressing technological and accessibility challenges, this work advances open-source hearing aid development, enabling research in hearing technologies, while also supporting future refinements and real-world validation.
C2 ATM Studio Warsaw, Poland
 
Saturday, May 24
 

9:00am CEST

A new one-third-octave-band noise criteria
Saturday May 24, 2025 9:00am - 9:20am CEST
A new one-third-octave-band noise criteria (NC) rating method is presented. One-third-octave-band NC curves from NC 70 to NC 0 are derived from the existing octave-band curves, adjusted for bandwidth, fit to continuous functions, and redistributed progressively over this space. This synthesis is described in detail. The diffuse-field hearing threshold at low frequencies is also derived. Several NC curves at high frequencies are shown to be below threshold (inaudible). NC ratings are calculated using both the new one-third-octave-band and the legacy octave-band methods for a number of different room noise spectra. The resulting values were found to be similar for both methods. NC ratings using the new method are particularly applicable to very low noise level critical listening environments such as recording studios, scoring stages, and cinema screening rooms, but are shown to also be applicable to higher noise level environments. The proposed method better tracks the audibility of noise at low levels as well as the audibility of tonal noise components, while the legacy method as originally conceived generally emphasizes speech interference.
C2 ATM Studio Warsaw, Poland

9:20am CEST

Mixed-Phase Equalization of Slot-loaded Impulse Responses
Saturday May 24, 2025 9:20am - 9:40am CEST
This paper introduces a new algorithm for multiposition mixed-phase equalization of slot-loaded loudspeaker responses obtained in the horizontal and vertical planes, using finite impulse response (FIR) filters. The algorithm selects a “prototype response” that yields the filter that best optimizes a time-domain objective metric for equalization in a given direction. The objective metric includes a weighted linear combination of pre-ring energy, early and late reflection energy, and decay rate (characterizing impulse response shortening) during filter synthesis. The results show that the presented mixed-phase multiposition filtering algorithm performs good equalization along all horizontal directions and for most positions in the vertical direction. Beyond the multiposition filtering capabilities, the algorithm and the metric are suitable for designing mixed-phase filters with low delays, an essential constraint for real-time processing.
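The objective metric described above can be summarized (symbols and weights hypothetical) as

\[
J = w_1 E_{\mathrm{pre}} + w_2 E_{\mathrm{early}} + w_3 E_{\mathrm{late}} + w_4 \tau_{\mathrm{decay}},
\]

where \(E_{\mathrm{pre}}\) is the pre-ring energy ahead of the main arrival, \(E_{\mathrm{early}}\) and \(E_{\mathrm{late}}\) are residual reflection energies, and \(\tau_{\mathrm{decay}}\) captures impulse-response shortening; the prototype response is the one whose filter minimizes \(J\) for the direction of interest.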
Speakers
Sunil Bharitkar

Samsung Research America
C2 ATM Studio Warsaw, Poland

9:40am CEST

Analog Pseudo Leslie Effect with High Grade of Repeatability
Saturday May 24, 2025 9:40am - 10:00am CEST
This paper describes the design of an analog stomp box capable of reproducing the effect observed when a loudspeaker is rotated during operation, the so-called Leslie effect. When the loudspeaker is rotating, two physical effects can be observed. The first is a variation in amplitude, because the loudspeaker is sometimes aimed at the observer and, after 180 degrees of rotation, aimed away from the observer; a Tremolo circuit was designed to recreate this amplitude variation. The second is the Doppler effect, which was obtained with a circuit designed to vary the phase of the signal (Vibrato); the phase variation is perceived as a frequency variation. Cascading these two circuits yields the Pseudo Leslie effect. The Vibrato and Tremolo circuits receive their control signal from a Low Frequency Oscillator (LFO), which sets the effect rate. To achieve a high degree of repeatability, which is not simple in analog circuits employing photocouplers, the photocoupler devices were replaced with VCAs. Photocouplers show large variations in their optical characteristics, so it is hard to obtain the same result in large-scale production; with VCAs this becomes easily achievable. The THAT2180 IC is a voltage-controlled current source (VCCS) with exponential gain control and low signal distortion. The term Pseudo is used because, in the true Leslie effect, the rotation of the loudspeaker produces a 90° lag between the frequency and amplitude variations. This lag has not been implemented, but the sonic result leaves nothing to be desired.
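In DSP terms the cascade is amplitude modulation (tremolo) followed by phase/pitch modulation (vibrato), both driven by the same LFO; a minimal digital sketch (the analog pedal uses VCAs and all-pass phase-shift stages rather than the delay line used here) is:

```python
import numpy as np

def pseudo_leslie(x, fs, rate_hz=6.0, trem_depth=0.5, vib_ms=1.0):
    """Tremolo (LFO amplitude modulation) cascaded with vibrato
    (LFO-modulated fractional delay), both driven by one LFO."""
    n = np.arange(len(x))
    lfo = np.sin(2 * np.pi * rate_hz * n / fs)
    trem = x * (1.0 - trem_depth * 0.5 * (1.0 + lfo))   # amplitude modulation
    base = vib_ms * 1e-3 * fs                           # max delay in samples
    delay = base * 0.5 * (1.0 + lfo)                    # LFO-modulated delay
    idx = n - delay
    i0 = np.clip(np.floor(idx).astype(int), 0, len(x) - 1)
    i1 = np.clip(i0 + 1, 0, len(x) - 1)
    frac = idx - np.floor(idx)
    return (1 - frac) * trem[i0] + frac * trem[i1]      # linear interpolation
```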
C2 ATM Studio Warsaw, Poland

10:00am CEST

Computational Complexity Analysis of the K-Method for Nonlinear Circuit Modeling
Saturday May 24, 2025 10:00am - 10:20am CEST
In today's music industry and among musicians, analog hardware effects are increasingly being replaced by digital counterparts, often in the form of software plugins. The circuits of musical devices often contain nonlinear components (diodes, vacuum tubes, etc.), which complicates their digital modeling. One approach is the use of state-space methods, such as the Euler or Runge-Kutta methods. To guarantee stability, implicit state-space methods should be used; however, they require the numerical solution of an implicit equation, leading to a large computational load. Alternatively, the K-method can be used, which avoids the need for iterative numerical solution if the system meets certain conditions, thus significantly decreasing the computational complexity. Although the K-method was invented almost three decades ago, the authors are not aware of a thorough computational complexity analysis of the method in comparison to the more common implicit state-space approaches, such as the backward Euler method. This paper introduces these two methods, explores their advantages, and compares their computational load as a function of model size by using a scalable circuit example.
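For context, a standard formulation of the comparison (not the paper's full derivation): discretizing a nonlinear state-space system \(\dot{x} = f(x, u)\) with backward Euler at sample period \(T\) gives

\[
x[n] = x[n-1] + T\, f(x[n], u[n]),
\]

which is implicit in \(x[n]\) and must be solved iteratively (e.g., by Newton's method) at every sample. The K-method instead rearranges the discretized equations so that the nonlinearity enters through a mapping that can be precomputed, removing the per-sample iteration when the system satisfies the method's conditions.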
C2 ATM Studio Warsaw, Poland

10:40am CEST

A simplified RLS algorithm for adaptive Kautz filters
Saturday May 24, 2025 10:40am - 11:00am CEST
Modeling or compensating a given transfer function is a common task in the field of audio. To comply with the characteristics of hearing, logarithmic frequency resolution filters have been developed, including the Kautz filter, which has orthogonal tap outputs. When the system to be modeled is time-varying, the modeling filter should be tuned to follow the changes in the transfer function. The Least Mean Squares (LMS) and Recursive Least Squares (RLS) algorithms are well-known methods for adaptive filtering, where the latter has faster convergence rate with lower remaining error, at the expense of high computational demand. In this paper we propose a simplification to the RLS algorithm, which builds on the orthogonality of the tap outputs of Kautz filters, resulting in a significant reduction in computational complexity.
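For reference, the standard RLS recursion for tap-output vector \(\mathbf{x}(n)\), a-priori error \(e(n)\), and forgetting factor \(\lambda\) is

\[
\mathbf{k}(n) = \frac{\mathbf{P}(n-1)\,\mathbf{x}(n)}{\lambda + \mathbf{x}^{T}(n)\,\mathbf{P}(n-1)\,\mathbf{x}(n)}, \quad
\mathbf{P}(n) = \lambda^{-1}\!\left[\mathbf{P}(n-1) - \mathbf{k}(n)\,\mathbf{x}^{T}(n)\,\mathbf{P}(n-1)\right], \quad
\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n)\,e(n).
\]

The simplification idea (sketched here under the orthogonality assumption stated in the abstract, not the paper's exact derivation) is that orthogonal Kautz tap outputs make the input correlation matrix, and hence \(\mathbf{P}(n)\), approximately diagonal, so only its diagonal needs to be propagated, reducing the per-sample cost from \(O(N^2)\) toward \(O(N)\).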
C2 ATM Studio Warsaw, Poland

11:00am CEST

An Artificial Reverberator Informed by Room Geometry and Visual Appearance
Saturday May 24, 2025 11:00am - 11:20am CEST
Without relying on audio data as a reference, artificial reverberation models often struggle to accurately simulate the acoustics of real rooms. To address this, we propose a hybrid reverberator derived from a room’s physical properties. Room geometry is extracted via Light Detection and Ranging mapping, enabling the calculation of acoustic reflection paths via the Image Source Method. Frequency-dependent absorption is found by classifying room surface materials with a multi-modal Large Language Model and referencing a database of absorption coefficients. The extracted information is used to parametrise a hybrid reverberator, divided into two components: early reflections, using a tapped delay line, and late reverberation, using a Scattering Feedback Delay Network. Our listening test results show that participants often rate the proposed system as the most natural simulation of a small hallway room. Additionally, we compare the reverberation metrics of the hybrid reverberator and similar state-of-the-art models to those of the small hallway.
Speakers
Joshua Reiss

Professor, Queen Mary University of London
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...
C2 ATM Studio Warsaw, Poland

11:20am CEST

Direct convolution of high-speed 1 bit signal and finite impulse response
Saturday May 24, 2025 11:20am - 11:40am CEST
Various AD conversion methods exist, and high-speed 1-bit methods using a high sampling frequency and 1-bit quantization have been proposed. ΔΣ modulation is mainly used; owing to its characteristics, these signals accurately preserve the spectrum of the analog signal and move quantization noise into higher frequency bands, which allows a high signal-to-noise ratio in the audible range. However, when performing signal processing tasks such as addition and multiplication on high-speed 1-bit signals, it is generally necessary to convert them into multi-bit signals for arithmetic operations. In this paper, we propose a direct processing method for high-speed 1-bit signals that realizes convolution without converting them into multi-bit signals. In this method, 1-bit data are reordered so that the operations are achieved without arithmetic ones. The proposed method was verified through simulations using low-pass FIR filters. Frequency-domain analysis showed that the proposed method achieved performance equivalent to conventional multi-bit convolution while successfully performing the desired filtering. We thus present a novel approach to directly processing high-speed 1-bit signals and suggest potential applications in audio and signal processing.
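A generic illustration of why the multiplier disappears for bipolar 1-bit data (this shows the arithmetic identity only, not the authors' reordering scheme): with samples in {-1, +1}, every tap product is just +h[k] or -h[k], so the FIR convolution reduces to sign-selected additions:

```python
import numpy as np

def fir_on_bitstream(bits, h):
    """FIR filtering of a bipolar 1-bit signal without multiplications:
    each tap contributes +h[k] or -h[k] depending on the bit.
    bits: array of {-1, +1}; h: FIR coefficients."""
    n, k = len(bits), len(h)
    y = np.zeros(n)
    for i in range(n):
        acc = 0.0
        for j in range(min(k, i + 1)):
            acc += h[j] if bits[i - j] > 0 else -h[j]   # add or subtract only
        y[i] = acc
    return y

# toy check against numpy's multiply-based convolution
rng = np.random.default_rng(2)
bits = rng.choice([-1.0, 1.0], size=256)
h = np.hanning(16); h /= h.sum()                        # simple low-pass FIR
assert np.allclose(fir_on_bitstream(bits, h), np.convolve(bits, h)[:256])
```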
C2 ATM Studio Warsaw, Poland

11:40am CEST

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
Saturday May 24, 2025 11:40am - 12:00pm CEST
Speech denoising is a prominent and widely utilized task, appearing in many common use cases. Although very powerful machine learning methods have been published, most are too complex for deployment in everyday and/or low-resource computational environments, like hand-held devices, smart glasses, hearing aids, and automotive platforms. Knowledge distillation (KD) is a prominent way of alleviating this complexity mismatch, by transferring the learned knowledge from a pre-trained complex model, the teacher, to a less complex one, the student. KD is implemented using minimization criteria (e.g. loss functions) between the learned information of the teacher and the corresponding information of the student. Existing KD methods for speech denoising hamper the KD by bounding the learning of the student to the distribution learned by the teacher. Our work focuses on a method that tries to alleviate this issue by exploiting properties of the cosine similarity used as the KD loss function. We use a publicly available dataset and a typical architecture for speech denoising (e.g. a UNet) tuned for low-resource environments, and conduct repeated experiments with different architectural variations between the teacher and the student, reporting the mean and standard deviation of metrics for our method and for a state-of-the-art method used as a baseline. Our results show that with our method we can make smaller speech denoising models, deployable on small devices and embedded systems, perform better than when trained conventionally or with other KD methods.
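A minimal sketch of the loss described (generic PyTorch, with a hypothetical projection layer for mismatched widths; not the paper's full training recipe): the student's latent representation is aligned to the teacher's by minimizing the cosine distance, which constrains direction rather than the full distribution:

```python
import torch
import torch.nn.functional as F

def kd_cosine_loss(student_latent, teacher_latent, proj=None):
    """Cosine-distance KD term: 1 - cos_sim, averaged over the batch.
    Aligns representation *direction*, leaving magnitude free.
    proj: optional linear layer mapping student dims to teacher dims."""
    s = proj(student_latent) if proj is not None else student_latent
    cos = F.cosine_similarity(s.flatten(1), teacher_latent.flatten(1), dim=1)
    return (1.0 - cos).mean()

# usage inside a training step (denoise_loss is the main task objective,
# beta a weighting hyperparameter; teacher latents are detached):
# total = denoise_loss + beta * kd_cosine_loss(s_latent, t_latent.detach(), proj)
```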
C2 ATM Studio Warsaw, Poland

12:15pm CEST

Simulated Free-field Measurements
Saturday May 24, 2025 12:15pm - 1:45pm CEST
Time-selective techniques that enable measurements of the free-field response of a loudspeaker to be performed without an anechoic chamber are presented. The room-size limitations on low-frequency resolution of both time-selective measurements and anechoic chambers are discussed. Techniques combining signal processing and appropriate test methods are presented, enabling measurement of the complex free-field response of a loudspeaker throughout the entire audio frequency range without an anechoic chamber. Measurement techniques for both near-field and time-selective far-field measurements are detailed. Results in both the time and frequency domains are available, and ancillary functions derived from these results are easily calculated automatically. A review of the current state of the art is also presented.
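The room-size limitation follows directly from the reflection-free window: truncating the response \(T\) seconds after the direct sound (before the first reflection arrives) limits the frequency resolution to roughly

\[
\Delta f \approx \frac{1}{T},
\]

so, for example, a 5 ms reflection-free window resolves nothing below about 200 Hz; this is why near-field measurements are combined with time-selective far-field ones to cover the full audio band.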
C2 ATM Studio Warsaw, Poland
 

