Type: Spatial Audio
Thursday, May 22
 

9:30am CEST

Correlation between middle and top layer loudspeaker signals and the listening range in 3D audio reproduction
Thursday May 22, 2025 9:30am - 9:50am CEST
In auditory spatial perception, horizontal sound image localization and the sense of spaciousness are based on interaural level and time differences as cues, and the degree of correlation between the left and right signals is thought to contribute in particular to the sense of horizontal spaciousness [Hidaka1995, Zotter2013]. For vertical image spread (VIS), spectral cues are necessary, and the change in VIS due to the degree of correlation between the vertical and horizontal signals depends on the frequency response [Gribben2018]. This paper investigates, through two experiments, the influence of different correlation values between the top- and middle-layer loudspeaker signals of a 3D audio reproduction system on listening impressions. The results of experiments using pink noise with different correlation values for the top and middle layers show that the lower the vertical correlation values, the wider the listening range within which the impression does not change relative to the central listening position. The results of experiments using impulse responses obtained with microphones set up in an actual concert hall reveal a tendency to perceive a sense of spaciousness at off-center listening positions when cardioid microphones, spaced apart from the middle layer, were used for the top layer. The polar pattern and height of the microphones may have lowered the correlation values in the vertical direction, thus widening the listening range of consistent spatial impression beyond the central listening position (i.e., the “sweet spot”).
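For context, the degree of correlation referred to here is commonly quantified by a normalized cross-correlation coefficient between two loudspeaker signals; a standard definition (the symbols $x_m$, $x_t$ for the middle- and top-layer signals and the window $T$ are illustrative assumptions, not notation from the paper) is

$$\Phi_{mt} = \frac{\int_0^T x_m(t)\, x_t(t)\, \mathrm{d}t}{\sqrt{\int_0^T x_m^2(t)\, \mathrm{d}t \;\int_0^T x_t^2(t)\, \mathrm{d}t}}, \qquad -1 \le \Phi_{mt} \le 1 .$$

Values near 1 indicate identical (fully correlated) layer signals, while values near 0 correspond to the decorrelated signals that the experiments associate with a wider listening range.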
Speakers

Toru Kamekawa

Professor, Tokyo University of the Arts
Toru Kamekawa: After graduating from the Kyushu Institute of Design in 1983, he joined the Japan Broadcasting Corporation (NHK) as a sound engineer. During that period, he gained experience as a recording engineer, mostly in surround sound programs for HDTV. In 2002, he joined...
C2 ATM Studio Warsaw, Poland

9:50am CEST

Plane wave creation in non-spherical loudspeaker arrays using radius formulation by the Lamé function
Thursday May 22, 2025 9:50am - 10:10am CEST
This paper proposes a method for creating a plane-wave field with spherical harmonics using a non-spherical loudspeaker array. In sound field control, there are physical-acoustic models and psycho-acoustic models. Some of these allow freedom in the location of each loudspeaker, but the reproduced sound differs from the intended auditory impression because phantom sources are constructed. Others were developed from the wave equation under strictly positioned circular or spherical array conditions, or with higher-order Ambisonics (HOA) based on spherical harmonics, which expresses the field only at a single point. We therefore seek a method that physically creates actual waveforms while providing flexibility in the shape of the loudspeaker array. In this paper, we focus on the Lamé function, whose order changes the shape of the spatial figure it describes, and propose formulating the distance between the center and each loudspeaker with this function in a polar expression. In a simulation experiment, within the inscribed region the proposed method creates the same plane-wave waveform as a spherical array when a high-order Lamé function, whose shape is close to rectangular, is used.
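As background, the Lamé curve (superellipse) has a closed-form polar radius, which illustrates the kind of center-to-loudspeaker distance formulation the abstract refers to (the semi-axes $a$, $b$ and order $n$ are generic symbols, not taken from the paper):

$$\left|\frac{x}{a}\right|^{n} + \left|\frac{y}{b}\right|^{n} = 1
\quad\Longrightarrow\quad
r(\theta) = \left(\left|\frac{\cos\theta}{a}\right|^{n} + \left|\frac{\sin\theta}{b}\right|^{n}\right)^{-1/n}.$$

For $n = 2$ this reduces to an ellipse (a circle when $a = b$), and as $n$ grows the curve approaches a rectangle, consistent with the high-order, near-rectangular case described in the abstract.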
Speakers

Tomohiro Sakaguchi

Doctoral student, Waseda University
C2 ATM Studio Warsaw, Poland

10:10am CEST

Recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching algorithm
Thursday May 22, 2025 10:10am - 10:30am CEST
This paper presents a recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching (BACC-PM) algorithm, designed to optimize sound zone systems efficiently in the time domain. Traditional frequency-domain algorithms, while computationally less demanding, often result in non-causal filters with increased pre-ringing, making time-domain approaches preferable for certain applications. However, time-domain solutions typically suffer from high computational costs as a result of the inversion of large convolution matrices.
To address these challenges, this study introduces a method based on gradient descent and conjugate gradient descent techniques. By exploiting recursive calculations, the proposed approach significantly reduces computational time compared to direct inversion.
Theoretical foundations, simulation setups, and performance metrics are detailed, showcasing the efficiency of the algorithm in achieving high acoustic contrast and low reproduction errors with reduced computational effort. Simulations in a controlled environment demonstrate the advantages of the method.
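The core idea, replacing a direct inversion of the convolution-matrix normal equations with an iterative, recursively updated solver, can be illustrated with a minimal conjugate-gradient sketch (generic NumPy; the variable names G, d, and the Tikhonov weight lam are assumptions for illustration, not the authors' BACC-PM implementation):

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-8, max_iter=500):
    """Solve A x = b for symmetric positive-definite A without inverting A."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x              # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Hypothetical filter design: G stacks plant (convolution) responses,
# d is the target pressure, lam a Tikhonov regularization weight.
# The normal equations are solved iteratively instead of forming
# (G^T G + lam I)^-1 explicitly.
rng = np.random.default_rng(0)
G = rng.standard_normal((400, 128))
d = rng.standard_normal(400)
lam = 1e-3
A = G.T @ G + lam * np.eye(G.shape[1])
b = G.T @ d
h = conjugate_gradient(A, b)

Each iteration needs only matrix-vector products, which is what keeps the cost low compared with explicitly inverting the large convolution matrix.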
Speakers

Manuel Melon

Professor, LAUM / LE MANS Université
C2 ATM Studio Warsaw, Poland

10:30am CEST

GSound-SIR: A Spatial Impulse Response Ray-Tracing and High-order Ambisonic Auralization Python Toolkit
Thursday May 22, 2025 10:30am - 10:50am CEST
Accurate and efficient simulation of room impulse responses is crucial for spatial audio applications. However, existing acoustic ray-tracing tools often operate as black boxes and only output impulse responses (IRs), providing limited access to intermediate data or spatial fidelity. To address these limitations, this paper presents GSound-SIR, a novel Python-based toolkit for room acoustics simulation. The contributions of this paper are as follows. First, GSound-SIR provides direct access to up to millions of raw ray data points from simulations, enabling in-depth analysis of sound propagation paths that was not possible with previous solutions. Second, we introduce a tool that converts acoustic rays into high-order Ambisonic impulse responses, capturing spatial audio cues with greater fidelity than standard techniques. Third, to enhance efficiency, the toolkit implements an energy-based filtering algorithm and can export only the top-X or top-X-% of rays. Fourth, we propose storing the simulation results in the Parquet format, facilitating fast data I/O and seamless integration with data analysis workflows. Together, these features make GSound-SIR an advanced, efficient, and modern foundation for room acoustics research, providing researchers and developers with a powerful new tool for spatial audio exploration.
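The energy-based top-X / top-X-% filtering idea can be sketched generically as follows (plain NumPy with assumed names; this is an illustration of the concept, not the GSound-SIR API):

import numpy as np

def select_top_rays(energies, top_x=None, top_percent=None):
    """Keep the most energetic rays: either a fixed count (top_x) or a
    percentage of all rays (top_percent). Returns indices into `energies`."""
    order = np.argsort(energies)[::-1]          # most energetic first
    if top_x is not None:
        return order[:top_x]
    if top_percent is not None:
        k = max(1, int(np.ceil(len(energies) * top_percent / 100.0)))
        return order[:k]
    return order                                 # no filtering

# Example: keep the 5% most energetic of 1,000,000 simulated rays
energies = np.random.default_rng(0).exponential(size=1_000_000)
kept = select_top_rays(energies, top_percent=5.0)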
C2 ATM Studio Warsaw, Poland

11:00am CEST

Ambisonic Spatial Decomposition Method with salient / diffuse separation
Thursday May 22, 2025 11:00am - 11:20am CEST
This paper proposes a new algorithm for enhancing the spatial resolution of measured first-order Ambisonics room impulse responses (FOA RIRs). It applies a separation of the RIR into a salient stream (direct sound and reflections) and a diffuse stream to treat them differently: The salient stream is enhanced using the Ambisonic Spatial Decomposition Method (ASDM) with a single direction of arrival (DOA) per sample of the RIR, while the diffuse stream is enhanced by 4-directional (4D-)ASDM with 4 DOAs at the same time. Listening experiments comparing the new Salient/Diffuse S/D-ASDM to ASDM, 4D-ASDM, and the original FOA RIR reveal the best results for the new algorithm in both spatial clarity and absence of artifacts, especially for its variant, which keeps the DOA constant within each salient event block.
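The central ASDM step, assigning one direction of arrival per RIR sample so the omnidirectional channel can be re-encoded at higher order, can be sketched roughly as follows (a generic pseudo-intensity-vector estimate in NumPy/SciPy; channel ordering, smoothing length, and helper names are assumptions, not taken from the paper):

import numpy as np
from scipy.ndimage import uniform_filter1d

def per_sample_doa(foa_rir, smooth_len=16):
    """Estimate one DOA unit vector per sample of a first-order Ambisonics RIR
    (B-format channels ordered W, X, Y, Z here) from the pseudo-intensity vector."""
    w, x, y, z = foa_rir                       # each of shape (num_samples,)
    # Instantaneous pseudo-intensity components, temporally smoothed for stability
    ix = uniform_filter1d(w * x, smooth_len)
    iy = uniform_filter1d(w * y, smooth_len)
    iz = uniform_filter1d(w * z, smooth_len)
    v = np.stack([ix, iy, iz], axis=0)
    norm = np.linalg.norm(v, axis=0) + 1e-12
    return v / norm                            # shape (3, num_samples)

# Toy example: each sample's omni signal w[n] would then be re-encoded as a
# plane wave from doa[:, n] using higher-order spherical harmonics.
rir = np.random.default_rng(0).standard_normal((4, 4800)) * np.exp(-np.linspace(0, 6, 4800))
doa = per_sample_doa(rir)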
Speakers

Lukas Gölles

University of Music and Performing Arts Graz - Institute of Electronic Music and Acoustics
C2 ATM Studio Warsaw, Poland

11:20am CEST

Towards a standard listener-independent HRTF to facilitate long-term adaptation
Thursday May 22, 2025 11:20am - 11:40am CEST
Head-related transfer functions (HRTFs) are used in auditory applications for spatializing virtual sound sources. Listener-specific HRTFs, which aim at mimicking the filtering of the head, torso and pinnae of a specific listener, improve the perceived quality of virtual sound compared to non-individualized HRTFs. However, listener-specific HRTFs may not be accessible to everyone. Here, we propose, as an alternative, to take advantage of human listeners' ability to adapt to a new set of HRTFs. We claim that agreeing upon a single listener-independent set of HRTFs has beneficial effects for long-term adaptation compared to using several, potentially severely different HRTFs. Thus, the Non-individual Ear MOdel (NEMO) initiative is a first step towards a standardized listener-independent set of HRTFs to be used across applications as an alternative to individualization. A prototype, NEMObeta, is presented to explicitly encourage external feedback from the spatial audio community, and to agree on a complete list of requirements for the future HRTF selection.
Speakers

Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical engineering and audio engineering in Graz, at both the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research...

Nils Meyer-Kahlen

Aalto University
C2 ATM Studio Warsaw, Poland

11:40am CEST

Real-Time Auralization Pipeline for First-Person Vocal Interaction in Audio-Visual Virtual Environments
Thursday May 22, 2025 11:40am - 12:00pm CEST
Multimodal research and applications are becoming more commonplace as Virtual Reality (VR) technology integrates different sensory feedback, enabling the recreation of real spaces in an audio-visual context. Within VR experiences, numerous applications rely on the user’s voice as a key element of interaction, including music performances and public speaking applications. Self-perception of our voice plays a crucial role in vocal production. When singing or speaking, our voice interacts with the acoustic properties of the environment, shaping the adjustment of vocal parameters in response to the perceived characteristics of the space.

This technical report presents a real-time auralization pipeline that leverages three-dimensional Spatial Impulse Responses (SIRs) for multimodal research applications in VR requiring first-person vocal interaction. It describes the impulse response creation and rendering workflow, the audio-visual integration, and addresses latency and computational considerations. The system enables users to explore acoustic spaces from various positions and orientations within a predefined area, supporting three and five Degrees of Freedom (3DoF and 5DoF) in audio-visual multimodal perception for both research and creative applications in VR.

The design of this pipeline arises from the limitations of existing audio tools and spatializers, particularly regarding signal latency, and the lack of SIRs captured from a first-person perspective and in multiple adjacent distributions to enable translational rendering. By addressing these gaps, the system enables real-time auralization of self-generated vocal feedback.
Speakers

Enda Bates

Assistant Prof., Trinity College Dublin
I'm interested in spatial audio, spatial music, and psychoacoustics. I'm the deputy director of the Music & Media Technologies M.Phil. programme in Trinity College Dublin, and a researcher with the ADAPT centre. At this convention I'm presenting a paper on an Ambisonic Decoder Test...
C2 ATM Studio Warsaw, Poland

12:00pm CEST

On the Design of Binaural Rendering Library for IAMF Immersive Audio Container
Thursday May 22, 2025 12:00pm - 12:20pm CEST
Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm and the binaural filter design process, as well as the real-time implementation of the renderer in the form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of a spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications.
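For orientation, a common way to binauralize scene-based audio is to decode it to a virtual loudspeaker layout and convolve each feed with an HRIR pair; the sketch below shows only that generic approach (it is not the OBR code, and the function and parameter names are assumptions):

import numpy as np
from scipy.signal import fftconvolve

def binauralize_ambisonics(ambi, decoder, hrirs):
    """Generic virtual-loudspeaker binauralization.

    ambi    : (n_sh, n_samples)       Ambisonic signal (e.g. ACN/SN3D channels)
    decoder : (n_speakers, n_sh)      decoder matrix to a virtual loudspeaker layout
    hrirs   : (n_speakers, 2, n_hrir) HRIR pair per virtual loudspeaker
    returns : (2, n_samples + n_hrir - 1) binaural signal
    """
    speaker_feeds = decoder @ ambi                      # decode to virtual speakers
    n_out = ambi.shape[1] + hrirs.shape[2] - 1
    out = np.zeros((2, n_out))
    for feed, hrir in zip(speaker_feeds, hrirs):
        out[0] += fftconvolve(feed, hrir[0])            # left ear
        out[1] += fftconvolve(feed, hrir[1])            # right ear
    return out

# Toy example: first-order (4-channel) signal, 4 virtual speakers, 128-tap HRIRs
rng = np.random.default_rng(0)
binaural = binauralize_ambisonics(rng.standard_normal((4, 48000)),
                                  rng.standard_normal((4, 4)),
                                  rng.standard_normal((4, 2, 128)))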
Speakers

Gavin Kearney

Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January...

Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...
C2 ATM Studio Warsaw, Poland
 

