Paper Presentations
Friday, May 23
 

9:15am CEST

Generative AI in Audio Education: Process-Centred Teaching for a Product-Centred World
Friday May 23, 2025 9:15am - 9:35am CEST
Artificial intelligence (AI) tools are transforming the way music is produced. The rate of development is rapid, and the associated transformation of audio education is abrupt. Higher education is largely built around the objectives of knowledge transmission and skills development, evidenced by the emphasis on learning in the cognitive domain in university programmes. But the set of skills that music producers will require in five years' time is unclear, making skills-based curriculum planning challenging. Audio educators require a systematic approach to integrating AI tools in ways that enhance teaching and learning.

This study uses speculative design as the underpinning research methodology. Speculative design employs design practice to explore and evaluate possible futures, alternative realities, and sociotechnical trends. In this study, the practical tasks in an existing university module are modified by integrating available generative AI (GAI) tools to replace or augment the task design. This tangible artefact is used to critique prevailing assumptions concerning the use of GAI in music production and audio education. The findings suggest that GAI tools will disrupt the existing audio education paradigm. Employing a process-centred approach to teaching and learning may represent a key progression for educators to help navigate these changes.
C1 ATM Studio Warsaw, Poland

9:15am CEST

Investigating Individual, Loudness-Dependent Equalization Preferences in Different Driving Sound Conditions
Friday May 23, 2025 9:15am - 9:35am CEST
In automotive audio playback systems, dynamically increasing driving sounds are typically taken into account by applying a generic, i.e., non-individualized, increase in overall level and low-frequency amplification to compensate for the increased masking. This study investigated the degree of individuality in the preferences for noise-dependent level and equalizer settings. A user study with 18 normal-hearing participants was conducted in which individually preferred level-dependent and frequency-dependent amplification parameters were determined using a music-based procedure in quiet and in nine different driving noise conditions. The comparison of self-adjusted parameters suggested that, on average, participants adjusted higher overall levels and more low-frequency amplification in noise than in quiet. However, preferred self-adjusted levels differed markedly between participants for the same listening conditions but were quite similar in a re-test session for each participant, indicating that individual preferences were stable and could be reproducibly measured with the employed personalization scheme. Furthermore, the impact of driving noise on individually preferred settings revealed strong interindividual differences, indicating that listeners can differ widely with respect to their individual optimum of how equalizer and level settings should be dynamically adapted to changes in driving conditions.
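A minimal sketch of what such a noise-dependent setting could look like in code, assuming a simple overall gain plus a first-order low-shelf bass boost (the filter design and all parameter values are illustrative assumptions, not the study's implementation):

```python
import numpy as np
from scipy.signal import lfilter

def low_shelf_coeffs(fs, fc=100.0, boost_db=6.0):
    # First-order low shelf via the bilinear transform:
    # gain G below fc, unity gain well above fc.
    G = 10.0 ** (boost_db / 20.0)
    K = np.tan(np.pi * fc / fs)
    b = np.array([1.0 / K + G, G - 1.0 / K])
    a = np.array([1.0 / K + 1.0, 1.0 - 1.0 / K])
    return b / a[0], a / a[0]

def apply_preset(x, fs, level_db, bass_db):
    # Noise-dependent preset: overall level plus low-frequency boost.
    b, a = low_shelf_coeffs(fs, boost_db=bass_db)
    return 10.0 ** (level_db / 20.0) * lfilter(b, a, x)
```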
Speakers

Jan Rennies

Head of Group Personalized Hearing Systems, Fraunhofer Institute for Digital Media Technology IDMT
I am heading a group at Fraunhofer IDMT dedicated to developing new solutions for better communication, hearing, and hearing health in various applications together with partners from industry and academia. I am particularly interested in networking and exploring opportunities for…
C2 ATM Studio Warsaw, Poland

9:35am CEST

A Collaborative and Reflective Framework for Redesigning Music Technology Degree Programmes
Friday May 23, 2025 9:35am - 9:55am CEST
Cyclical formal reviews are essential to keep Music and Audio Technology degree programmes current. Whilst clear institutional guidance exists on the requisite documentation to be submitted, there is little guidance concerning the process used to gather the information. To address this issue, a 12-step collaborative and reflective framework was developed to review a degree programme in Music Technology.

This framework employs Walker's 'Naturalistic' process model and design thinking principles to create a dynamic, stakeholder-driven review process. The framework begins with reflective analysis by faculty, helping to define programme identity, teaching philosophy, and graduate attributes. Existing curricula are evaluated using Boehm et al.'s (2018) tetrad framework of Music Technology, encompassing the sub-disciplines of production, technology, art, and science. Insights from industry professionals, learners, and graduates are gathered through semi-structured interviews, surveys, and focus groups to address skill gaps, learner preferences, and emerging trends. A SWOT analysis further refines the scope and limitations of the redesign process, which culminates in iterative stakeholder consultations to finalise the programme's structure, content, and delivery.

This process-centred approach emphasises adaptability, inclusivity, and relevance, thus ensuring the redesigned programme is learner-centred and aligned with future professional and educational demands. By combining reflective practice and collaborative engagement, the framework offers a comprehensive, replicable model for educators redesigning degree programmes in the discipline. This case study contributes to the broader discourse on curriculum design in music and audio degree programmes, demonstrating how interdisciplinary and stakeholder-driven approaches can balance administrative requirements with pedagogical innovation.
Speakers

Kevin Garland

PhD Researcher, TUS
Kevin Garland is a Postgraduate PhD Researcher at the Technological University of the Shannon: Midlands Midwest (TUS), Ireland. His primary research interests include human-computer interaction, user-centered design, and audio technology. Current research lies in user modelling and…
C1 ATM Studio Warsaw, Poland

9:35am CEST

Subjective test of loudspeaker virtualization
Friday May 23, 2025 9:35am - 9:55am CEST
In this contribution we present subjective tests of loudspeaker virtualization, a technique that applies a specific target behavior to a physical loudspeaker system. In this work, loudspeaker virtualization is used to make a closed-box car audio subwoofer replicate the performance of a larger vented enclosure. The tests are designed to determine whether a panel of listeners detects any reduction in sound quality when the virtualized loudspeaker is used.
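The abstract does not spell out the controller, but one common feedforward reading of loudspeaker virtualization is a regularized inverse filter that maps the measured closed-box response onto the vented-box target; a sketch under that assumption, not necessarily the authors' method:

```python
import numpy as np

def virtualization_filter(H_target, H_physical, reg=1e-3):
    # Per-frequency-bin pre-filter W so that W * H_physical ≈ H_target.
    # Tikhonov regularization avoids boosting deep nulls in the
    # measured response; reg is an illustrative value.
    return H_target * np.conj(H_physical) / (np.abs(H_physical) ** 2 + reg)
```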
C2 ATM Studio Warsaw, Poland

9:55am CEST

Acoustic Sovereignties: Resounding Indigenous Knowledge in Sound-Based Research
Friday May 23, 2025 9:55am - 10:15am CEST
Acoustic Sovereignties (2024) is a First Nations, anti-colonial spatial audio exhibition held in Naarm (Melbourne), Australia. Through curatorial and compositional practices, Acoustic Sovereignties confronts traditional soundscape and Western experimental sound disciplines by foregrounding marginalised voices.
As this research will show, the foundations of sound-based practices such as Deep Listening and Soundscape Studies consisted of romanticised notions of Indigenous spirituality, in addition to the intentional disregard for First Nations stewardship and kinship with the land and its acoustic composition. Acoustic Sovereignties aims at reclaiming Indigenous representation throughout sound-based disciplines and arts practices by providing a platform for voices, soundscapes and knowledge to be heard.
Speakers

Hayden Ryan

Graduate Student, RMIT University
My name is Hayden Ryan; I am a First Nations Australian sound scholar and artist, and a 2024 New York University Music Technology Masters graduate. I am currently a Vice Chancellor's Indigenous Pre-Doctoral Fellow at RMIT University, where my PhD focuses on the integration of immersive…
C1 ATM Studio Warsaw, Poland

9:55am CEST

Objective measurements for basic sound quality and special audio features in cars
Friday May 23, 2025 9:55am - 10:15am CEST
Car audio systems aim to provide information, entertainment, and acoustic comfort to drivers and passengers. In addition to basic audio functions for broadcasting and playing chimes, warning sounds, and music, there are special audio features such as vehicle noise compensation, spatial sound effects, individual sound zones, and active noise control. In this paper, commonly used objective measurement methods for basic sound quality and special features in cars are reviewed and discussed. All objective measurements are proposed to use the 6-unit microphone array specified in the White Paper for In-car Acoustic Measurements released by the AES Technical Committee on Automotive Audio in 2023, and the main parameters to be measured are frequency responses and sound pressure levels in the car while specially designed test signals are played back. General measurement frameworks and procedures for basic sound quality and each feature are presented. The advantages and weaknesses of using these parameters to characterize the basic sound quality and special features of a car audio system are discussed, and challenges and future directions are explored.
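For the frequency-response part of such measurements, a generic H1 transfer-function estimate from the test signal to an array microphone can serve as a reference point (a standard estimator sketched here, not the White Paper's prescribed procedure):

```python
import numpy as np
from scipy.signal import csd, welch

def freq_response(x, y, fs, nperseg=8192):
    # H1 estimator: H(f) = Pxy(f) / Pxx(f), with x the played test
    # signal and y the signal captured at one array microphone.
    f, Pxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, Pxx = welch(x, fs=fs, nperseg=nperseg)
    return f, Pxy / Pxx

# Magnitude response in dB: 20 * np.log10(np.abs(H))
```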
Speakers

Xiaojun Qiu

Huawei
Dr. Xiaojun Qiu is currently a Chief Scientist in Audio and Acoustics at Huawei. Before he joined Huawei in late 2020, he had been a professor at several universities for nearly 20 years. He is a Fellow of the Audio Engineering Society and a Fellow of the International Institute of Acoustics…
C2 ATM Studio Warsaw, Poland

10:40am CEST

Testing Auditory Illusions in Augmented Reality: Plausibility, Transfer-Plausibility and Authenticity
Friday May 23, 2025 10:40am - 11:00am CEST
Experiments testing sound for augmented reality can involve real and virtual sound sources. Paradigms are either based on rating various acoustic attributes or on testing whether a virtual sound source is believed to be real (i.e., evokes an auditory illusion). This study compares four experimental designs indicating such illusions. The first is an ABX task suitable for evaluation under the authenticity paradigm. The second is a Yes/No task, as proposed to evaluate plausibility. The third is a three-alternative forced-choice (3AFC) task using different source signals for the real and virtual sources, as proposed to evaluate transfer-plausibility. Finally, a 2AFC task was tested. The renderings compared in the tests encompassed mismatches between real and virtual room acoustics. Results confirm that authenticity is hard to achieve under nonideal conditions, and ceiling effects occur because differences are always detected. Thus, the other paradigms are better suited for evaluating practical augmented reality audio systems. Detection analysis further shows that the 3AFC transfer-plausibility test is more sensitive than the 2AFC task. Moreover, participants are more sensitive to differences between real and virtual sources in the Yes/No task than theory predicts. This contribution aims to aid in selecting experimental paradigms in future experiments regarding perceptual and technical requirements for sound in augmented reality.
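Comparing sensitivity across these paradigms is commonly done on a common d' scale. A minimal sketch assuming the equal-variance Gaussian signal-detection model (an assumed analysis form, not a quote of the paper's code):

```python
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def dprime_yes_no(hit_rate, fa_rate):
    # Yes/No task: d' = z(H) - z(FA)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

def pc_mafc(dprime, m):
    # m-AFC proportion correct under the equal-variance model:
    # Pc = ∫ φ(x - d') Φ(x)^(m-1) dx  (chance level is 1/m)
    f = lambda x: norm.pdf(x - dprime) * norm.cdf(x) ** (m - 1)
    return quad(f, -10, 10)[0]

def dprime_mafc(pc, m):
    # Numerically invert Pc(d') for 2AFC (m=2) or 3AFC (m=3)
    return brentq(lambda d: pc_mafc(d, m) - pc, 1e-6, 10.0)
```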
Speakers

Nils Meyer-Kahlen

Aalto University

Sebastia Vicenc Amengual Gari

Sebastia V. Amengual Gari is currently a research scientist at Reality Labs Research (Meta), working on room acoustics, spatial audio, and auditory perception. He received a Diploma Degree in Telecommunications with a major in Sound and Image in 2014 from the Polytechnic University…

Sebastian Schlecht

Professor of Practice, Aalto University
Sebastian J. Schlecht is Professor of Practice for Sound in Virtual Reality at Aalto University, Finland. This position is shared between the Aalto Media Lab and the Aalto Acoustics Lab. His research interests include spatial audio processing with an emphasis on artificial reverberation, synthesis, reproduction, and 6-degrees-of-freedom virtual and mixed reality applications. In particular, his research efforts have been directed towards the intersection of app…

Tapio Lokki

Department of Signal Processing and Acoustics, Aalto University
C1 ATM Studio Warsaw, Poland

10:40am CEST

Acoustic Objects: bridging immersive audio creation and distribution systems
Friday May 23, 2025 10:40am - 11:00am CEST
In recent years, professional and consumer audio and music technology has advanced in several areas, including sensory immersion, electronic transmission, content formats, and creation tools. The production and consumption of immersive media experiences increasingly rely on a global network of interconnected frameworks. These experiences, once confined to separate content markets like music, movies, video games, and virtual reality, are now becoming interoperable, ubiquitous, and adaptable to individual preferences, conditions, and languages. This article explores this evolution, focusing on flexible immersive audio creation and reproduction. We examine the development of object-based immersive audio technology and its role in unifying broadcast content with embodied experiences. We introduce the concept of Acoustic Objects, proposing a universal spatial audio scene representation model for creating and distributing versatile, navigable sound in music, multimedia, and virtual or extended reality applications.
Speakers

Jean-Marc Jot

Founder and Principal, Virtuel Works LLC
Spatial audio and music technology expert and innovator. Virtuel Works provides audio technology strategy, IP creation and licensing services to help accelerate the development of audio and music spatial computing technology and interoperability solutions.

Thibaut Carpentier

STMS Lab - IRCAM, SU, CNRS, Ministère de la Culture
Thibaut Carpentier studied acoustics at the École centrale and signal processing at Télécom ParisTech, before joining the CNRS as a research engineer. Since 2009, he has been a member of the Acoustic and Cognitive Spaces team in the STMS Lab (Sciences and Technologies of Music…
C2 ATM Studio Warsaw, Poland

11:00am CEST

Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF
Friday May 23, 2025 11:00am - 11:20am CEST
Immersive audio mix presentations involve transmitting and rendering several audio elements simultaneously. This enables next-generation applications, such as personalized playback. Using immersive loudspeaker and headphone MUSHRA tests, we investigate rate vs. quality for a typical mix presentation use case of a foreground stereo element plus a background Ambisonics scene. For coding, we use Immersive Audio Model and Formats, a recently proposed system for Next-Generation Audio. Excellent quality is achieved at 384 kbit/s, even with a reasonable amount of personalization. We also propose a framework for content-aware analysis that can significantly reduce the bitrate, even when using underlying legacy audio coding instances.
Speakers

Carlos Tejeda Ocampo

Samsung Research Tijuana

Jan Skoglund

Google
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such…
C1 ATM Studio Warsaw, Poland

11:00am CEST

Immersive Music Production Workflows: An Ethnographic Study of Current Practices
Friday May 23, 2025 11:00am - 11:20am CEST
This study presents an ethnographic analysis of current immersive music production workflows, examining industry trends, tools, and methodologies. Through interviews and participant observations with professionals across various sectors, the research identifies common patterns, effective strategies, and persistent obstacles in immersive audio production. Key findings highlight the ongoing struggle for standardized workflows, the financial and technological barriers faced by independent artists, and the critical role of collaboration between engineers and creatives. Despite the growing adoption of immersive formats, workflows still follow stereo conventions, treating spatialization as an afterthought and complicating the translation of mixes across playback systems. Additionally, the study explores the evolving influence of object-based and bed-based mixing techniques, monitoring inconsistencies across playback systems, and the need for improved accessibility to immersive production education. By synthesizing qualitative insights, this paper contributes to the broader discourse on immersive music production, offering recommendations for future research and industry-wide best practices to ensure the sustainable integration of spatial audio technologies.
Speakers

Marcela Rada

Audio Engineer
Marcela is a talented and accomplished audio engineer who has experience both in the studio and in the classroom, teaching university-level students the skills of becoming professional audio engineers and music producers. She has worked across music genres recording, editing, mixing…

Russell Mason

Institute of Sound Recording, University of Surrey

Enzo De Sena

Senior Lecturer, University of Surrey
Enzo De Sena is a Senior Lecturer at the Institute of Sound Recording at the University of Surrey. He received the M.Sc. degree (cum laude) in Telecommunication engineering from the Università degli Studi di Napoli “Federico II,” Italy, in 2009 and the PhD degree in Electronic Engineering from King’s College London, UK, in 2013. Between 2013 and 2016 he was a postdoctoral researcher at KU Leuven…
C2 ATM Studio Warsaw, Poland

11:20am CEST

Evaluation of auditory distance perception in reflective sound field by static and dynamic virtual auditory display
Friday May 23, 2025 11:20am - 11:40am CEST
A psychoacoustic experiment was conducted to evaluate and compare auditory distance perception in a reflective sound field using static and dynamic virtual auditory display (VAD). The binaural signals created by a point source at different distances in a rectangular room were simulated. The contribution of the direct sound to the binaural signals was simulated by near-field head-related transfer function filters and a gain factor accounting for the propagation attenuation of the spherical wave. The contributions of early reflections up to the second order and late reverberation were simulated by the image source method and a Schroeder reverberation algorithm, respectively. The results of the psychoacoustic experiment indicate that there are still significant differences between the perceived distances created by static VAD and those created by dynamic VAD in the simulated reflective condition, although the differences are not as large as those in the simulated free-field case. The results of dynamic VAD are more consistent with those of a real sound source. Therefore, simulating reflections reduces in-head localization and thus improves the control of perceived distance in headphone presentation, but static VAD remains less effective in creating different distance percepts. Dynamic VAD is still needed in distance perception experiments for hearing research, even if simulated reflections are included. In practical applications, dynamic VAD is advocated for recreating virtual sources at different distances.
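For reference, the direct-sound gain factor for spherical spreading falls as 1/r, i.e. 6 dB per distance doubling; a minimal sketch (the reference distance r0 = 1 m is an assumed convention):

```python
import numpy as np

def direct_gain(r, r0=1.0):
    # Direct-path pressure gain under spherical spreading: p(r) ∝ 1/r.
    return r0 / r

for r in (0.5, 1.0, 2.0, 4.0):
    print(f"r = {r} m: {20 * np.log10(direct_gain(r)):+.1f} dB")
```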
C1 ATM Studio Warsaw, Poland

11:20am CEST

Spherical harmonic beamforming based Ambisonics encoding and upscaling method for smartphone microphone array
Friday May 23, 2025 11:20am - 11:40am CEST
With the rapid development of virtual reality (VR) and augmented reality (AR), spatial audio recording and reproduction have gained increasing research interest. Higher Order Ambisonics (HOA) stands out for its adaptability to various playback devices and its ability to integrate head orientation. However, current HOA recordings often rely on bulky spherical microphone arrays (SMA), and portable devices like smartphones are limited by their array configuration and number of microphones. We propose a method for HOA encoding using a smartphone microphone array (SPMA). By designing beamformers for each order of the spherical harmonic functions based on the array manifold, the method enables HOA encoding and upscaling. Validation on a real SPMA and its simulated free-field counterpart in noisy and reverberant conditions showed that the method successfully encodes and upscales HOA up to the fourth order with just four irregularly arranged microphones.
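A least-squares view of such manifold-based encoding, sketched under my own assumptions (matrix names and regularization are illustrative; the authors' beamformer design may differ):

```python
import numpy as np

def encoding_matrix(A, Y, reg=1e-3):
    # A: (mics x directions) steering/manifold matrix of the array.
    # Y: (directions x SH channels) real spherical harmonics sampled
    #    on a dense direction grid.
    # Returns E (SH channels x mics) with E @ A ≈ Y.T in the
    # regularized least-squares sense, so that
    # hoa_signals = E @ mic_signals.
    AAH = A @ A.conj().T
    return Y.T @ A.conj().T @ np.linalg.inv(AAH + reg * np.eye(A.shape[0]))
```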
C2 ATM Studio Warsaw, Poland

11:40am CEST

Subjective evaluation of immersive microphone arrays for drums
Friday May 23, 2025 11:40am - 12:00pm CEST
Through a practice-oriented study, various coincident, near-coincident, and non-coincident immersive microphone arrays were compared during drum recordings for different contemporary popular music genres. In a preliminary study, the OCT-3D, PCMA-3D, 2L-Cube, Hamasaki Square, IRT Cross, Ambisonics A-Format, and native B-Format were informally compared, revealing that the differences between non-coincident systems were much smaller than the differences between coincident and non-coincident systems. This led to a reduction in the number of microphone systems for the final drum recordings. Four microphone techniques were selected: OCT-3D, native B-Format, Ambisonics A-Format, and IRT Cross. These were compared within the context of two different songs, a calm pop track and an energetic rock song, where the drums were recorded in a dry drum booth and a large studio hall, respectively. Through a listening test with a small sample group, it was determined which microphone technique was best suited for each song. Participants were also asked to identify the general favorite, without musical context, as well as how the spatiality, timbre, and height were perceived. It was concluded that the choice of immersive microphone technique depends on the musical context. Conclusions from more objective studies focus primarily on accurate localization, with non-coincident systems consistently performing the best. However, these studies do not take into account the musical context, where accurate localization does not always take precedence. Furthermore, it was noted that height perception in music is not solely created by speakers in the height range. The comparative drum recordings are published at https://www.immersive.pxl.be/immersive-microphone-techniques-for-drums/.
Speakers

Arthur Moelants

Researcher, PXL Music Research

Steven Maes

Founder of Motormusic Studios; Researcher & Lecturer at PXL Music
C1 ATM Studio Warsaw, Poland

1:30pm CEST

Discrimination of vowel-like timbre quality: A case of categorical perception?
Friday May 23, 2025 1:30pm - 1:50pm CEST
This study investigated whether categorical perception—a phenomenon observed in speech perception—extends to the discrimination of vowel-like timbre qualities. Categorical perception occurs when continuous acoustic variations are perceived as distinct categories, leading to better discrimination near category boundaries than within a category. To test this, discrimination thresholds for the center frequency of a one-third-octave band formant introduced into the spectrum of a pink noise burst were measured in five subjects using an adaptive psychophysical procedure. Thresholds were assessed at distinctive formant frequencies of selected Polish vowels and at boundaries between adjacent vowel categories along the formant-frequency continuum. Results showed no reduction in discrimination thresholds at category boundaries, suggesting an absence of categorical perception for vowel-like timbre. One possible explanation for this finding lies in the listening mode—a concept from ecological auditory research—describing cognitive strategies in auditory tasks. The design of both the stimuli and the experimental procedure likely encouraged an acousmatic listening mode, which focuses solely on the sensory characteristics of sound, without reference to its source or meaning. This may have suppressed cues typically used in the categorical perception of speech sounds, which are associated with the communication listening mode. These findings highlight the importance of considering listening mode in future research on categorical perception of timbre and suggest that vowel-like timbre discrimination may involve perceptual mechanisms distinct from those used in speech sound discrimination.
C1 ATM Studio Warsaw, Poland

1:30pm CEST

On the effect of photogrammetric reconstruction and pinna deformation methods on individual head-related transfer functions
Friday May 23, 2025 1:30pm - 1:50pm CEST
Individual head-related transfer functions (HRTFs) are instrumental in rendering plausible spatial audio playback over headphones as well as in understanding auditory perception. Nowadays, the numerical calculation of individual HRTFs is achievable even without high-performance computers. However, the main obstacle is the acquisition of a mesh of the pinnae with submillimeter accuracy. One approach to this problem is photogrammetric reconstruction (PR), which estimates a 3D shape from 2D input, e.g., photos. Albeit easy to use, this approach comes with a trade-off in the resulting mesh quality, which subsequently has a substantial impact on the HRTF quality. In this study, we investigated the effect of PR on HRTF quality as compared to HRTFs calculated from a reference mesh acquired with a high-quality structured-light scanner. Additionally, we applied two pinna deformation methods, which registered a non-individual high-quality pinna to the individual low-quality PR pinna by means of geometric distances. We investigated the potential of these methods to improve the quality of the PR-based pinna meshes. Our evaluation involved the geometrical, acoustical, and psychoacoustical domains, including a sound-localization experiment with 9 participants. Our results show that neither PR nor the PR-improvement methods were able to provide individual HRTFs of sufficient quality, indicating that without extensive pre- or post-processing, PR provides too little individual detail in the HRTF-relevant pinna regions.
Speakers

Katharina Pollack

PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical engineering and audio engineering in Graz, both at the Technical University and the University of Music and Performing Arts, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research…

Piotr Majdak

Austrian Academy of Sciences
C2 ATM Studio Warsaw, Poland

1:50pm CEST

Speech intelligibility in noise: A comparative study of musicians, audio-engineers, and non-musicians
Friday May 23, 2025 1:50pm - 2:10pm CEST
Published studies indicate that musicians outperform non-musicians in a variety of non-musical auditory tasks, a phenomenon known as the “musicians’ hearing advantage effect.” One widely reported benefit is enhanced speech-in-noise (SIN) recognition: musicians' SIN recognition thresholds (SRTs) have been observed to be lower than those of non-musicians, though findings, mainly from English-language studies, are mixed; some confirm this advantage, while others do not. This study extends SRT measurements to Polish, a language with distinct phonetic characteristics. Participants completed a Polish speech intelligibility test, reconstructing sentences masked by multitalker babble noise by selecting words from a list displayed on a computer screen. Speech levels remained constant while the masking noise was adjusted adaptively: increasing after each correct response and decreasing after each error. Three groups were tested: musicians, musically trained audio engineers, and non-musicians. Results showed that musicians and audio engineers had SRTs 2 and 2.7 dB lower than non-musicians, respectively. Although audio engineers exhibited slightly lower SRTs than musicians, the difference was minimal, with statistical significance just above the conventional 5% threshold. Thus, under these conditions, no clear advantage of audio engineers over musicians in SIN performance was observed.
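The adaptive rule described above is a 1-up/1-down staircase on the masker level; a schematic sketch, with step size, reversal count, and the SRT estimator all assumed for illustration:

```python
def run_track(respond, start_snr_db=0.0, step_db=2.0, n_reversals=8):
    # respond(snr_db) -> bool: outcome of one trial at the given SNR.
    # Noise goes up after a correct response (SNR down) and down after
    # an error (SNR up); track stops after n_reversals reversals.
    snr, direction, reversals = start_snr_db, 0, []
    while len(reversals) < n_reversals:
        correct = respond(snr)
        new_dir = -1 if correct else +1
        if direction != 0 and new_dir != direction:
            reversals.append(snr)
        direction = new_dir
        snr += new_dir * step_db
    last = reversals[-6:]                 # average the final reversals
    return sum(last) / len(last)          # SRT estimate in dB SNR
```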
C1 ATM Studio Warsaw, Poland

1:50pm CEST

Mesh2PPM - Automatic Parametrization of the BezierPPM: Entire Pinna
Friday May 23, 2025 1:50pm - 2:10pm CEST
An individual human pinna geometry can be used to achieve plausible personalized audio reproduction. However, an accurate acquisition of the pinna geometry typically requires the use of specialized equipment and often involves time-consuming post-processing to remove potential artifacts. To obtain an artifact-free but individualized mesh, a parametric pinna model based on cubic Bézier curves (BezierPPM) can be used to represent an individual pinna. However, the parameters need to be manually tuned to the acquired listener’s geometry. For increased scalability, we propose Mesh2PPM, a framework for an automatic estimation of BezierPPM parameters from an individual pinna. Mesh2PPM relies on a deep neural network (DNN) that was trained on a dataset of synthetic multi-view images rendered from BezierPPM instances. For the evaluation, unseen BezierPPM instances were presented to Mesh2PPM, which inferred the BezierPPM parameters. We subsequently assessed the geometric errors between the meshes obtained from the BezierPPM parametrized with the inferred parameters and the actual pinna meshes. We investigated the effects of the camera-grid type, jittered camera positions, and additional depth information in images on the estimation quality. While depth information had no effect, the camera-grid type and the jittered camera positions both had effects. A camera grid of 3×3 provided the best estimation quality, yielding Pompeiu-Hausdorff distances of 2.05 ± 0.4 mm and 1.4 ± 0.3 mm with and without jittered camera positions, respectively, and root-mean-square (RMS) distances of 0.92 ± 0.12 mm and 0.52 ± 0.07 mm. These results motivate further improvements of the proposed framework to be ultimately applicable for an automatic estimation of pinna geometries obtained from actual listeners.
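The two geometric error metrics quoted above can be approximated on vertex point clouds as follows (a simplification of true mesh-to-mesh distance; adequate sampling density is assumed):

```python
import numpy as np
from scipy.spatial import cKDTree

def pompeiu_hausdorff(P, Q):
    # Symmetric Hausdorff distance between point sets P, Q (N x 3).
    d_pq = cKDTree(Q).query(P)[0]   # nearest-neighbor distances P -> Q
    d_qp = cKDTree(P).query(Q)[0]   # and Q -> P
    return max(d_pq.max(), d_qp.max())

def rms_distance(P, Q):
    # One-sided RMS nearest-neighbor distance P -> Q.
    return np.sqrt(np.mean(cKDTree(Q).query(P)[0] ** 2))
```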
C2 ATM Studio Warsaw, Poland

2:10pm CEST

Exploring stimulus spacing bias in MUSHRA listening tests using labeled and unlabeled graphic scales
Friday May 23, 2025 2:10pm - 2:30pm CEST
The multi-stimulus test with hidden reference and anchor (MUSHRA) is a prevalent method for the subjective audio quality evaluation. Despite its popularity, the technique is not immune to biases. Empirical evidence indicates that the presence of labels (quality descriptors) equidistantly distributed along the rating scale may be the cause of its non-linear warping; however, other factors could evoke even stronger non-linear effects. This study aims to investigate the hypothesis that stimulus spacing bias may induce a greater magnitude of non-linear warping of the quality scale compared to that caused by the presence of labels. To this end, a group of more than 120 naïve listeners participated in MUSHRA-compliant listening tests using labeled and unlabeled graphic scales. The audio excerpts, representing two highly skewed distributions of quality levels, were reproduced over headphones in an acoustically treated room. The findings of this study verify the postulated hypothesis and shed new light on the mechanisms biasing results of the MUSHRA-conformant listening tests.
C1 ATM Studio Warsaw, Poland

2:10pm CEST

Towards a Headphone Target Curve for Spatial Audio
Friday May 23, 2025 2:10pm - 2:30pm CEST
In order to reproduce audio over headphones as intended, it is essential to have well-defined and consistent references of how headphones should sound. With the aim of stereo reproduction in mind, the field has established a de-facto reference target curve, the Harman Target Curve, to which headphone transfer functions are commonly compared. This contribution questions whether the same target curve is suitable for the reproduction of spatial audio. First, the origins of the Harman Curve are revisited; it is motivated by the frequency response of loudspeaker playback in a specific listening room. The necessary measurement procedures are described in detail. Then, the paper discusses the applicability of existing targets to spatial audio. There, it is possible to embed convincing spatial room information directly into the production, thereby calling into question the motivation for incorporating a listening room in the headphone target. The paper concludes with a listening experiment that compares the preference for different target curves for both spatial audio and stereo.
Speakers

Alexander Mülleder

Graz University of Technology

Nils Meyer-Kahlen

Aalto University
C2 ATM Studio Warsaw, Poland

2:30pm CEST

Investigating Listeners’ Emotional and Physiological Responses to Varying Apparent Width and Horizontal Position of a Single Sound Source
Friday May 23, 2025 2:30pm - 2:50pm CEST
This research aims to explore the impact of variations in apparent sound source width and position on emotional and physiological responses among listeners, with a particular focus on virtual reality applications. While sound is recognized as a potent elicitor of strong emotions, the specific role of spatial characteristics, such as apparent sound source width, has not been systematically analyzed. The authors' previous study indicated that the spatial distribution of sound can alter perceptions of scariness. In contrast, the current study explores whether adjustments in apparent sound source width can significantly affect emotional valence and arousal, as well as physiological metrics. The objective of this study was to investigate the impact of a single sound source's width and horizontal position on emotional engagement, thereby providing valuable insights for advancements in immersive audio experiences. Our experiments involved conducting listening tests in a spatial sound laboratory, utilizing a circular setup of sixteen loudspeakers to present a range of audio stimuli drawn from five selected recordings. The stimuli were manipulated based on two key parameters: the apparent sound source width and the spatial positioning of the sound source (front, back, left, or right). Participants assessed their emotional reactions using the Self-Assessment Manikin (SAM) pictogram method. Physiological data, including electroencephalography, blood volume pulse, and electrodermal activity, were collected in real time via wearable sensors consisting of an EEG headset and a finger-attached device.
C1 ATM Studio Warsaw, Poland

2:30pm CEST

Sound Source Directivity Estimation in Spherical Fourier Domain from Sparse Measurements
Friday May 23, 2025 2:30pm - 2:50pm CEST
In recent years, applications such as virtual reality (VR) systems and room acoustics simulations have brought the modeling of sound source directivity into focus. An accurate simulation of directional responses of sound sources is essential in immersive audio applications.

Real sound sources have directional properties that differ from those of simple sources such as monopoles, which are frequently used to model more complex acoustic fields. For instance, the sound level of human speech varies considerably depending on where the sound is recorded with respect to the talker's head. The same is true for loudspeakers, which are considered linear and time-invariant sources. When the sound is recorded behind the loudspeaker, it is common to observe differences of up to 20 dB SPL at some frequencies. The directional characteristics of sound sources become particularly pronounced at high frequencies. The propagation of real sound sources, such as human voices or musical instruments, differs from simple source models like monopoles, dipoles, and quadrupoles due to their physical structures.

The common approach to measuring directivity patterns of sound sources involves surrounding a sound source in an anechoic chamber with a high number of pressure microphones on a spherical grid and registering the sound power at these positions. Apart from the prohibitive hardware requirements, such measurement setups are mostly impractical and costly. Audio system manufacturers have developed various methods for measuring sound source directionality over the years. These methods are generally of high technical complexity.

This article proposes a new, reduced-complexity directivity measurement approach based on the spherical harmonic decomposition of the sound field. The method estimates the directional characteristics of sound sources using fewer measurement points with spherical microphone arrays. The spherical harmonic transform allows for the calculation of directivity using data collected from spherical microphone arrays instead of pressure sensors. The proposed method uses both the pressure component and spatial derivatives of the sound field and successfully determines directivity with sparse measurements.

An estimation model based on the spherical Fourier transform was developed, measurements were carried out to test this model, and preliminary results obtained from the estimation model are presented. Experiments conducted at the METU Spatial Audio Research Laboratory demonstrated the effectiveness of the proposed method. The directivity characteristics of a Genelec 6010A loudspeaker were measured using eight 3rd-order spherical microphone arrays. The directivity functions obtained were highly consistent with the data provided by the loudspeaker manufacturer. The results, especially in the low and mid-frequency bands, indicate the utility of the proposed method.
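As a schematic of the spherical Fourier step, directivity coefficients can be fit to pressures measured at a sparse set of directions by regularized least squares; this sketch omits the paper's use of spatial derivatives and any radial propagation terms:

```python
import numpy as np
from scipy.special import sph_harm

def sh_fit(pressure, theta, phi, order, reg=1e-6):
    # pressure: complex pressures at D directions; theta: colatitude,
    # phi: azimuth (radians). Returns (order+1)^2 SH coefficients.
    Y = np.column_stack([
        sph_harm(m, n, phi, theta)   # SciPy order: (m, n, azimuth, colatitude)
        for n in range(order + 1)
        for m in range(-n, n + 1)
    ])
    # Regularized normal equations: (Y^H Y + reg I) c = Y^H p
    G = Y.conj().T @ Y + reg * np.eye(Y.shape[1])
    return np.linalg.solve(G, Y.conj().T @ pressure)
```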
C2 ATM Studio Warsaw, Poland

2:50pm CEST

A study on reverberation in a virtual acoustic setting using the Lexicon 960L Reverb Processor
Friday May 23, 2025 2:50pm - 3:10pm CEST
This paper describes ongoing research on integrating algorithmic reverberation tools designed for audio post-production into virtual acoustics, focusing on Impulse Responses (IRs) captured from the legendary Lexicon 960L hardware reverberation unit. While previous research from the McGill University Virtual Acoustics Technology (VAT) Lab has utilized room impulse responses (RIRs) captured from various performance halls to create active acoustic environments in the recording studio, this study analyzes the perceived differences between two listening environments and the effects of the VATLab loudspeakers and room acoustics on IRs captured from 5.0 multichannel reverb presets. Three of these multichannel IRs were chosen to simulate a Lexicon 960L “environment” in a physical space.

Objective measurements in McGill University's Immersive Media Laboratory (IMLAB) Control Room and in VATLab, following the ISO 3382 standard, quantify the effect of the physical room and the omnidirectional dodecahedral loudspeakers used for auralization. A subjective pilot study investigates the perceived differences between the Lexicon IRs in VATLab and a control condition, the IMLAB Control Room. The results of an attribute rating test on perceived immersion, soundfield continuity, tone color, and overall listening experience between the two spaces help us better understand how reverberation algorithms designed for multichannel mixing and post-production translate to a virtual acoustics system.
In conclusion, we discuss the perceptual differences between the IMLAB Control Room and VATLab and results of objective measurements.
Speakers

Aybar Aydin

PhD Candidate, McGill University

Kathleen Zhang

McGill University

Richard King

Professor, McGill University
Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School…
C1 ATM Studio Warsaw, Poland

2:50pm CEST

Perceptual evaluation of professional point and line sources for immersive audio applications
Friday May 23, 2025 2:50pm - 3:10pm CEST
Immersive sound reinforcement aims to create a balanced perception of sounds arriving from different directions, establishing an impression of envelopment over the audience area. Current perceptual research shows that coverage designs featuring nearly constant decay (0 dB per distance doubling) preserve the level balance among audio objects in the mix. In contrast, a -3 dB decay supports a more uniform sensation of envelopment, especially for off-center listening positions. For practical reasons, point-source loudspeakers remain widely used for immersive audio playback in mid-sized venues. However, point-source loudspeakers inherently decay by -6 dB per distance doubling, and using them can conflict with the design goals outlined above. In this paper, we investigate the perceived differences between point-source and line-source setups using eight surrounding loudspeakers side-by-side covering a 10 m × 7 m audience area. The perceptual qualities of object level balance, spatial definition, and envelopment were compared in a MUSHRA listening experiment, and acoustic measurements were carried out to capture room impulse responses and binaural room impulse responses (BRIRs) of the experimental setup. The BRIRs were used to check whether the results of the listening experiment were reproducible on headphones. Both the loudspeaker and headphone-based experiments delivered highly correlated results. Also, regression models devised based on the acoustic measurements are highly correlated to the perceptual results. The results confirm that elevated line sources, exhibiting a practically realizable decay of -2 dB per distance doubling, help preserve object-level balance, increase spatial definition, and provide a uniform envelopment experience throughout the audience area compared to point-source loudspeakers.
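All the decay figures above follow one relation, a loss of k dB per distance doubling: L(r) = L(r0) - k·log2(r/r0), with k = 6 for a point source, 3 for an ideal line source, and about 2 for the elevated line sources tested. A one-line check:

```python
import numpy as np

def level_at(r, r0=1.0, L0=0.0, k=6.0):
    # k dB of level decay per distance doubling, relative to L0 at r0.
    return L0 - k * np.log2(r / r0)

print(level_at(2.0, k=6.0), level_at(2.0, k=2.0))  # -6.0 dB vs. -2.0 dB
```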
Speakers

Franz Zotter

University of Music and Performing Arts Graz
Franz Zotter received an M.Sc. degree in electrical and audio engineering from the University of Technology (TUG) in 2004, a Ph.D. degree in 2009 and a venia docendi in 2023 from the University of Music and Performing Arts (KUG) in Graz, Austria. He joined the Institute of Electronic... Read More →

Philip Coleman

Senior Immersive Audio Research Engineer, L-Acoustics
I'm a research engineer in the L-ISA immersive audio team at L-Acoustics, based in Highgate, London. I'm working on the next generation of active acoustics and object-based spatial audio reproduction, to deliver the best possible shared experiences. Before joining L-Acoustics in September…
C2 ATM Studio Warsaw, Poland

3:10pm CEST

Detection of spectral component asynchrony: Applying psychoacoustic research to transient phenomena in music
Friday May 23, 2025 3:10pm - 3:30pm CEST
Numerous studies highlight the role of transient behavior in musical sounds and its impact on sound identification. This study compares these findings with established psychoacoustic measurements of detection thresholds for asynchrony in onset and offset transients, obtained using synthesized stimuli that allowed precise control of stimulus parameters. Results indicated that onset asynchrony can be detected at thresholds as low as 1 ms, even half a cycle of the component frequency. In contrast, offset asynchrony detection was found to be less precise, with thresholds ranging from 5 to 10 ms. Sensitivity improves when multiple harmonics are asynchronous. Additionally, component phase significantly influences onset asynchrony detection: at 1000 Hz and above, phase shifts raise thresholds from below 1 ms to around 50 ms, while having little effect on offset detection. Although these findings were based on controlled artificial stimuli, they can provide valuable insight into asynchrony in natural musical sounds. In many cases, detection thresholds are well below the variations observed in music, yet under certain conditions and frequencies some temporal variations may not be perceptible.
C1 ATM Studio Warsaw, Poland

3:45pm CEST

A Curvilinear Transfer Function for Wide Dynamic Range Compression With Expansion
Friday May 23, 2025 3:45pm - 4:05pm CEST
Wide Dynamic Range Compression in hearing aids is becoming increasingly complex as the number of channels and adjustable parameters grows. At the same time, there is growing demand for customization and user self-adjustment of hearing aids, necessitating a balance between complexity and user accessibility. Compression in hearing aids is governed by the input-output transfer function, which relates input magnitude to output magnitude and is typically defined as a combination of linear piecewise segments resembling logarithmic behavior. This work presents an alternative to the conventional compression transfer function that consolidates multiple compression parameters and revisits expansion in hearing aids. The curvilinear transfer function is a continuous curve with logarithm-like behavior, governed by two parameters: gain and compression ratio. Experimental results show that curvilinear compression reduces the amplification of low-level noise, improves signal-to-noise ratio by up to 1.0 dB, improves sound quality as measured by the Hearing Aids Speech Quality Index by up to 6.7%, and provides comparable intelligibility as measured by the Hearing Aids Speech Perception Index, with simplified parameterization compared to conventional compression.
The consolidated curvilinear transfer function is highly applicable to over-the-counter hearing aids and offers more capabilities for customization than current prominent over-the-counter and self-adjusted hearing aids.
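Since the abstract does not give the formula, the following is only a hypothetical logarithm-like input-output curve with the two stated parameters, gain and compression ratio; the knee placement and softplus transition are my assumptions:

```python
import numpy as np

def curvilinear_io(in_db, gain_db=20.0, cr=3.0, knee_db=-40.0):
    # Smoothly interpolate between linear gain (slope 1) below the knee
    # and compression (slope 1/CR) above it, using a scaled softplus so
    # the curve has no piecewise breakpoints.
    t = np.logaddexp(0.0, (in_db - knee_db) / 10.0) * 10.0
    return in_db + gain_db - (1.0 - 1.0 / cr) * t
```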
C2 ATM Studio Warsaw, Poland

4:05pm CEST

Tiresias - An Open-Source Hearing Aid Development Board
Friday May 23, 2025 4:05pm - 4:25pm CEST
Hearing loss is a global public health issue due to its high prevalence and negative impact on various aspects of life, including well-being and cognition. Despite their crucial role in auditory rehabilitation, hearing aids remain inaccessible to many due to their high cost, particularly in low- and middle-income countries. Existing open-source solutions often rely on high-power, bulky platforms rather than compact, low-power wearables suited to real-world applications. This work introduces Tiresias, an open-source hearing aid development board designed for real-time audio processing using low-cost electronics. Integrating key hearing aid functionalities into a compact six-layer printed circuit board (PCB), Tiresias features multichannel compression, digital filtering, beamforming, Bluetooth connectivity, and physiological data monitoring, fostering modularity and accessibility through publicly available hardware and firmware resources based on the Nordic nRF Connect and Zephyr real-time operating system (RTOS). By addressing technological and accessibility challenges, this work advances open-source hearing aid development, enabling research in hearing technologies while also supporting future refinements and real-world validation.
C2 ATM Studio Warsaw, Poland
 

