The evolution of musical instruments has been deeply influenced by advancements in audio equipment, allowing for the creation of instruments that bridge the gap between tradition and modern innovation. This paper highlights the integration of modern technologies such as digital signal processing (DSP), artificial intelligence (AI), and advanced materials into musical instruments to enhance functionality, sound quality, and the musician's experience at all levels, examining historical progress, design principles, and modern innovations.
Major areas of focus include the role of electronic components such as pickups, sensors, and wireless interfaces in improving the functionality of modern musical instruments, as well as the impact of high-performance materials on durability and sustainability. Case studies of the digital piano and the talking drum provide practical insights into how these innovations are being implemented, and into the contrasts between them. The paper further addresses challenges such as maintaining the cultural authenticity of traditional instruments while integrating modern technology, issues of latency, accessibility for diverse users globally, and sustainability concerns in manufacturing.
This paper presents a case study on the auralization of the lost wooden synagogue in Wołpa, digitally reconstructed using a Heritage Building Information Modelling (HBIM) framework for virtual reality (VR) presentation. The study explores how acoustic simulation can aid in the preservation of intangible heritage, focusing on the synagogue’s unique acoustics. Using historical documentation, the synagogue was reconstructed with accurate geometric and material properties, and its acoustics were analyzed through high-fidelity ray-tracing simulations. A key objective of this project is to recreate the Shema Israel ritual, incorporating a historical recording of the rabbi’s prayers. To enable interactive exploration, real-time auralization techniques were optimized to balance computational efficiency and perceptual authenticity, aiming to overcome the trade-offs between simplified VR audio models and physically accurate simulations. This research underscores the transformative potential of immersive technologies in reviving lost heritage, offering a scalable, multi-sensory approach to preserving sacred soundscapes and ritual experiences.
The article explores the innovative concept of interactive music, where both creators and listeners can actively shape the structure and sound of a musical piece in real-time. Traditionally, music is passively consumed, but interactivity introduces a new dimension, allowing for creative participation and raising questions about authorship and the listener's role. The project "Sound Permutation: A Real-Time Interactive Musical Experiment" aims to create a unique audio-visual experience by enabling listeners to choose performers for a chamber music piece in semi-real-time. Two well-known compositions, Edward Elgar's "Salut d’Amour" and Camille Saint-Saëns' "Le Cygne," were recorded by three cellists and three pianists in all possible combinations. This setup allows listeners to seamlessly switch between performers' parts, offering a novel musical experience that highlights the impact of individual musicians on the perception of the piece.
The project focuses on chamber music, particularly the piano-cello duet, and utilizes advanced recording technology to ensure high-quality audio and video. The interactive system, developed using JavaScript, allows for smooth video streaming and performer switching. The user interface is designed to be intuitive, featuring options for selecting performers and camera views. The system's optimization ensures minimal disruption during transitions, providing a cohesive musical experience. This project represents a significant step towards making interactive music more accessible, showcasing the potential of technology in shaping new forms of artistic engagement and participation.
In the field of digital audio signal processing (DSP) systems, the choice between standard and proprietary digital audio networks (DANs) can significantly impact both functionality and performance. This abstract aims to explore the benefits, tradeoffs, and economic implications of these two approaches, providing a comprehensive comparison to aid in decision-making processes for audio professionals and system designers. The abstract emphasizes key benefits of A2B, AoIP, and older proprietary networks currently in use.
Conclusion: The choice between standard and proprietary digital audio networks in audio DSP systems involves a careful consideration of benefits, tradeoffs, and economic implications. Standards-based systems provide interoperability and cost-effectiveness, while proprietary solutions offer optimized performance and innovative features. Understanding these factors can guide audio professionals and system designers in making informed decisions that align with their specific needs and long-term goals.
Electrical and Mechanical Engineering Bachelor's degree from Universidad Panamericana in Mexico City. Master of Science in Music Engineering from the University of Miami. EMBA from Boston University. Worked at Analog Devices developing DSP software and algorithms (SigmaStudio) for 17 years...
Thursday May 22, 2025, 10:00am - 12:00pm CEST, Hall F, ATM Studio, Warsaw, Poland
This paper presents an ongoing project that aims to document the urban soundscapes of the Polish city of Białystok. It describes the progress made so far, including the selection of sonic landmarks, the process of acquiring the audio recordings, and the design of the unique graphic user interface featuring original drawings. Furthermore, it elaborates on the ongoing efforts to extend the project beyond the scope of a typical urban soundscape repository. In the present phase of the project, in addition to monophonic recordings, audio excerpts are acquired in binaural and Ambisonic sound formats, providing listeners with an immersive experience. Moreover, state-of-the-art machine-learning algorithms are applied to analyze gathered audio recordings in terms of their content and spatial characteristics, ultimately providing prospective users of the sound map with some form of automatic audio tagging functionality.
This study evaluates the effectiveness of artificial reverberation algorithms used to create simulated acoustic environments by comparing them to the acoustic response of real spaces. A mixed-methods approach, integrating objective and subjective measures, was employed to assess both the accuracy and perceptual quality of simulated acoustics. Real-world spaces, within a research project…, were selected for their varying sizes, functions, and acoustical properties. Objective acoustic measurements—such as the Room Impulse Response (RIR) and features extracted from it, i.e., Reverberation Time (RT60), Early Decay Time (EDT), Clarity index (C50, C80), and Definition (D50)—were conducted to establish baseline profiles. Simulated environments were created to replicate real-world conditions, incorporating source-receiver configurations, room geometries, and/or material properties. Objective metrics were extracted from these simulations for comparison with real-world data. After applying the artificial reverberation algorithm, the same objective measurements were re-recorded to assess its impact. Subjective listening tests were also conducted, with a diverse panel of listeners rating the perceived clarity, intelligibility, comfort, and overall sound quality of both real and simulated spaces, using a double-blind procedure to mitigate bias. Statistical analyses, including paired t-tests and correlation analysis, were performed to assess the relationship between objective and subjective evaluations. This approach provides a comprehensive framework for evaluating the algorithm’s ability to enhance simulated acoustics and align them with real-world environments.
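For context, the objective metrics named above are typically computed directly from a measured room impulse response. The sketch below, which assumes the common Schroeder backward-integration and T30 conventions rather than the project's actual pipeline, shows one way RT60, C50, and D50 might be extracted.

```python
# Illustrative extraction of RT60, C50 and D50 from a room impulse response (RIR).
# The Schroeder integration and the -5/-35 dB evaluation range (T30) are common
# conventions, assumed here; they are not taken from the study's own pipeline.
import numpy as np

def schroeder_decay_db(rir):
    """Backward-integrated energy decay curve in dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0] + 1e-12)

def rt60_from_t30(rir, fs):
    """Estimate RT60 by fitting the -5 dB to -35 dB range of the decay curve."""
    edc = schroeder_decay_db(rir)
    t = np.arange(len(edc)) / fs
    mask = (edc <= -5.0) & (edc >= -35.0)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)   # dB per second
    return -60.0 / slope

def clarity_c50(rir, fs):
    """C50: early-to-late energy ratio with a 50 ms split point, in dB."""
    split = int(0.050 * fs)
    early = np.sum(rir[:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / (late + 1e-12))

def definition_d50(rir, fs):
    """D50: early energy (first 50 ms) as a fraction of total energy."""
    split = int(0.050 * fs)
    return np.sum(rir[:split] ** 2) / (np.sum(rir ** 2) + 1e-12)
```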
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...
Automotive audio systems operate in highly reflective and acoustically challenging environments that differ significantly from optimized listening spaces such as concert halls or home theaters. The compact and enclosed nature of car cabins, combined with the presence of reflective surfaces such as the dashboard, windshield, and windows, creates strong early reflections that interfere with the direct sound from loudspeakers. These reflections result in coherent interference, comb filtering, and position-dependent variations in frequency response, leading to inconsistent tonal balance, reduced speech intelligibility, and compromised stereo imaging and spatial localization. Traditional approaches, such as equalization and time alignment, attempt to compensate for these acoustic artifacts but do not effectively address the problems arising from coherent early reflections. To mitigate these challenges, this study explores Dynamic Diffuse Signal Processing (DiSP) as an alternative solution for reducing early reflection coherence within automotive environments. DiSP is a convolution-based signal processing technique that, when implemented effectively, decorrelates coherent signals while leaving them perceptually identical. While this method has been successfully applied in sound reinforcement and multi-speaker environments, its application in automotive audio has not been extensively studied. This research investigates the effectiveness of DiSP by analyzing pre- and post-DiSP impulse responses and frequency response variations at multiple listening positions, assessing its ability to mitigate phase interference and reduce comb filtering. Experimental results indicate that DiSP significantly improves the uniformity of sound distribution, reducing spectral deviations across seating positions and minimizing unwanted artifacts caused by early reflections. These findings suggest that DiSP can serve as a powerful tool for optimizing in-car audio reproduction, offering a scalable and computationally efficient approach to improving listener experience in modern automotive sound systems.
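The DiSP formulation itself is not detailed in the abstract, but the underlying decorrelation idea can be sketched as convolving each loudspeaker feed with its own short, spectrally flat, temporally diffuse impulse. The construction below (random-phase flat spectrum with an exponential taper) is an illustrative assumption, not the authors' implementation.

```python
# Sketch of convolution-based decorrelation: each loudspeaker feed is convolved with
# its own short, spectrally flat, temporally diffuse impulse (TDI) so that otherwise
# coherent early reflections no longer sum coherently. The TDI construction here is
# an illustrative assumption, not the DiSP formulation used in the paper.
import numpy as np
from scipy.signal import fftconvolve

def make_tdi(fs, duration_ms=10.0, seed=0):
    rng = np.random.default_rng(seed)
    n = int(fs * duration_ms / 1000.0)
    # Unit-magnitude spectrum with random phase -> spectrally flat impulse.
    phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n // 2 + 1))
    phase[0] = 1.0                                # keep DC real
    phase[-1] = 1.0                               # keep Nyquist bin real
    tdi = np.fft.irfft(phase, n)
    tdi *= np.exp(-np.arange(n) / (0.25 * n))     # taper energy towards the start
    return tdi / np.sqrt(np.sum(tdi ** 2))        # normalise to unit energy

def decorrelate_feeds(signal, fs, n_speakers=4):
    """Return one differently diffused copy of `signal` per loudspeaker feed."""
    return [fftconvolve(signal, make_tdi(fs, seed=k), mode="same")
            for k in range(n_speakers)]
```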
I am now working as an audio engineer with my research into 6 Degrees-of-Freedom (6DoF) audio for Virtual Reality (VR); this includes hybrid acoustic modelling methods for real-time calculation. I am currently looking at perceptual differences in different acoustic rendering methods...
Room acoustics optimisation in live sound environments using signal processing techniques has captivated the minds of audio enthusiasts and researchers alike for over half a century. From analogue filters in the 1950s, to modern research efforts such as room impulse response equalisation and adaptive sound field control, this subject has exploded to life. Controlling the sound field in a static acoustic space is complex due to the high number of system variables, such as reflections, speaker crosstalk, equipment-induced coloration, room modes, reverberation, diffraction and listener positioning. These challenges are further amplified by dynamic variables such as audience presence, environmental conditions and room occupancy changes, which continuously and unpredictably reshape the sound field. A primary objective of live sound reinforcement is to deliver uniform sound quality across the audience area. This is most critical at audience ear level, where tonal balance, clarity, and spatial imaging are most affected by variations in the sound field. While placing microphones at audience ear level positions could enable real-time monitoring, large-scale deployment is impractical due to audience interference. This research will explore the feasibility of an adaptive virtual microphone-based approach to room acoustics optimisation. By strategically placing microphone arrays and leveraging virtual microphone technology, the system estimates the sound field dynamically at audience ear level without requiring physical microphones. By continuously repositioning focal points across listening zones, a small number of arrays could effectively monitor large audience areas. If accurate estimations can be achieved, real-time sound field control becomes more manageable and effective.
Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January...
The occurrence of eigenmodes is one of the fundamental phenomena in the acoustics of small rooms. The formation of modes results in an uneven distribution of the sound pressure level in the room. To determine the resonance frequencies and their distributions, numerical methods, analytical methods, or experimental studies are used. For the purpose of this paper, an experimental study was carried out in a small room. The study analysed the results of measuring the sound pressure level distributions in the room, with a special focus on the frequency range 20 Hz - 32 Hz, below the first modal frequency of the room. The measurements were conducted on a rectangular grid of 9x9 microphones, resulting in a grid resolution of 0.5 m. The influence of evanescent modes on the total sound field was investigated. The research takes into account several sound source locations. On the basis of the acoustic measurements carried out, frequency response curves were also plotted. This paper presents a few methods for analysing these curves based on the standard deviation, the linear least squares method, the coefficient of determination R^2, and the root mean squared error (RMSE). The results obtained made it possible to determine the best position of the acoustic source in the room under study. The effect of evanescent modes on the total sound field was also observed.
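The curve-analysis methods listed above are standard statistics of the measured frequency response. A minimal sketch, assuming level curves in dB over the 20-32 Hz band, might look as follows; the rule of ranking source positions by these figures is an assumption made for illustration.

```python
# Illustrative flatness metrics for low-frequency room response curves: standard
# deviation of the level curve, a linear least-squares trend, its coefficient of
# determination R^2, and the RMSE about that trend. The ranking rule in the comment
# at the end is an assumption, not the paper's stated procedure.
import numpy as np

def flatness_metrics(freqs_hz, levels_db):
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    levels_db = np.asarray(levels_db, dtype=float)
    std_db = np.std(levels_db)
    slope, intercept = np.polyfit(freqs_hz, levels_db, 1)      # linear least squares
    fit = slope * freqs_hz + intercept
    resid = levels_db - fit
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((levels_db - np.mean(levels_db)) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    rmse = np.sqrt(np.mean(resid ** 2))
    return {"std_db": std_db, "slope_db_per_hz": slope, "r2": r2, "rmse_db": rmse}

# Example ranking (hypothetical data structure): pick the source position whose
# response curve has the smallest RMSE about its trend.
# best = min(positions, key=lambda p: flatness_metrics(freqs, spl[p])["rmse_db"])
```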
Mono compatibility is a fundamental challenge in audio production, ensuring that stereo mixes retain clarity, balance, and spectral integrity when summed to mono. Traditional stereo widening techniques often introduce phase shifts, comb filtering, and excessive decorrelation, causing perceptual loss of critical mix elements in mono playback. Diffuse Signal Processing (DiSP) is introduced as a convolution-based method that improves mono compatibility while maintaining stereo width.
This study investigates the application of DiSP to the left and right channels of a stereo mix, leveraging MATLAB-synthesized TDI responses to introduce spectrally balanced, non-destructive acoustic energy diffusion. TDI convolution is then applied to both the left and right channels of the final stereo mix.
A dataset of stereo mixes from four genres (electronic, heavy metal, orchestral, and pop/rock) was analyzed. The study evaluated phase correlation, mono-summed frequency response deviation, and the amount of comb filtering to quantify improvements in mono summation. Spectral plots and wavelet transforms provided objective analysis. Results demonstrated that DiSP reduced phase cancellation, significantly decreased comb filtering artifacts, and improved spectral coherence in mono playback while preserving the stereo width of the original mix. Applying this process to the final left and right channels allows an engineer to mix freely without concern for mono compatibility.
DiSP’s convolution-based approach offers a scalable, adaptive solution for modern mixing and mastering workflows, overcoming the limitations of traditional stereo processing. Future research includes machine learning-driven adaptive DiSP, frequency-dependent processing enhancements, and expansion to spatial audio formats (5.1, 7.1, Dolby Atmos) to optimize mono downmixing. The findings confirm DiSP as a robust and perceptually transparent method for improving mono compatibility without compromising stereo imaging.
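As a rough illustration of the objective measures referenced in this study, the sketch below computes an L/R phase-correlation value and a mono-sum spectral deviation between a processed mix and its unprocessed reference. FFT sizes and any pass/fail thresholds are assumptions, not values from the paper.

```python
# Sketch of mono-compatibility checks: the L/R phase correlation coefficient and the
# RMS spectral deviation between the mono sum of a processed mix and the mono sum of
# its reference. Parameters here are illustrative assumptions.
import numpy as np

def phase_correlation(left, right):
    """Standard correlation meter value in [-1, 1]; +1 = fully mono-compatible."""
    num = np.sum(left * right)
    den = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + 1e-12
    return num / den

def mono_sum_deviation_db(left, right, left_ref, right_ref, n_fft=8192):
    """RMS dB difference between mono-sum spectra of processed and reference mixes."""
    def mono_mag_db(l, r):
        mono = 0.5 * (l + r)
        mag = np.abs(np.fft.rfft(mono, n_fft)) + 1e-12
        return 20.0 * np.log10(mag)
    diff = mono_mag_db(left, right) - mono_mag_db(left_ref, right_ref)
    return np.sqrt(np.mean(diff ** 2))
```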
Standing waves are a phenomenon ever-present in the reproduction of low frequencies and have a direct impact on the auditory perception of this frequency region. This study addresses the challenges posed by standing waves which are difficult to measure accurately using conventional pressure microphones, due to their spatial and temporal characteristics. To combat these issues, a state-of-the-art sound pressure velocity probe specifically designed for measurement of intensity in the low-frequency spectrum is developed. Using this probe, the research includes the development of new energy estimation parameters to better quantify the characteristics of sound fields influenced by standing waves. Additionally, a novel "standing-wave-ness" parameter is proposed, based on two diffuseness quantities dealing with the proportion of locally confined energy and the temporal variation of the intensity vectors. The performance of the new method and probe is evaluated through both simulated and real-world measurement data. Simulations provide a controlled environment to assess the method's accuracy across a variety of scenarios, including both standing wave and non-standing wave conditions. These initial simulations are followed by validation through measurement data obtained from an anechoic chamber, ensuring that the method's capabilities are tested in highly controlled, close-to-real-world settings. Preliminary results from this dual approach show promising potential for the new method to quantify the presence of standing waves, adding a new dimension in the visualisation and understanding of low-frequency phenomena.
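For orientation, a common way to quantify how much of the local energy is carried by net energy flow (and hence how "standing" the field is) uses the time-averaged intensity and energy density obtained from a pressure-velocity probe. The sketch below shows that standard diffuseness estimator; the paper's own "standing-wave-ness" parameter combines additional quantities not reproduced here.

```python
# Illustrative estimate of active intensity and a diffuseness quantity from a
# pressure-velocity probe signal. This is the standard time-averaged estimator
# 1 - |<I>| / (c <E>); it is shown only to illustrate the "locally confined energy"
# idea, not the authors' proposed parameter.
import numpy as np

RHO0 = 1.204   # air density, kg/m^3 (approx., 20 degrees C)
C0 = 343.0     # speed of sound, m/s

def intensity_and_diffuseness(p, u):
    """p: (N,) sound pressure [Pa]; u: (N, 3) particle velocity [m/s]."""
    intensity = p[:, None] * u                                    # instantaneous intensity
    energy = 0.5 * RHO0 * (np.sum(u ** 2, axis=1) + (p / (RHO0 * C0)) ** 2)
    mean_intensity = np.mean(intensity, axis=0)
    diffuseness = 1.0 - np.linalg.norm(mean_intensity) / (C0 * np.mean(energy) + 1e-12)
    return mean_intensity, diffuseness
```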
Doctoral researcher at the Acoustics Lab of Aalto University passionate about everything audio. My research focuses on the human perception of the very low frequency spectrum, and so does my day to day life. When I am not in the Acoustics lab, I organise electronic music events where...
Aki Mäkivirta is R&D Director at Genelec, Iisalmi, Finland, and has been with Genelec since 1995. He received his Master of Science, Licentiate of Science, and Doctor of Science in Technology degrees from Tampere University of Technology, in 1985, 1989, and 1992, respectively. Aki...
Friday May 23, 2025, 12:00pm - 1:30pm CEST, Hall F, ATM Studio, Warsaw, Poland
The rapid advancement of generative artificial intelligence has created highly realistic DeepFake multimedia content, posing significant challenges for digital security and authenticity verification. This paper presents the development of a comprehensive testbed designed to detect counterfeit audio content generated by DeepFake techniques. The proposed framework integrates forensic spectral analysis, numerical and statistical modeling, and machine learning-based detection to assess the authenticity of multimedia samples. Our study evaluates various detection methodologies, including spectrogram comparison, Euclidean distance-based analysis, pitch modulation assessment, and spectral flatness deviations. The results demonstrate that cloned and synthetic voices exhibit distinctive acoustic anomalies, with forensic markers such as pitch mean absolute error and power spectral density variations serving as effective indicators of manipulation. By systematically analyzing human, cloned, and synthesized voices, this research provides a foundation for advancing DeepFake detection strategies. The proposed testbed offers a scalable and adaptable solution for forensic audio verification, contributing to the broader effort of safeguarding multimedia integrity in digital environments.
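A minimal sketch of forensic markers of the kind listed above is given below, computing a spectrogram distance, a pitch mean absolute error, and a spectral flatness deviation between two recordings. Frame sizes, pitch ranges, and any decision thresholds are illustrative assumptions, not the testbed's actual configuration.

```python
# Illustrative forensic markers for comparing two recordings of the same utterance
# (e.g. a reference human voice and a suspected clone): Euclidean distance between
# log-magnitude spectrograms, pitch mean absolute error, and spectral flatness
# deviation. All parameters are assumptions for the sketch.
import numpy as np
import librosa

def spectrogram_distance(y_a, y_b, n_fft=1024, hop=256):
    n = min(len(y_a), len(y_b))
    s_a = np.abs(librosa.stft(y_a[:n], n_fft=n_fft, hop_length=hop))
    s_b = np.abs(librosa.stft(y_b[:n], n_fft=n_fft, hop_length=hop))
    return float(np.linalg.norm(np.log1p(s_a) - np.log1p(s_b)))

def pitch_mae_hz(y_a, y_b, sr):
    f0_a, _, _ = librosa.pyin(y_a, fmin=65.0, fmax=500.0, sr=sr)
    f0_b, _, _ = librosa.pyin(y_b, fmin=65.0, fmax=500.0, sr=sr)
    n = min(len(f0_a), len(f0_b))
    both_voiced = ~np.isnan(f0_a[:n]) & ~np.isnan(f0_b[:n])
    return float(np.mean(np.abs(f0_a[:n][both_voiced] - f0_b[:n][both_voiced])))

def flatness_deviation(y_a, y_b):
    fl_a = librosa.feature.spectral_flatness(y=y_a).mean()
    fl_b = librosa.feature.spectral_flatness(y=y_b).mean()
    return float(abs(fl_a - fl_b))
```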
Content exchange and collaboration serve as catalysts for repository creation that supports creative industries and fuels model development in machine learning and AI. Despite numerous repositories, challenges persist in discoverability, rights preservation, and efficient reuse of audiovisual assets. To address these issues, the SCENE (Searchable multi-dimensional Data Lakes supporting Cognitive Film Production & Distribution for the Promotion of the European Cultural Heritage) project introduces an automated audio quality assessment toolkit integrated within its Media Assets Management (MAM) platform. This toolkit comprises a suite of advanced metrics, such as artifact detection, bandwidth estimation, compression history analysis, noise profiling, speech intelligibility, environmental sound recognition, and reverberation characterization. The metrics are extracted using dedicated Flask-based web services that interface with a data lake architecture. By streamlining the inspection of large-scale audio repositories, the proposed solution benefits both high-end film productions and smaller-scale collaborations. The pilot phase of the toolkit will involve professional filmmakers who will provide feedback to refine post-production workflows. This paper presents the motivation, design, and implementation details of the toolkit, highlighting its potential to assess content quality management and contribute to more efficient content exchange in the creative industries.
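To illustrate the Flask-based service pattern described above, the sketch below exposes a single hypothetical metric endpoint. The route name, payload format, and the crude effective-bandwidth estimate are assumptions; the actual SCENE toolkit services and their APIs are not reproduced here.

```python
# Minimal sketch of a Flask-based audio metric web service. The route name, upload
# field, and the single "effective bandwidth" metric are hypothetical placeholders,
# not the SCENE toolkit's real endpoints.
import io
import numpy as np
import soundfile as sf
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/analyze/bandwidth", methods=["POST"])
def analyze_bandwidth():
    # Expect an audio file uploaded under the form field "audio".
    data, sr = sf.read(io.BytesIO(request.files["audio"].read()))
    if data.ndim > 1:
        data = data.mean(axis=1)                      # mixdown to mono
    spectrum = np.abs(np.fft.rfft(data))
    freqs = np.fft.rfftfreq(len(data), d=1.0 / sr)
    cumulative = np.cumsum(spectrum ** 2)
    # Crude estimate: frequency below which 99% of the spectral energy lies.
    f_99 = float(freqs[np.searchsorted(cumulative, 0.99 * cumulative[-1])])
    return jsonify({"sample_rate": int(sr), "effective_bandwidth_hz": f_99})

if __name__ == "__main__":
    app.run(port=5000)
```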
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...
When navigating the environment, we primarily rely on sight. However, in its absence, individuals must develop precise spatial awareness using other senses. A blind person can recognize their immediate surroundings through touch, but assessing larger spaces requires auditory perception. This project presents a method for auditory training in children with visual disabilities through structured audio plays designed to teach spatial pronouns and enhance spatial orientation via auditory stimuli. The format and structure of these audio plays allow for both guided learning with a mentor and independent exploration. Binaural recordings serve as the core component of the training exercises. The developed audio plays and their analyses are available on the YouTube platform in the form of videos and interactive exercises. The next step of this project involves developing an application that enables students to create individual accounts and track their progress. Responses collected during exercises will help assess the impact of the audio plays on students, facilitating improvements and modifications to the training materials. Additionally, linking vision-related questions with responses to auditory exercises will, over time, provide insights into the correlation between these senses. The application can serve multiple purposes: collecting research data, offering spatial recognition and auditory perception training, and creating a comprehensive, structured environment for auditory skill development.
This paper investigates the innovative synthesis of procedurally generated visual and auditory content through the use of Artificial Intelligence (AI) tools, specifically focusing on Generative Pre-Trained Transformer (GPT) networks. This research explores the process of procedurally generating audiovisual representations of semantic context by generating images, artificially providing motion, and generating corresponding multilayered sound. The process enables the generation of stop-motion audiovisual representations of concepts. This approach not only highlights the capacity for Generative AI to produce cohesive and semantically rich audiovisual media but also delves into the interconnections between visual art, music, sonification, and computational creativity. By examining the synergy between generated imagery and corresponding soundscapes, this research paper aims to uncover new insights into the aesthetic and technical implications of the use of AI in art. This research embodies a direct application of AI technology across multiple disciplines, creating intermodal media. Research findings propose a novel framework for understanding and advancing the use of AI in creative processes, suggesting potential pathways for future interdisciplinary research and artistic expression. Through this work, this study contributes to the broader discourse on the role of AI in enhancing creative practices, offering perspectives on how various modes of semantic representation can be interleaved using state-of-the-art technology.
We present G.A.D.A. (Guitar Audio Dataset for AI), a novel open-source dataset designed for advancing research in guitar audio analysis, signal processing, and machine learning applications. This comprehensive corpus comprises recordings from three main guitar categories: electric, acoustic, and bass guitars, featuring multiple instruments within each category to ensure dataset diversity and robustness.
The recording methodology employs two distinct approaches based on instrument type. Electric and bass guitars were recorded using direct recording techniques via DI boxes, providing clean, unprocessed signals ideal for further digital processing and manipulation. For acoustic guitars, where direct recording was not feasible, we utilized multiple microphone configurations at various positions to capture the complete acoustic properties of the instruments. Both recording approaches prioritize signal quality while maintaining maximum flexibility for subsequent processing and analysis.
The dataset includes standardized recordings of major and minor chords played in multiple positions and voicings across all instruments. Each recording is accompanied by detailed metadata, including instrument specifications, recording equipment details, microphone configurations (for acoustic guitars), and chord information. The clean signals from electric instruments enable various post-processing applications, including virtual amplifier modeling, effects processing, impulse response convolution, and room acoustics simulation.
To evaluate G.A.D.A.'s effectiveness in machine learning applications, we propose a comprehensive testing framework using established algorithms including k-Nearest Neighbors, Support Vector Machines, Convolutional Neural Networks, and Feed-Forward Neural Networks. These experiments will focus on instrument classification tasks using both traditional audio features and deep learning approaches.
G.A.D.A. will be freely available for academic and research purposes, complete with documentation, preprocessing scripts, example code, and usage guidelines. This resource aims to facilitate research in musical instrument classification, audio signal processing, deep learning applications in music technology, computer-aided music education, and automated music transcription systems.
The combination of standardized recording methodologies, comprehensive metadata, and the inclusion of both direct-recorded and multi-microphone captured audio makes G.A.D.A. a valuable resource for comparative studies and reproducible research in music information retrieval and audio processing.
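As an illustration of the baseline experiments proposed for G.A.D.A., the sketch below trains k-NN and SVM classifiers on MFCC summary features for instrument classification. The feature set, train/test split, and hyperparameters are assumptions for demonstration, not the dataset's official evaluation protocol.

```python
# Illustrative baseline instrument-classification experiment: MFCC summary features
# with k-NN and SVM classifiers. Feature choices, split, and hyperparameters are
# assumptions for the sketch, not G.A.D.A.'s prescribed protocol.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def mfcc_summary(path, sr=22050, n_mfcc=20):
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])   # 2 * n_mfcc features

def run_baselines(paths, labels):
    """paths: audio file paths; labels: e.g. 'electric', 'acoustic', 'bass'."""
    X = np.stack([mfcc_summary(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, stratify=labels, random_state=0)
    for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)),
                      ("SVM", SVC(kernel="rbf", C=10.0))]:
        clf.fit(X_tr, y_tr)
        print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```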
The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions for binaural audio dataset generation for testing and validation. Binamix is an open-source Python library designed to facilitate programmatic binaural mixing using the extensive SADIE II Database, which provides HRIR and BRIR data for 20 subjects. The Binamix library provides a flexible and repeatable framework for creating large-scale spatial audio datasets, making it an invaluable resource for codec evaluation, audio quality metric development, and machine learning model training. A range of pre-built example scripts, utility functions, and visualization plots further streamline the process of custom pipeline creation. This paper presents an overview of the library's capabilities, including binaural rendering, impulse response interpolation, and multi-track mixing for various speaker layouts. The tools utilize a modified Delaunay triangulation technique to achieve accurate HRIR/BRIR interpolation where desired angles are not present in the data. By supporting a wide range of parameters such as azimuth, elevation, subject IRs, speaker layouts, mixing controls, and more, the library enables researchers to create large binaural datasets for any downstream purpose. Binamix empowers researchers and developers to advance spatial audio applications with reproducible methodologies by offering an open-source solution for binaural rendering and dataset generation.
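The interpolation step mentioned above can be illustrated with a generic Delaunay/barycentric HRIR blend. The sketch below is not the Binamix API; its function names are hypothetical, and a production implementation would also need azimuth wrap-around handling and HRIR time alignment.

```python
# Minimal sketch of HRIR interpolation via Delaunay triangulation over (azimuth,
# elevation) with barycentric blending of the three nearest measured responses.
# This is NOT the Binamix API; names and behaviour here are hypothetical.
import numpy as np
from scipy.spatial import Delaunay

def build_interpolator(directions_deg, hrirs):
    """directions_deg: (N, 2) [azimuth, elevation]; hrirs: (N, taps) for one ear."""
    tri = Delaunay(directions_deg)

    def interpolate(azimuth, elevation):
        point = np.array([azimuth, elevation], dtype=float)
        simplex = tri.find_simplex(point)
        if simplex < 0:
            raise ValueError("query direction outside the triangulated grid")
        vertices = tri.simplices[simplex]
        T = tri.transform[simplex]
        bary = T[:2].dot(point - T[2])
        weights = np.append(bary, 1.0 - bary.sum())     # barycentric weights, sum to 1
        return np.tensordot(weights, hrirs[vertices], axes=1)

    return interpolate
```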
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such... Read More →
In this work, we introduce a Neural 3D Audio Renderer (N3DAR) - a conceptual solution for creating acoustic digital twins of arbitrary spaces. We propose a workflow that consists of several stages including: 1. Simulation of high-fidelity Spatial Room Impulse Responses (SRIR) based on the 3D model of a digitalized space, 2. Building an ML-based model of this space for interpolation and reconstruction of SRIRs, 3. Development of a real-time 3D audio renderer that allows the deployment of the digital twin of a space with accurate spatial audio effects consistent with the actual acoustic properties of this space. The first stage consists of preparation of the 3D model and running the SRIR simulations using a state-of-the-art wave-based method for arbitrary pairs of source-receiver positions. This stage provides the set of learning data used in the second stage - training the SRIR reconstruction model. The training stage aims to learn the model of the acoustic properties of the digitalized space using the Acoustic Volume Rendering (AVR) approach. The last stage is the construction of a plugin with a dedicated 3D audio renderer, where rendering comprises reconstruction of the early part of the SRIR, estimation of the reverberant part, and HOA-based binauralization. N3DAR allows the building of tailored audio rendering plugins that can be deployed along with visual 3D models of digitalized spaces, where users can freely navigate through the space with 6 degrees of freedom and experience high-fidelity binaural playback in real time. We provide a detailed description of the challenges and considerations for each of the stages. We also conduct an extensive evaluation of the audio rendering capabilities with both objective metrics and subjective methods, using a dedicated evaluation platform.
This paper presents an objective method for estimating the performance of 3D microphone arrays, which is also applicable to 2D arrays. The method incorporates the physical characteristics and relative positions of the microphones, merging these elements through a weighted summation to derive the arrays' directional patterns. These patterns are represented as a "Modified Steering Vector." Additionally, leveraging the spatial properties of spherical harmonics, we transform the array's directional pattern into the spherical harmonic domain. This transformation enables a quantitative analysis of the physical properties of each component, providing a comprehensive understanding of the array's performance. Overall, the proposed method offers a deeply insightful and versatile framework for evaluating the performance of both 2D and 3D microphone arrays by fully exploiting their inherent physical characteristics.
The reconstruction of sound fields is a critical component in a range of applications, including spatial audio for augmented, virtual, and mixed reality (AR/VR/XR) environments, as well as for optimizing acoustics in physical spaces. Traditional approaches to sound field reconstruction predominantly rely on interpolation techniques, which estimate sound fields based on a limited number of spatial and temporal measurements. However, these methods often struggle with issues of accuracy and realism, particularly in complex and dynamic environments. Recent advancements in deep learning have provided promising alternatives, particularly with the introduction of Physics-Informed Neural Networks (PINNs), which integrate physical laws directly into the model training process. This study aims to explore the application of PINNs for sound field reconstruction, focusing on the challenge of predicting acoustic fields in unmeasured areas. The experimental setup involved the collection of impulse response data from the Promenadikeskus concert hall in Pori, Finland, using various source and receiver positions. The PINN framework is then utilized to simulate the hall’s acoustic behavior, with parameters incorporated to model sound propagation across different frequencies and source-receiver configurations. Despite challenges arising from computational load, pre-processing strategies were implemented to optimize the model's efficiency. The results demonstrate that PINNs can accurately reconstruct sound fields in complex acoustic environments, offering significant potential for real-time sound field control and immersive audio applications.
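Conceptually, the physics-informed part of such a model penalizes violations of the acoustic wave equation at sampled space-time points alongside the data-fitting loss. The PyTorch sketch below illustrates that residual term only; the network size, sampling strategy, and loss weighting are assumptions rather than the configuration used for the Promenadikeskus hall.

```python
# Sketch of the physics-informed ingredient of a PINN for sound fields: a network
# p(x, y, z, t) is penalised for violating the lossless acoustic wave equation at
# collocation points, in addition to fitting measured impulse-response data.
# Architecture and weighting are illustrative assumptions.
import torch

C = 343.0  # speed of sound, m/s

net = torch.nn.Sequential(
    torch.nn.Linear(4, 128), torch.nn.Tanh(),
    torch.nn.Linear(128, 128), torch.nn.Tanh(),
    torch.nn.Linear(128, 1),
)

def wave_equation_residual(xyzt):
    """xyzt: (N, 4) collocation points with requires_grad=True."""
    p = net(xyzt)
    grads = torch.autograd.grad(p, xyzt, torch.ones_like(p), create_graph=True)[0]
    second = []
    for i in range(4):   # second derivatives w.r.t. x, y, z, t
        gi = torch.autograd.grad(grads[:, i], xyzt, torch.ones_like(grads[:, i]),
                                 create_graph=True)[0][:, i]
        second.append(gi)
    laplacian = second[0] + second[1] + second[2]
    p_tt = second[3]
    return p_tt - C ** 2 * laplacian     # ~0 wherever the physics holds

# Total loss (sketch): data misfit at measured points + weighted physics residual:
# loss = mse(net(meas_xyzt), meas_pressure) + lam * wave_equation_residual(coll).pow(2).mean()
```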
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...
3D recordings are an attractive solution for achieving an immersive effect. Dolby Atmos has recently become an increasingly popular format for distributing three-dimensional music recordings, although stereophony currently remains the main format for producing music recordings.
How can traditional microphone techniques be optimally extended when recording classical music to obtain both stereo recordings and three-dimensional formats (e.g. Dolby Atmos) in the post-production process? The author tries to answer this question using the example of a recording of Dietrich Buxtehude's work "Membra Jesu Nostri", BuxWV 75. The cycle of seven cantatas composed in 1680 is one of the most important and most popular compositions of the early Baroque era. The first Polish recording was made by Arte Dei Suonatori conducted by Bartłomiej Stankowiak, with soloists and choral parts performed by the choir Cantus Humanus.
The author will present his concept of a microphone set for 3D recordings. In addition to the detailed microphone setup, he will cover the post-production method, combining the stereo mix with a Dolby Atmos mix in a 7.2.4 speaker configuration. A workflow will be proposed to facilitate switching between the different formats.
This paper investigates the subjective evaluation of two prominent three-dimensional spatialization techniques—Vector Base Amplitude Panning (VBAP) and High-Order Ambisonics (HOA)—using IRCAM’s Spat in an immersive concert setting. The listening test was conducted in the New Hall at the Royal Danish Academy of Music, which features a 44-speaker immersive audio system. The musical stimuli included electronic compositions and modern orchestral recordings, providing a diverse range of temporal and spectral content. The participants comprised experienced Tonmeisters and non-experienced musicians, who were seated in off-center positions to simulate real-world audience conditions. This study provides an ecologically valid subjective evaluation methodology. The results indicated that VBAP excelled in spatial clarity and sound quality, while HOA demonstrated superior envelopment. The perceptual differences between the two techniques were relatively minor, influenced by room acoustics and suboptimal listening positions. Furthermore, music genre had no significant impact on the evaluation outcomes. The study highlights VBAP’s strength in precise localization and HOA's capability for creating immersive soundscapes, aiming to bridge the gap between ideal and real-world applications in immersive sound reproduction and perception. The findings suggest the need to balance trade-offs when selecting spatialization techniques for specific purposes, venues, and audience positions. Future research will focus on evaluating a wider range of spatialization methods in concert environments and optimizing them to improve the auditory experience for distributed audiences.
Head of Tonmeister Programme, Det Kgl Danske Musikkonservatorium
As a Grammy-nominated producer, engineer and pianist, Jesper has recorded around 100 CDs and produced music for radio, TV, theatre, installations and performance. Jesper has also worked as a sound engineer/producer at the Danish Broadcasting Corporation. A recent album production is...
If loudspeaker measurements are carried out elevated over a flat, very reflective surface with no nearby obstacles, the recovered impulse response will contain the direct response and one clean delayed reflection. Many loudspeakers are omnidirectional at low frequencies, having a clear acoustic centre, and this reflection will have a low-frequency behaviour that is essentially the same as its direct response, except the amplitude will be down by a 1/r factor. We derive a simple algorithm that iteratively allows this reflection to be cancelled, so that the response of the loudspeaker will be valid to lower frequencies than before, complementing the usual high-frequency response obtained from simple time-truncation of the impulse response. The method is explained, discussed, and illustrated with a two-way system measured over a flat, sealed driveway surface.
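A minimal sketch of the cancellation idea, assuming the measured response is the direct response plus one delayed copy attenuated by the path-length ratio, is shown below. The published algorithm may differ in detail; the constant attenuation factor reflects the low-frequency approximation noted above.

```python
# Sketch of iterative reflection cancellation. Model: h_meas[n] = h[n] + a * h[n - d],
# where a is the direct/reflected path-length ratio (1/r attenuation) and d the extra
# delay in samples. The direct response h is recovered by repeatedly subtracting a
# delayed, scaled copy of the current estimate (a fixed-point iteration, |a| < 1).
import numpy as np

def cancel_floor_reflection(h_meas, delay_samples, attenuation, n_iter=20):
    h_est = np.asarray(h_meas, dtype=float).copy()
    for _ in range(n_iter):
        shifted = np.zeros_like(h_est)
        shifted[delay_samples:] = h_est[:-delay_samples]
        h_est = h_meas - attenuation * shifted      # h = h_meas - a * h[n - d]
    return h_est

# Example geometry (hypothetical): direct path 2.0 m, reflected path 3.0 m gives
# attenuation ~ 2.0 / 3.0 and delay = round((3.0 - 2.0) / 343.0 * fs) samples.
```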
The study of electroacoustic parameters in relation to loudspeaker temperature has predominantly focused on large-signal conditions (i.e., high-power audio signals), with limited attention to their behavior under small-signal conditions at equivalent thermal states. This research addresses this gap by investigating the influence of voice-coil temperature on electroacoustic parameters during small-signal operation. The frequency response of the electrical input impedance and the radiated acoustic pressure were measured across different voice-coil temperatures. The results revealed temperature-dependent shifts across all parameters, including the natural frequency in free air (fₛ), mechanical quality factor (Qₘₛ), electrical resistance (Rₑ), electrical inductance (Lₑ), and equivalent compliance volume (Vₐₛ), among others. Specifically, Rₑ and Lₑ increased linearly with temperature, while fₛ decreased and Vₐₛ increased following power-law functions. These changes suggest that thermal effects influence both electrical and mechanical subsystems, potentially amplified by the viscoelastic “creep” effect inherent to loudspeaker suspensions. Finally, simulations of sealed and bandpass enclosures demonstrated noticeable shifts in acoustic performance under thermal variations, emphasizing the importance of considering temperature effects in enclosure design.
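The reported trends can be summarized by simple fits: a linear model for Rₑ versus voice-coil temperature and power-law models for fₛ and Vₐₛ. The sketch below shows such fits with scipy; the numeric arrays are placeholders for illustration, not the measured data.

```python
# Illustration of the reported parameter trends: linear fit of voice-coil resistance Re
# versus temperature and a power-law fit of the free-air resonance fs versus temperature.
# The arrays below are placeholders, not the paper's measurements.
import numpy as np
from scipy.optimize import curve_fit

def linear(t, a, b):
    return a * t + b

def power_law(t, k, alpha):
    return k * np.power(t, alpha)

temps_c = np.array([25.0, 40.0, 60.0, 80.0, 100.0])     # placeholder temperatures
re_ohm = np.array([6.4, 6.8, 7.3, 7.8, 8.3])            # placeholder Re readings
fs_hz = np.array([52.0, 50.5, 48.8, 47.4, 46.2])        # placeholder fs readings

(re_slope, re_offset), _ = curve_fit(linear, temps_c, re_ohm)
(fs_k, fs_alpha), _ = curve_fit(power_law, temps_c, fs_hz, p0=(60.0, -0.05))

print(f"Re(T) ~ {re_slope:.4f}*T + {re_offset:.2f} ohm")
print(f"fs(T) ~ {fs_k:.1f}*T^{fs_alpha:.3f} Hz")
```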
Finite Element Method (FEM) simulations are vital in the design of loudspeakers, offering a more efficient alternative to traditional trial-and-error approaches. Precise material characterization, however, is essential in ensuring that theoretical models align closely with measurements. Variations in material properties, particularly those of a loudspeaker’s membrane, can significantly influence loudspeaker performance. This work aims to establish a methodology for evaluating the variability of loudspeaker membrane materials, specifically cones and surrounds, to better understand each material's repeatability among samples and, overall, to improve the precision and reliability of loudspeaker simulations.
The study first conducts an in-depth analysis of membrane materials, focusing on their Young’s modulus and density, by utilizing both empirical and simulated data. Subsequently, complete loudspeakers were built and investigated using the membranes studied. A FEM simulation framework is presented, and observations are made regarding discrepancies between measured and simulated loudspeaker responses at specific frequencies and their relation to material modeling.
The results demonstrated significant alignment between simulations and real-life performances, showing interesting insights into the impact of small changes in material properties on the acoustic response of a loudspeaker. One significant finding was the frequency dependence of the Young’s modulus of fiberglass used for a cone. Further validation can be achieved by expanding the dataset of the materials measured, exploring more materials, and under varying conditions such as temperature and humidity. Such insights enable more accurate modeling of loudspeakers and lay the groundwork for exploring novel materials with enhanced acoustic properties, guiding the development of high-performance loudspeakers.
Chiara joined Faital S.p.A. in 2018, working as a FEM analyst in the R&D Department. Her research activities are focused on thermal phenomena associated with loudspeaker functioning, and the mechanical behavior of the speaker moving parts. To this goal, she uses FEM and lumped parameter...
This paper introduces a new algorithm for multiposition mixed-phase equalization of slot-loaded loudspeaker responses obtained in the horizontal and vertical plane, using finite impulse response (FIR) filters. The algorithm selects a "prototype response" that yields a filter that best optimizes a time-domain-based objective metric for equalization for a given direction. The objective metric includes a weighted linear combination of pre-ring energy, early and late reflection energy, and decay rate (characterizing impulse response shortening) during filter synthesis. The results show that the presented mixed-phase multiposition filtering algorithm performs good equalization along all horizontal directions and for most positions in the vertical direction. Beyond the multiposition filtering capabilities, the algorithm and the metric are suitable for designing mixed-phase filters with low delays, an essential constraint for real-time processing.
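A rough sketch of a time-domain metric of this kind is given below: a weighted combination of pre-ring energy before the main peak, early and late energy after it, and an estimated decay rate of the equalized response. The window lengths and weights are illustrative assumptions, not the values used by the authors.

```python
# Illustrative time-domain objective metric for an equalized impulse response:
# weighted combination of pre-ring energy, early and late energy after the main peak,
# and an estimated decay rate (from the backward-integrated energy after the peak).
# Window lengths and weights are assumptions, not the paper's values.
import numpy as np

def equalization_metric(h_eq, fs, w=(1.0, 0.5, 0.25, 0.1)):
    h_eq = np.asarray(h_eq, dtype=float)
    peak = int(np.argmax(np.abs(h_eq)))
    early_end = peak + int(0.005 * fs)                 # assumed 5 ms "early" window
    pre_ring = np.sum(h_eq[:peak] ** 2)
    early = np.sum(h_eq[peak + 1:early_end] ** 2)
    late = np.sum(h_eq[early_end:] ** 2)
    # Decay rate: slope (dB/s) of the backward-integrated energy after the peak.
    tail = h_eq[peak:]
    edc = 10.0 * np.log10(np.cumsum(tail[::-1] ** 2)[::-1] / np.sum(tail ** 2) + 1e-12)
    t = np.arange(len(edc)) / fs
    decay_rate = -np.polyfit(t, edc, 1)[0]             # larger = faster (shorter) decay
    w_pre, w_early, w_late, w_decay = w
    # Lower is better: penalise pre-ring and reflection energy, reward fast decay.
    return w_pre * pre_ring + w_early * early + w_late * late - w_decay * decay_rate
```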
Measuring a loudspeaker's ability to respond to an instantaneous pulse of energy will reveal distortion at its output. Factors such as speaker geometry, material properties, equipment error, and the conditions of the environment create artifacts within the captured data. This paper explores the extraction of time-domain features from these responses and the training of a predictive model to enable classification and rapid quality assurance.