“Spatial Audio - Practical Master Guide” is a free online course on spatial audio content creation. The target group is people who have basic knowledge of audio production but are not necessarily dedicated experts in the underlying technologies and aesthetics. “Spatial Audio - Practical Master Guide” will be released on the Acoucou platform chapter by chapter throughout Spring 2025. Some course content is already available as a preview.
The course comprises a variety of audio examples and interactive content that allow learners to develop their skills in a playful manner. It covers the entire spectrum, from psychoacoustics through the underlying technologies to delivery formats. The course’s highlights are the 14 case studies and step-by-step guides that provide behind-the-scenes information. Many of the course components are self-sufficient, so they can be used in isolation or integrated into other educational contexts.
The workshop on “Spatial Audio - Practical Master Guide” will provide an overview of the course contents, and we will explain the educational concepts on which the course is based. We will demonstrate the look and feel of the course on the Acoucou platform by presenting a set of representative examples from the courseware and give the audience the opportunity to experience it themselves. The workshop will wrap up with a discussion of the contexts in which the course contents may be useful beyond self-study.
Course contents:
Chapter 1: Overview (introduction, history of spatial audio, evolution of aesthetics in spatial audio)
Chapter 2: Psychoacoustics (spatial hearing, perception of reverberation)
Chapter 3: Reproduction (loudspeaker arrays, headphones)
Chapter 4: Capture (microphone arrays)
Chapter 5: Ambisonics (capture, reproduction, editing of ambisonic content)
Chapter 6: Storing spatial audio content
Chapter 7: Delivery formats
Case studies: Dolby Atmos truck streaming, fulldome, icosahedral loudspeaker, spatial audio sound installation, spatial audio at Friedrichstadt Palast, spatial audio in the health industry, live music performance with spatial audio, spatial audio in automotive
Step-by-step guides: setting up your spatial audio workstation, channel-based production for music, Dolby Atmos mix for cinema, Ambisonics sound production for 360° film, build your own ambisonic microphone array, interactive spatial audio
UWB as an RF protocol is heavily used by handset manufacturers for device-location applications. As a transport option, UWB offers tremendous possibilities for professional audio use cases that also require low latency for real-time requirements. These applications include digital wireless microphones and in-ear monitors (IEMs). When used for live performances, these UWB-enabled devices can deliver a total latency low enough to serve the path from microphone to front-of-house mixer and back to the performers' IEMs without a noticeable delay.
UWB is progressing as an audio standard within the AES, and its first iteration was aimed at live performance applications. Issues relating to body blocking at the frequencies used (6.5/8 GHz), as well as clocking challenges that could result in dropped packets, have been addressed to ensure a stable, reliable link. This workshop will outline how UWB is capable of delivering a low-latency link with up to 10 Mbit/s of data throughput for hi-res (24-bit/96 kHz) linear PCM audio.
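As a rough back-of-the-envelope check (not part of the workshop material, and ignoring packetization, error-correction, and control overhead), the payload rate of one uncompressed 24-bit/96 kHz LPCM channel can be worked out as follows:

```python
# Rough per-channel bit-rate check for uncompressed 24-bit / 96 kHz LPCM.
# Overheads for packetization, FEC, and control data are ignored here.
bit_depth = 24
sample_rate = 96_000
bits_per_second = bit_depth * sample_rate          # 2_304_000 bps per channel
print(f"{bits_per_second / 1e6:.3f} Mbit/s per mono channel")
# A ~10 Mbit/s audio payload therefore accommodates roughly four such channels.
```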
As UWB for audio progresses, high-end devices are being launched with support from several RF wireless vendors. This workshop will dive into the options open to device manufacturers who are considering UWB for their next-generation product roadmaps.
In auditory spatial perception, horizontal sound image localization and a sense of spaciousness are based on level and time differences between the left and right ears as cues, and the degree of correlation between the left and right signals is thought to contribute to the sense of horizontal spaciousness in particular [Hidaka1995, Zotter2013]. For the vertical image spread (VIS), spectral cues are necessary. The change in VIS due to the degree of correlation between the vertical and horizontal signals depends on the frequency response [Gribben2018]. This paper investigated, through two experiments, the influence of different correlation values between the top- and middle-layer loudspeaker signals of a 3D audio reproduction system on listening impressions. The results of experiments using pink noise with different correlation values for the top and middle layers show that the lower the vertical correlation values, the wider the listening range within which the impression does not change relative to the central listening position. The results of experiments using impulse responses obtained by setting up microphones in an actual concert hall revealed a tendency to perceive a sense of spaciousness at off-center listening positions when cardioid microphones spaced apart from the middle layer were used for the top layer. The polar pattern and height of the microphones may have resulted in lower correlation values in the vertical direction, thus widening the listening range of consistent spatial impression outside of the central listening position (i.e., the “sweet spot”).
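For context, the degree of correlation between two layer signals can be quantified with a normalized correlation coefficient; the sketch below is a simplified, zero-lag illustration rather than the exact measure used in the paper, and the signals are random stand-ins:

```python
import numpy as np

def interchannel_correlation(x, y):
    """Zero-lag normalized correlation coefficient between two loudspeaker
    signals: 1 = identical, 0 = fully decorrelated."""
    x = x - np.mean(x)
    y = y - np.mean(y)
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

rng = np.random.default_rng(0)
top = rng.standard_normal(48_000)                        # top-layer noise stand-in
middle = 0.5 * top + 0.5 * rng.standard_normal(48_000)   # partially correlated middle layer
print(interchannel_correlation(top, middle))
```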
Toru Kamekawa: After graduating from the Kyushu Institute of Design in 1983, he joined the Japan Broadcasting Corporation (NHK) as a sound engineer. During that period, he gained his experience as a recording engineer, mostly in surround sound programs for HDTV. In 2002, he joined...
Sound synthesis is a key part of modern music and audio production. Whether you are a producer, composer, or just curious about how electronic sounds are made, this workshop will break it down in a simple and practical way.
We will explore essential synthesis techniques like subtractive, additive, FM, wavetable, and granular synthesis. You will learn how different synthesis methods create and shape sound, and see them in action through live demonstrations using both hardware and virtual synthesizers, including emulators of legendary studio equipment.
This session is designed for everyone — whether you are a total beginner or an experienced audio professional looking for fresh ideas. You will leave with a solid understanding of synthesis fundamentals and the confidence to start creating your own unique sounds. Join us for an interactive, hands-on introduction to the world of sound synthesis!
This paper proposes a method for plane-wave field creation with spherical harmonics for a non-spherical array. In sound field control, there are physics-based acoustic models and psychoacoustic models. Some of the former allow freedom in the location of each loudspeaker, but the reproduced sound differs from the intended auditory impression because phantom sources are constructed. The latter are developed from the wave equation under strictly positioned circular or spherical array conditions, or with higher-order Ambisonics (HOA) based on spherical harmonics, which reproduces the field accurately only around a single point. We therefore require a method that physically creates actual waveforms while providing flexibility in the shape of the loudspeaker array. In this paper, we focus on the Lamé function, whose order changes the shape of the spatial figure, and propose formulating the distance between the center and each loudspeaker using this function in a polar expression. In simulation experiments, within the inscribed region, the proposed method creates the same plane-wave waveform as a spherical array when a high-order Lamé function, whose shape is close to rectangular, is used.
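For reference, the Lamé curve (superellipse) |x/a|^n + |y/b|^n = 1 admits a standard polar form for the distance from the center to the boundary at angle θ; the paper's exact formulation may differ, but this identity illustrates how the order n morphs the array shape:

```latex
% Polar-form radius of the Lame curve (superellipse) |x/a|^n + |y/b|^n = 1:
r(\theta) = \left( \left|\frac{\cos\theta}{a}\right|^{n}
          + \left|\frac{\sin\theta}{b}\right|^{n} \right)^{-1/n}
% n = 2 with a = b recovers a circle; as n grows, the shape approaches a rectangle.
```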
The evolution of musical instruments has been deeply influenced by advancements in audio equipment, allowing for the creation of musical instruments that bridge the gap between tradition and modern innovation. By examining historical progress, design principles, and modern innovations, this paper highlights the integration of modern technologies such as digital signal processing (DSP), artificial intelligence (AI), and advanced materials into musical instruments to enhance functionality, sound quality, and the experience of musicians at all levels.
Major areas of focus include the roles of electronic components such as pickups, sensors, and wireless interfaces in improving the functionality of modern musical instruments, as well as the impact of high-performance materials on durability and sustainability. Case studies of the digital piano and the talking drum provide practical insights into how these innovations are being implemented, and how the two instruments contrast. The paper further addresses challenges such as maintaining the cultural authenticity of traditional instruments while integrating modern technology, issues of latency, accessibility for diverse users globally, and sustainability concerns in manufacturing.
This paper presents a case study on the auralization of the lost wooden synagogue in Wołpa, digitally reconstructed using a Heritage Building Information Modelling (HBIM) framework for virtual reality (VR) presentation. The study explores how acoustic simulation can aid in the preservation of intangible heritage, focusing on the synagogue’s unique acoustics. Using historical documentation, the synagogue was reconstructed with accurate geometric and material properties, and its acoustics were analyzed through high-fidelity ray-tracing simulations. A key objective of this project is to recreate the Shema Israel ritual, incorporating a historical recording of the rabbi’s prayers. To enable interactive exploration, real-time auralization techniques were optimized to balance computational efficiency and perceptual authenticity, aiming to overcome the trade-offs between simplified VR audio models and physically accurate simulations. This research underscores the transformative potential of immersive technologies in reviving lost heritage, offering a scalable, multi-sensory approach to preserving sacred soundscapes and ritual experiences.
The article explores the innovative concept of interactive music, where both creators and listeners can actively shape the structure and sound of a musical piece in real-time. Traditionally, music is passively consumed, but interactivity introduces a new dimension, allowing for creative participation and raising questions about authorship and the listener's role. The project "Sound Permutation: A Real-Time Interactive Musical Experiment" aims to create a unique audio-visual experience by enabling listeners to choose performers for a chamber music piece in semi-real-time. Two well-known compositions, Edward Elgar's "Salut d’Amour" and Camille Saint-Saëns' "Le Cygne," were recorded by three cellists and three pianists in all possible combinations. This setup allows listeners to seamlessly switch between performers' parts, offering a novel musical experience that highlights the impact of individual musicians on the perception of the piece.
The project focuses on chamber music, particularly the piano-cello duet, and utilizes advanced recording technology to ensure high-quality audio and video. The interactive system, developed using JavaScript, allows for smooth video streaming and performer switching. The user interface is designed to be intuitive, featuring options for selecting performers and camera views. The system's optimization ensures minimal disruption during transitions, providing a cohesive musical experience. This project represents a significant step towards making interactive music more accessible, showcasing the potential of technology in shaping new forms of artistic engagement and participation.
In the field of digital audio signal processing (DSP) systems, the choice between standard and proprietary digital audio networks (DANs) can significantly impact both functionality and performance. This abstract aims to explore the benefits, tradeoffs, and economic implications of these two approaches, providing a comprehensive comparison to aid decision-making for audio professionals and system designers. The abstract emphasizes key benefits of A2B, AoIP, and older proprietary networks currently in use.
Conclusion
The choice between standard and proprietary digital audio networks in audio DSP systems involves a careful consideration of benefits, tradeoffs, and economic implications. Standards-based systems provide interoperability and cost-effectiveness, while proprietary solutions offer optimized performance and innovative features. Understanding these factors can guide audio professionals and system designers in making informed decisions that align with their specific needs and long-term goals.
Electrical and Mechanical Engineering bachelor's degree from Universidad Panamericana in Mexico City. Master of Science in Music Engineering from the University of Miami. EMBA from Boston University. Worked at Analog Devices developing DSP software and algorithms (SigmaStudio) for 17 years...
Thursday May 22, 2025 10:00am - 12:00pm CEST, Hall F, ATM Studio Warsaw, Poland
This paper presents an ongoing project that aims to document the urban soundscapes of the Polish city of Białystok. It describes the progress made so far, including the selection of sonic landmarks, the process of acquiring the audio recordings, and the design of the unique graphic user interface featuring original drawings. Furthermore, it elaborates on the ongoing efforts to extend the project beyond the scope of a typical urban soundscape repository. In the present phase of the project, in addition to monophonic recordings, audio excerpts are acquired in binaural and Ambisonic sound formats, providing listeners with an immersive experience. Moreover, state-of-the-art machine-learning algorithms are applied to analyze gathered audio recordings in terms of their content and spatial characteristics, ultimately providing prospective users of the sound map with some form of automatic audio tagging functionality.
This paper presents a recursive solution to the Broadband Acoustic Contrast Control with Pressure Matching (BACC-PM) algorithm, designed to optimize sound zone systems efficiently in the time domain. Traditional frequency-domain algorithms, while computationally less demanding, often result in non-causal filters with increased pre-ringing, making time-domain approaches preferable for certain applications. However, time-domain solutions typically suffer from high computational costs as a result of the inversion of large convolution matrices. To address these challenges, this study introduces a method based on gradient descent and conjugate gradient descent techniques. By exploiting recursive calculations, the proposed approach significantly reduces computational time compared to direct inversion. Theoretical foundations, simulation setups, and performance metrics are detailed, showcasing the efficiency of the algorithm in achieving high acoustic contrast and low reproduction errors with reduced computational effort. Simulations in a controlled environment demonstrate the advantages of the method.
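As general background (a minimal sketch, not the BACC-PM formulation itself), the conjugate gradient method solves the normal equations of a least-squares filter design iteratively, avoiding explicit inversion of a large convolution matrix; the matrix and target below are random stand-ins:

```python
import numpy as np

def conjugate_gradient(A, b, n_iter=100, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A without inverting it."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(n_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy example: least-squares filter design, min ||H w - d||^2,
# solved via the normal equations H^T H w = H^T d.
rng = np.random.default_rng(0)
H = rng.standard_normal((256, 64))   # stand-in for a convolution matrix
d = rng.standard_normal(256)         # stand-in for the target response
w = conjugate_gradient(H.T @ H, H.T @ d)
```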
Digital filters are often used to model or equalize acoustic or electroacoustic transfer functions. Applications include headphone, loudspeaker, and room equalization, or modeling the radiation of musical instruments for sound synthesis. As the final judge of quality is the human ear, filter design should take into account the quasi-logarithmic frequency resolution of the auditory system. This tutorial presents various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters, and discusses their differences and similarities. Examples will include loudspeaker and room equalization applications, and the equalization of a spherical loudspeaker array. The effect of quantization noise arising in real-world applications will also be considered.
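As background to the warped designs mentioned above (a textbook identity, not material taken from the tutorial), warped FIR and IIR filters are obtained by replacing each unit delay with a first-order allpass section, which bends the frequency axis toward a quasi-logarithmic resolution:

```latex
% Warping substitution: each unit delay z^{-1} is replaced by a first-order allpass
\tilde{z}^{-1} = D(z) = \frac{z^{-1} - \lambda}{1 - \lambda z^{-1}}, \qquad |\lambda| < 1
% At f_s = 44.1 kHz, a value of lambda around 0.75 approximates the Bark scale.
```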
Accurate and efficient simulation of room impulse responses is crucial for spatial audio applications. However, existing acoustic ray-tracing tools often operate as black boxes and only output impulse responses (IRs), providing limited access to intermediate data or spatial fidelity. To address these problems, this paper presents GSound-SIR, a novel Python-based toolkit for room acoustics simulation. The contributions of this paper are as follows. First, GSound-SIR provides direct access to up to millions of raw ray data points from simulations, enabling in-depth analysis of sound propagation paths that was not possible with previous solutions. Second, we introduce a tool to convert acoustic rays into high-order Ambisonic impulse responses, capturing spatial audio cues with greater fidelity than standard techniques. Third, to enhance efficiency, the toolkit implements an energy-based filtering algorithm and can export only the top-X or top-X-% rays. Fourth, we propose storing the simulation results in Parquet format, facilitating fast data I/O and seamless integration with data analysis workflows. Together, these features make GSound-SIR an advanced, efficient, and modern foundation for room acoustics research, providing researchers and developers with a powerful new tool for spatial audio exploration.
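To illustrate the idea of energy-based ray filtering and Parquet export, the sketch below uses an invented ray table; the actual GSound-SIR API, column names, and definition of "top-X-%" may differ:

```python
import numpy as np
import pandas as pd

# Hypothetical ray table; the real GSound-SIR schema may differ.
n_rays = 100_000
rng = np.random.default_rng(0)
rays = pd.DataFrame({
    "arrival_time_s": rng.random(n_rays) * 0.5,
    "azimuth_rad":    rng.uniform(-np.pi, np.pi, n_rays),
    "elevation_rad":  rng.uniform(-np.pi / 2, np.pi / 2, n_rays),
    "energy":         rng.exponential(1e-3, n_rays),
})

# Energy-based filtering: here, keep the rays carrying the top 1 % of total energy.
sorted_rays = rays.sort_values("energy", ascending=False)
cumulative = sorted_rays["energy"].cumsum() / sorted_rays["energy"].sum()
top_rays = sorted_rays[cumulative <= 0.01]

# Columnar storage for fast I/O (requires the pyarrow or fastparquet package).
top_rays.to_parquet("rays_top1pct.parquet", index=False)
```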
In today's era, 3D audio enables us to craft sounds much as composers have created sonic landscapes with orchestras for centuries. Thanks to advanced loudspeaker setups like 7.1.4 and 9.1.6, we achieve significantly higher spatial precision than conventional stereo. Sounds become sharper, more three-dimensional, and thus more plausible – like the transition from HD to 8K in the visual realm, yielding an image virtually indistinguishable from looking out of a window.
In the first part of his contribution, Lasse Nipkow introduces a specialized microphone technique that captures instruments in space as if the musicians were right in front of us. This forms the basis for capturing the unique timbres of the instruments while ensuring that the sounds remain as pure as possible for the mix.
In the second part of his contribution, Nipkow elucidates the parallels between classical orchestras and modern pop or singer-songwriter productions. He demonstrates how composers of yesteryear shaped their sounds for concert performances – much as we do in the studio today with double tracking. Using sound examples, he illustrates how sounds can establish an auditory connection between loudspeakers, thus creating a sound body distinct from individual instruments that stand out on their own.
Since 2010, Lasse Nipkow has been a renowned keynote speaker in the field of 3D audio music production. His expertise spans from seminars to conferences, both online and offline, and has gained significant popularity. As one of the leading experts in Europe, he provides comprehensive...
Thursday May 22, 2025 10:45am - 11:45am CEST, C4, ATM Studio Warsaw, Poland
This paper proposes a new algorithm for enhancing the spatial resolution of measured first-order Ambisonics room impulse responses (FOA RIRs). It applies a separation of the RIR into a salient stream (direct sound and reflections) and a diffuse stream to treat them differently: The salient stream is enhanced using the Ambisonic Spatial Decomposition Method (ASDM) with a single direction of arrival (DOA) per sample of the RIR, while the diffuse stream is enhanced by 4-directional (4D-)ASDM with 4 DOAs at the same time. Listening experiments comparing the new Salient/Diffuse S/D-ASDM to ASDM, 4D-ASDM, and the original FOA RIR reveal the best results for the new algorithm in both spatial clarity and absence of artifacts, especially for its variant, which keeps the DOA constant within each salient event block.
Everybody knows that music with electronic elements exists. Most of us are aware of the synthesis standing behind it. But the moment I start asking about what's under the hood, the majority of the audience starts to run for their lives. Which is rather sad for me, because learning synthesis could be among the greatest journeys you could take in your life. And I want to back those words up at my workshop.
Let's talk and see what exactly synthesis is, and what it is not. Let's talk about the building blocks of a basic subtractive setup. We will track all the knobs, buttons and sliders, down to every single cable under the front panel, simply to see which "valve" and "motor" is controlled by which knob. And how it sounds.
I also want to make you feel safe about modular setups, because when you understand the basic blocks - you understand the modular synthesis. Just like building from bricks!
Head-related transfer functions (HRTFs) are used in auditory applications for spatializing virtual sound sources. Listener-specific HRTFs, which aim at mimicking the filtering of the head, torso and pinnae of a specific listener, improve the perceived quality of virtual sound compared to using non-individualized HRTFs. However, using listener-specific HRTFs may not be accessible for everyone. Here, we propose as an alternative to take advantage of the adaptation abilities of human listeners to a new set of HRTFs. We claim that agreeing upon a single listener-independent set of HRTFs has beneficial effects for long-term adaptation compared to using several, potentially severely different HRTFs. Thus, the Non-individual Ear MOdel (NEMO) initiative is a first step towards a standardized listener-independent set of HRTFs to be used across applications as an alternative to individualization. A prototype, NEMObeta, is presented to explicitly encourage external feedback from the spatial audio community, and to agree on a complete list of requirements for the future HRTF selection.
PhD student in spatial audio, Acoustics Research Institute Vienna & Imperial College London
Katharina Pollack studied electrical engineering and audio engineering in Graz, both at the Technical University and the University of Music and Performing Arts in Graz, and is doing her PhD at the Acoustics Research Institute in Vienna in the field of spatial hearing. Her main research...
Multimodal research and applications are becoming more commonplace as Virtual Reality (VR) technology integrates different sensory feedback, enabling the recreation of real spaces in an audio-visual context. Within VR experiences, numerous applications rely on the user’s voice as a key element of interaction, including music performances and public speaking applications. Self-perception of our voice plays a crucial role in vocal production. When singing or speaking, our voice interacts with the acoustic properties of the environment, shaping the adjustment of vocal parameters in response to the perceived characteristics of the space.
This technical report presents a real-time auralization pipeline that leverages three-dimensional Spatial Impulse Responses (SIRs) for multimodal research applications in VR requiring first-person vocal interaction. It describes the impulse response creation and rendering workflow and the audio-visual integration, and addresses latency and computational considerations. The system enables users to explore acoustic spaces from various positions and orientations within a predefined area, supporting three and five Degrees of Freedom (3DoF and 5DoF) in audio-visual multimodal perception for both research and creative applications in VR.
The design of this pipeline arises from the limitations of existing audio tools and spatializers, particularly regarding signal latency, and the lack of SIRs captured from a first-person perspective and in multiple adjacent distributions to enable translational rendering. By addressing these gaps, the system enables real-time auralization of self-generated vocal feedback.
I'm interested in spatial audio, spatial music, and psychoacoustics. I'm the deputy director of the Music & Media Technologies M.Phil. programme in Trinity College Dublin, and a researcher with the ADAPT centre. At this convention I'm presenting a paper on an Ambisonic Decoder Test...
One day Chet Atkins was playing guitar when a woman approached him. She said, "That guitar sounds beautiful". Chet immediately quit playing. Staring her in the eyes he asked, "How does it sound now?" The quality of the sound in Chet’s case clearly rested with the player, not the instrument, and the quality of our product ultimately lies with us as engineers and producers, not with the gear we use. The dual significance of this question, “How does it sound now”, informs our discussion, since it addresses both the engineer as the driver and the changes we have seen and heard as our business and methodology have evolved through the decades. Let’s start by exploring the methodology employed by the most successful among us when confronted with new and evolving technology. How do we retain quality and continue to create a product that conforms to our own high standards? This may lead to other conversations about the musicians we work with, the consumers we serve, and the differences and similarities between their standards and our own. How high should your standards be? How should it sound now? How should it sound tomorrow?
Wireless audio, both mics and in-ear-monitors, has become essential in many live productions of music and theatre, but it is often fraught with uneasiness and uncertainty. The panel of presenters will draw on their varied experience and knowledge to show how practitioners can use best engineering practices to ensure reliability and performance of their wireless mic and in-ear-monitor systems.
I'm a fellow of the AES, an RF and electronics geek, and a live audio specialist, especially in both amateur and professional theater. My résumé includes Sennheiser, ARRL, and a 27-year-long tenure at QSC. Now I help live audio practitioners up their wireless mic and IEM game. I play...
Thursday May 22, 2025 11:45am - 12:45pm CEST, Hall F, ATM Studio Warsaw, Poland
Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm and the binaural filter design process, as well as the real-time implementation of the renderer in the form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of a spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications.
Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January...
Jan Skoglund leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such...
Thursday May 22, 2025 12:00pm - 12:20pm CEST, C2, ATM Studio Warsaw, Poland
This tutorial proposal presents a comprehensive exploration of spatial audio recording methodologies applied to the unique challenges of documenting Eastern Orthodox liturgical music in monumental acoustic environments. Centered on a recent project at the Church of the Assumption of the Blessed Virgin Mary and St. Joseph in Warsaw, Poland, the session dissects the technical and artistic decisions behind capturing the Męski Zespół Muzyki Cerkiewnej (Male Ensemble of Orthodox Music) “Katapetasma.” The repertoire—spanning 16th-century monodic irmologions, Baroque-era folk chant collections, and contemporary compositions—demanded innovative approaches to balance clarity, spatial immersion, and the venue’s 5-second reverberation time. Attendees will gain insight into hybrid microphone techniques tailored for immersive formats (Dolby Atmos, Ambisonics) and stereo reproduction. The discussion focuses on the strategic deployment of a Decca Tree core augmented by an AMBEO array, height channels, a Faulkner Pair for mid-depth detail, ambient side arrays, and spaced AB ambient pairs to capture the room’s decay. Particular emphasis is placed on reconciling close-miking strategies (essential for textual clarity in melismatic chants) with distant arrays that preserve the sacred space’s acoustic identity. The tutorial demonstrates how microphone placement—addressing both the choir’s position and the building’s 19th-century vaulted architecture—became critical in managing comb filtering and low-frequency buildup.
Practical workflow considerations include:
- Real-time monitoring of spatial imaging through multiple microphone and loudspeaker configurations
- Phase coherence management between spot microphones and room arrays
- Post-production techniques for maintaining vocal intimacy within vast reverberant fields
Case studies compare results from the Decca/AMBEO hybrid approach against traditional spaced omni configurations, highlighting tradeoffs between localization precision and spatial envelopment. The session also addresses the psychoacoustic challenges of recording small choral ensembles in reverberant spaces, where transient articulation must coexist with diffuse sustain.
This presentation focuses on side and rear channels in Dolby Atmos recordings. At present, there is no standardised placement for side or rear speakers. This can result in poor localisation in a major portion of the listening area. Sometimes side speakers are at 90° off the centre axis, sometimes up to 110° off axis. Similarly, rear speakers can be anywhere from 120° to 135° off axis; in cinemas they can be located directly behind the listener(s). However, an Atmos speaker bed assumes a fixed placement of these side and rear speakers, resulting in inconsistent imaging. Additionally, placing side and rear speakers further off-axis results in a larger gap between them and the front speakers.
These inconsistencies can be minimised by placing these objects at specific virtual locations, whilst avoiding the fixed speaker bed. This ensures a listening experience which better represents what the mix engineer intended. Additionally, reverb feeds can also be sent as objects, to create an illusion of further depth. Finally, these additional objects can be fine-tuned for binaural rendering by use of Near/Mid/Far controls.
Mr. Bowles will demonstrate these techniques in an immersive playback session.
David v.R Bowles formed Swineshead Productions, LLC as a classical recording production company in 1995. His recordings have been GRAMMY- and JUNO-nominated and critically acclaimed worldwide. His releases in 3D Dolby Atmos can be found on Avie, OutHere Music (Delos) and Navona labels. Mr...
Thursday May 22, 2025 2:45pm - 3:30pm CEST, C4, ATM Studio Warsaw, Poland
Microphones are the very first link in the recording chain, so it’s important to understand them in order to use them effectively. This presentation will explain the differences between different types of microphones; it will cover polar patterns and directivity, the proximity effect, relative recording distances, and a little about room acoustics. Many of these “golden nuggets” helped me greatly when I first understood them, and I hope they will help you too.
We will look at the different microphone types – dynamic moving-coil, ribbon and capacitor microphones, as well as boundary and line-array microphones. We will look at polar patterns and how they are derived. We will look at relative recording distances and a little about understanding room acoustics. All to help you to choose the best microphone for what you want to do and how best to use it.
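As general background (a standard textbook relation, not material specific to this presentation), first-order polar patterns can be expressed as a weighted sum of an omnidirectional (pressure) component and a figure-of-eight (pressure-gradient) component:

```latex
% First-order polar pattern as a mix of pressure and pressure-gradient responses:
s(\theta) = A + (1 - A)\cos\theta
% A = 1: omnidirectional,  A = 0.5: cardioid,  A = 0: figure-of-eight.
```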
John Willett is the Managing Director of Sound-Link ProAudio Ltd. who are the official UK distributors for Microtech Gefell microphones, ME-Geithain studio monitors, HUM Audio Devices ribbon microphones (as well as the LAAL – Look Ahead Analogue Limiter, the N-Trophy mixing console...
Thursday May 22, 2025 2:45pm - 3:45pm CEST, C3, ATM Studio Warsaw, Poland
Tutorial: Capturing Your Prosumers
This session breaks down how top brands like Samsung, Apple, and Slack engage professional and semi-professional buyers. Attendees will gain concrete strategies and psychological insights they can use to boost customer retention and revenue.
Format: 1-Hour Session
Key Takeaways:
- Understand the psychology behind purchasing decisions of prosumers, drawing on our access to insights from over 300 million global buyers
- Explore proven strategies to increase engagement and revenue
- Gain actionable frameworks for immediate implementation
A computational framework is proposed for analyzing the temporal evolution of perceptual attributes of sound stimuli. As a paradigm, the perceptual attribute of envelopment, which manifests differently across audio reproduction formats, is employed. For this, listeners' temporal ratings of envelopment for mono, stereo, and 5.0-channel surround music samples serve as the ground truth for establishing a computational model that can accurately trace temporal changes from such recordings. Combining established and heuristic methodologies, different features of the audio signals, named long-term (LT) features, were extracted at each segment where envelopment ratings were registered. A memory LT computational stage is proposed to account for the temporal variations of the features over the duration of the signal, based on the exponentially weighted moving average of the respective LT features. These are utilized in a gradient tree boosting machine learning algorithm, leading to a Dynamic Model that accurately predicts the listeners' temporal envelopment ratings. Without the proposed memory LT feature function, a Static Model is also derived, which is shown to have lower performance for predicting such temporal envelopment variations.
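To illustrate the exponentially weighted moving average step (a minimal sketch; the smoothing constant, feature count, and how the memory features are combined with the raw LT features are assumptions, not the paper's settings):

```python
import numpy as np

def ewma(features, alpha=0.2):
    """Exponentially weighted moving average over time (axis 0):
    m[t] = alpha * x[t] + (1 - alpha) * m[t-1]."""
    smoothed = np.empty_like(features, dtype=float)
    smoothed[0] = features[0]
    for t in range(1, len(features)):
        smoothed[t] = alpha * features[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

# Toy example: 100 time segments, 8 long-term (LT) features per segment.
lt_features = np.random.rand(100, 8)
memory_lt = ewma(lt_features)                        # "memory" LT features
model_input = np.hstack([lt_features, memory_lt])    # fed to a gradient-boosted regressor
```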
Department of Electrical and Computer Engineering, University of Patras
I am a graduate of the Electrical and Computer Engineering Department of the University of Patras. Since 2020, I have been a PhD candidate in the same department under the supervision of Professor John Mourjopoulos. My research interests include analysis and modeling of perceptual and affective...
John Mourjopoulos is Professor Emeritus at the Department of Electrical and Computer Engineering, University of Patras and a Fellow of the AES. As the head of the Audiogroup for nearly 30 years, he has authored and presented more than 200 journal and conference papers. His research...
Thursday May 22, 2025 3:00pm - 3:20pm CEST, C1, ATM Studio Warsaw, Poland
This study evaluates the effectiveness of artificial reverberation algorithms that are used to create simulated acoustic environments by comparing them to the acoustic response of the real spaces. A mixed-methods approach, integrating objective and subjective measures, was employed to assess both the accuracy and the perceptual quality of simulated acoustics. Real-world spaces, within a research project…, were selected for their varying sizes, functions, and acoustical properties. Objective acoustic measurements—the Room Impulse Response (RIR) and features extracted from it, i.e., Reverberation Time (RT60), Early Decay Time (EDT), Clarity indices (C50, C80), and Definition (D50)—were conducted to establish baseline profiles. Simulated environments were created to replicate real-world conditions, incorporating source-receiver configurations, room geometries, and/or material properties. Objective metrics were extracted from these simulations for comparison with real-world data. After applying the artificial reverberation algorithm, the same objective measurements were re-recorded to assess its impact. Subjective listening tests were also conducted, with a diverse panel of listeners rating the perceived clarity, intelligibility, comfort, and overall sound quality of both real and simulated spaces, using a double-blind procedure to mitigate bias. Statistical analyses, including paired t-tests and correlation analysis, were performed to assess the relationship between objective and subjective evaluations. This approach provides a comprehensive framework for evaluating the algorithm’s ability to enhance simulated acoustics and align them with real-world environments.
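For readers unfamiliar with the listed metrics, the sketch below (illustrative only, not the study's implementation) shows how C50 and an RT60 estimate can be extracted from a measured room impulse response; the synthetic RIR is a stand-in:

```python
import numpy as np

def clarity_c50(rir, fs):
    """Clarity index C50: ratio of early (first 50 ms) to late energy, in dB."""
    n50 = int(0.050 * fs)
    early = np.sum(rir[:n50] ** 2)
    late = np.sum(rir[n50:] ** 2)
    return 10 * np.log10(early / late)

def rt60_from_schroeder(rir, fs):
    """Estimate RT60 by fitting the -5 dB to -35 dB range of the Schroeder
    backward-integrated decay curve (T30 method) and extrapolating to -60 dB."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10 * np.log10(edc / edc.max())
    t = np.arange(len(rir)) / fs
    mask = (edc_db <= -5) & (edc_db >= -35)
    slope, intercept = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope

fs = 48_000
t = np.arange(int(0.8 * fs)) / fs
rir = np.random.standard_normal(len(t)) * np.exp(-t / 0.12)   # synthetic decaying RIR
print(clarity_c50(rir, fs), rt60_from_schroeder(rir, fs))
```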
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...
Automotive audio systems operate in highly reflective and acoustically challenging environments that differ significantly from optimized listening spaces such as concert halls or home theaters. The compact and enclosed nature of car cabins, combined with the presence of reflective surfaces—including the dashboard, windshield, and windows—creates strong early reflections that interfere with the direct sound from loudspeakers. These reflections result in coherent interference, comb filtering, and position-dependent variations in frequency response, leading to inconsistent tonal balance, reduced speech intelligibility, and compromised stereo imaging and spatial localization. Traditional approaches, such as equalization and time alignment, attempt to compensate for these acoustic artifacts but do not effectively address coherence issues arising from coherent early reflections. To mitigate these challenges, this study explores Dynamic Diffuse Signal Processing (DiSP) as an alternative solution for reducing early reflection coherence within automotive environments. DiSP is a convolution-based signal processing technique that, when implemented effectively, decorrelates coherent signals while leaving them perceptually identical. While this method has been successfully studied in sound reinforcement and multi-speaker environments, its application in automotive audio has not been extensively studied. This research investigates the effectiveness of DiSP by analyzing pre- and post-DiSP impulse responses and frequency response variations at multiple listening positions. We assess its effectiveness in mitigating phase interference and reducing comb filtering. Experimental results indicate that DiSP significantly improves the uniformity of sound distribution, reducing spectral deviations across seating positions and minimizing unwanted artifacts caused by early reflections. These findings suggest that DiSP can serve as a powerful tool for optimizing in-car audio reproduction, offering a scalable and computationally efficient approach to improving listener experience in modern automotive sound systems.
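To illustrate the general idea of convolution-based decorrelation (a generic sketch with an invented kernel design; actual DiSP kernels are engineered to remain perceptually transparent, which this toy example does not guarantee):

```python
import numpy as np
from scipy.signal import fftconvolve

def diffusing_kernel(length, fs, decay_s=0.005, seed=0):
    """Generic decorrelation kernel: a short, exponentially decaying noise burst
    normalized to unit energy. Convolving each channel with a distinct kernel
    reduces interchannel coherence."""
    rng = np.random.default_rng(seed)
    t = np.arange(length) / fs
    kernel = rng.standard_normal(length) * np.exp(-t / decay_s)
    return kernel / np.sqrt(np.sum(kernel ** 2))

fs = 48_000
x = np.random.standard_normal(fs)                              # stand-in loudspeaker feed
left = fftconvolve(x, diffusing_kernel(512, fs, seed=1))       # distinct kernel per channel
right = fftconvolve(x, diffusing_kernel(512, fs, seed=2))
```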
I am now working as an audio engineer with my research into 6 Degrees-of-Freedom (6DoF) audio for Virtual Reality (VR); this includes hybrid acoustic modelling methods for real-time calculation. I am currently looking at perceptual differences in different acoustic rendering methods...
The honeybee is an insect known to almost all human beings around the world. The sounds produced by bees are a ubiquitous staple of the soundscape of the countryside and forest meadows, bringing an air of natural beauty to the perceived environment. Honeybee-produced sounds are also an important part of apitherapeutic experiences, where close-quarters exposure to honeybees proves beneficial to the mental and physical well-being of humans. This research investigates the generation of synthetic honeybee buzzing sounds using Conditional Generative Adversarial Networks (cGANs). Trained on a comprehensive dataset of real recordings collected both inside and outside the beehive during a long-term audio monitoring session, the models produce diverse and realistic audio samples. Two architectures were developed: an unconditional GAN for generating long, high-fidelity audio, and a conditional GAN that incorporates time-of-day information to generate shorter samples reflecting diurnal honeybee activity patterns. The generated audio exhibits both spectral and temporal properties similar to real recordings, as confirmed by statistical analysis performed during the experiment. This research has implications for scientific research in honeybee colony health monitoring and apitherapy, as well as for artistic endeavours such as sound design and immersive soundscape creation. The trained generator model is publicly available on the project’s website.
Existing methods for moving sound source localization and tracking face significant challenges when dealing with an unknown number of sound sources, which substantially limits their practical applications. This paper proposes a moving sound source tracking method based on source signal envelopes that does not require prior knowledge of the number of sources. First, an encoder-decoder attractor (EDA) method is used to estimate the number of sources and obtain an attractor for each source, based on which the signal envelope of each source is estimated. This signal envelope is then used as a clue for tracking the target source. The proposed method has been validated through simulation experiments. Experimental results demonstrate that the proposed method can accurately estimate the number of sources and precisely track each source.
Traditional methods for inferring room geometry from sound signals are predominantly based on the Room Impulse Response (RIR) or prior knowledge of the sound source location, which significantly restricts their applicability. This paper presents a method for estimating room geometry based on the localization of the direct sound source and its early reflections from First-Order Ambisonics (FOA) signals, without prior knowledge of the environment. First, the method simultaneously estimates the Direction of Arrival (DOA) of the direct source and the detected first-order reflected sources. Then, a cross-attention-based network is proposed that implicitly extracts features related to the Time Difference of Arrival (TDOA) between the direct source and the first-order reflected sources in order to estimate their distances. Finally, the room geometry is inferred from the localization results of the direct and first-order reflected sources. The effectiveness of the proposed method was validated through simulation experiments. The experimental results demonstrate that the proposed method achieves accurate localization results and performs well in inferring room geometry.
In recent years, there has been an increasing interest in binaural technology due to its ability to create immersive spatial audio experiences, particularly in streaming services and virtual reality applications. While audio localization studies typically focus on individual sound sources, ensemble width (EW) is crucial for scene-based analysis, as wider ensembles enhance immersion. We define intended EW as the angular span between the outermost sound sources in an ensemble, controlled during binaural synthesis. This study presents a comparison between human perception of EW and its automatic estimation under simulated anechoic conditions. Fifty-nine participants, including untrained listeners and experts, took part in listening tests, assessing 20 binaural anechoic excerpts synthesized using 2 publicly available music recordings, 2 different HRTFs, and 5 distinct EWs (0° to 90°). The excerpts were played twice in random order via headphones through a web-based survey. Only a subset of ten listeners, of whom nine were experts, passed the post-screening tests, with a mean absolute error (MAE) of 74.62° (±38.12°), compared to an MAE of 5.92° (±0.14°) achieved by a pre-trained machine learning method using auditory modeling and gradient-boosted decision trees. This shows that while intended EW can be algorithmically extracted from synthesized recordings, it significantly differs from human perception. Participants reported insufficient externalization and front-back confusion (suggesting HRTF mismatch). The untrained listeners demonstrated response inconsistencies and a low degree of discriminability, which led to the rejection of most untrained listeners during post-screening. The findings may contribute to the development of perceptually aligned EW estimation models.
This research aims to provide a systematic approach to the analysis of geometrical and material characteristics of traditional frame drums using deep learning. A data-driven approach is used, integrating supervised and unsupervised feature extraction techniques to associate measurable audio features with perceptual attributes. The methodology involves training convolutional neural networks on Mel-scale spectrograms to estimate wood type (classification), diameter (regression), and depth (regression). A multi-labeled dataset containing recorded samples of frame drums of different specifications is used for model training and evaluation. Hierarchical classification is explored, incorporating playing techniques and environmental factors. Handcrafted features enhance interpretability, helping determine the impact of construction attributes on sound perception and ultimately aiding instrument design. Data augmentation techniques, including pitch alterations and additive noise, are introduced to improve the generalization of the approach and to expand the dataset.
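To make the multi-task setup concrete, here is a minimal sketch of a CNN over Mel spectrograms with one classification head (wood type) and one regression head (diameter and depth); the layer sizes, number of wood classes, and input shape are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FrameDrumNet(nn.Module):
    """Illustrative multi-task CNN over Mel spectrograms."""
    def __init__(self, n_wood_types=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.wood_head = nn.Linear(32, n_wood_types)   # classification: wood type
        self.size_head = nn.Linear(32, 2)              # regression: diameter, depth

    def forward(self, mel):                            # mel: (batch, 1, n_mels, n_frames)
        z = self.backbone(mel)
        return self.wood_head(z), self.size_head(z)

model = FrameDrumNet()
logits, sizes = model(torch.randn(4, 1, 128, 256))     # dummy batch of spectrograms
```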
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...
The field of audio production is always evolving. Now, with immersive audio formats becoming more and more prominent, we should take a closer look at the possibilities they open up from a technical but, most importantly, from an artistic and musical standpoint. In our workshop, "Unlocking New Dimensions: Producing Music in Immersive Audio," we demonstrate how immersive audio formats can bring an artist's vision to life and how the storytelling in the music benefits from them. In order to truly change the way people listen to music and provide an immersive experience, we must transform how we write and produce music, using immersive formats not just as a technical advancement but as a medium to create new art. In this session, we will explore the entire production process, from recording to the final mix and master, with a focus on how one can create a dynamic and engaging listening experience with immersive formats like Dolby Atmos. We believe that immersive audio is more than just a technical upgrade—it's a new creative canvas. Our goal is to show how, by fully leveraging a format like Dolby Atmos, artists and producers can create soundscapes that envelop the listener and add new dimensions to the storytelling of music.
Philosophy
Artists often feel disconnected from the immersive production process. They can rarely give input on how their music is mixed in this format, leading to results that may not fully align with their artistic vision. At High Tide, we prioritize artist involvement, ensuring they are an integral part of the process. We believe that their input is crucial for creating an immersive experience that truly represents their vision. We will share insights and examples from our collaborations with artists like Amistat, an acoustic folk duo, and Tinush, an electronic music producer known for his attention to detail. These case studies will illustrate how our method fosters creativity and produces superior immersive audio experiences.
New workflows need new tools
A significant pain point in current immersive productions is the tendency to use only a few stems, which often limits the immersive potential. This often happens because the process of exporting individual tracks and preparing a mixing session can be time-consuming and labor-intensive. We will address these challenges in our presentation. We have developed innovative scripts and workflows that streamline this process, allowing us to work with all available tracks without the typical hassle. This approach not only enhances the quality of the final mix but also retains the intricate details and nuances of the original recordings. Our workshop is designed to be interactive, with opportunities for attendees to ask questions throughout. We will provide real-world insights into our Pro Tools sessions, giving participants a detailed look at our Dolby Atmos mixing process. By walking through the entire workflow, from recording with Dolby Atmos in mind to the final mix, attendees will gain a comprehensive understanding of the steps involved and the benefits of this approach to create an engaging and immersive listening experience.
High-pass filters (HPF) in music production: do's and don'ts
This presentation aims to give a thorough insight into the use of high-pass filters in music production. Which type, slope, and frequency settings could be more desirable for a given source or application? Are the HPFs in microphones and preamps the same? Do they serve the same purpose? Is there any rule on when to use one, the other, or both? Furthermore, HPFs are also used extensively in the mixing and processing of audio signals. An HPF is commonly applied to the sidechain signal of dynamic processors (e.g., bus compressors) and, of course, in all multiband processing. What are the benefits of this practice? In live sound reinforcement, there are different approaches to the use of HPFs. Different genres call for different production techniques, and understanding the basics of this simple albeit important signal filtering process helps with its conscious implementation.
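As a small illustration of the slope and cutoff choices discussed above (the cutoff frequencies and orders below are illustrative values, not recommendations from the presentation):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def high_pass(signal, fs, cutoff_hz=80.0, order=2):
    """Butterworth high-pass filter; a 2nd-order filter gives a 12 dB/octave
    slope, and each additional order adds roughly 6 dB/octave."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfilt(sos, signal)

fs = 48_000
x = np.random.standard_normal(fs)                          # stand-in for a recorded track
cleaned = high_pass(x, fs, cutoff_hz=100.0, order=2)       # remove low-end rumble
sidechain = high_pass(x, fs, cutoff_hz=120.0, order=4)     # feed to a compressor's detector
```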
This paper discusses the process of generating natural language music descriptions, called captioning, using deep learning and large language models. A novel encoder architecture is trained to learn large-scale music representations and generate high-quality embeddings, which a pre-trained decoder then uses to generate captions. The captions used for training are from the state-of-the-art LP-MusicCaps dataset. A qualitative and subjective assessment of the quality of created captions is performed, showing the difference between various decoder models.
Embarking on my professional journey as a young DSP engineer at Fraunhofer IIS in Erlangen, Germany, in 1989, I quickly encountered a profound insight that would shape my entire career in audio: audio is not merely data like any other set of numbers; its significance lies in how it sounds to us as human listeners. The sonic quality of audio signals cannot be captured by simple metrics like ‘signal-to-noise ratio.’ Instead, the true goal of any skilled audio engineer should be to enhance quality in ways that are genuinely perceptible through listening, rather than relying solely on mathematical diagnostics.
This foundational concept has been a catalyst for innovation throughout my career, from pioneering popular perceptual audio codecs like MP3 and AAC to exploring audio for VR/AR and AI-driven audio coding.
Join me in this lecture as I share my personal 36-year research journey that led me to believe that in the world of media, it’s all about perception!