Sound fields in enclosures comprise directional and diffuse components. The directional components include the direct path from the source and the early specular reflections. The diffuse part starts with the first early reflection and builds up gradually over time. An ideal diffuse field is reached once mutually incoherent reflections arrive randomly from all directions. More specifically, a diffuse field is characterized by uniform energy density (i.e., independence from measurement position) and an isotropic distribution (i.e., random directions of incidence), which together result in zero net energy flow (i.e., the net time-averaged intensity is zero). In practice, however, real diffuse sound fields typically exhibit directional characteristics owing to room geometry and the non-uniform absorptive properties of room surfaces.
Several models and data-driven metrics based on the definition of a diffuse field have been proposed to assess diffuseness. A widely used metric is the _mixing time_, which marks the transition of the sound field from directional to diffuse and is known to depend, among other factors, on the room geometry.
The concept of mixing time is closely linked to the normalized echo density profile (NEDP), a measure first used to estimate the mixing time in real rooms (Abel and Huang, 2006) and later to assess the quality of artificial reverberators in terms of their capacity to produce a dense reverberant tail (De Sena et al., 2015). NEDP is calculated from room impulse responses (RIRs) measured with a pressure probe and evaluates, within a sliding window, how much the RIR deviates from a normal distribution. Kurtosis, another temporal statistical measure, has been used to the same end (Jeong, 2016). However, neither NEDP nor kurtosis provides insight into the directional attributes of diffuse fields. Both rely on statistical reasoning rather than on identifying individual reflections; by contrast, a further temporal approach uses matching pursuit to detect individual reflections (Defrance et al., 2009).
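To illustrate how an echo density profile can be computed, the following minimal Python sketch follows the formulation commonly attributed to Abel and Huang (2006): a Hann-weighted sliding window, a local standard deviation, and normalization by erfc(1/√2), the fraction of Gaussian samples lying more than one standard deviation from zero. The 20 ms window length, the zero-padding at the edges, and the hypothetical helper `mixing_time` with its threshold of 1 are assumptions chosen for illustration, not details taken from this work.

```python
import numpy as np
from math import erfc, sqrt


def echo_density_profile(h, fs, win_ms=20.0):
    """Normalized echo density profile of a single-channel RIR h.

    For each sample, a Hann-weighted window centred on it is used to estimate
    the local standard deviation; the weighted fraction of samples whose
    magnitude exceeds it is divided by erfc(1/sqrt(2)) ~= 0.3173, the fraction
    expected for Gaussian noise, so a Gaussian-like tail gives values near 1.
    The default 20 ms window is an assumed parameter.
    """
    h = np.asarray(h, dtype=float)
    n = int(round(win_ms * 1e-3 * fs)) | 1   # force an odd window length
    half = n // 2
    w = np.hanning(n)
    w /= w.sum()                             # weights sum to 1
    padded = np.pad(h, half)                 # zero-pad so every sample has a full window
    norm = erfc(1 / sqrt(2))
    eta = np.empty(len(h))
    for i in range(len(h)):
        seg = padded[i:i + n]
        sigma = np.sqrt(np.sum(w * seg**2))  # weighted local standard deviation
        eta[i] = np.sum(w * (np.abs(seg) > sigma)) / norm
    return eta


def mixing_time(eta, fs, threshold=1.0):
    """Hypothetical helper: time (s) at which eta first reaches `threshold`, or None."""
    idx = np.flatnonzero(eta >= threshold)
    return idx[0] / fs if idx.size else None


fs = 48_000
rng = np.random.default_rng(1)
eta_noise = echo_density_profile(rng.standard_normal(9600), fs)  # dense, Gaussian-like
sparse = np.zeros(4800)
sparse[::1000] = 1.0                                            # isolated "reflections"
eta_sparse = echo_density_profile(sparse, fs)
print(eta_noise[961:-961].mean(), eta_sparse.max(), mixing_time(eta_sparse, fs))
```

White noise, a crude stand-in for a fully mixed tail, yields a profile hovering around 1, whereas a train of isolated impulses stays near 0 and never reaches the threshold. Like any pressure-only statistic, the profile says nothing about the directions from which the reflections arrive.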
Another set of approaches focuses on the net-energy-flow property of the diffuse field, providing an energetic analysis framework in either the time domain (Del Galdo et al., 2012) or the time-frequency domain (Ahonen and Pulkki, 2009). These approaches rely on the time-averaged active intensity, measured either with intensity probes or with first- and higher-order Ambisonics microphones; in the latter case, diffuseness is computed from the pseudo-intensity (Götz et al., 2015). The coherence of spherical harmonic decompositions of the sound field has also been used to estimate diffuseness (Epain and Jin, 2016). Beamforming methods have likewise been applied to assess the directional properties of sound fields and to illustrate how real diffuse fields deviate from the ideal (Gover et al., 2004).
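The energetic diffuseness estimate underlying such frameworks can be written as ψ = 1 − ‖⟨I⟩‖ / (c⟨E⟩), where ⟨·⟩ denotes time averaging, I the active intensity, and E the energy density. The sketch below is a minimal broadband Python version, assuming that the pressure p and the particle velocity scaled to pressure units (v = ρ0·c·u) are available as sampled signals, so that ρ0 and c cancel. The functions `diffuseness` and `simulate_field` are hypothetical names, and the simulated field (independent white-noise plane waves) is a toy model, not a measured RIR or any specific Ambisonics convention.

```python
import numpy as np


def diffuseness(p, v):
    """Energetic diffuseness psi = 1 - |<I>| / (c <E>).

    Assumes p has shape (T,) and v = rho0 * c * u has shape (3, T), i.e. the
    particle velocity expressed in pressure units. With this scaling,
    I is proportional to p * v and c * E to (p**2 + |v|**2) / 2, with the same factor.
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    intensity = np.mean(p * v, axis=1)                    # time-averaged active intensity
    energy = 0.5 * np.mean(p**2 + np.sum(v**2, axis=0))   # time-averaged energy density
    return 1.0 - np.linalg.norm(intensity) / energy


def simulate_field(directions, length, rng):
    """Toy field: one independent white-noise plane wave per propagation direction."""
    p = np.zeros(length)
    v = np.zeros((3, length))
    for n in directions:
        s = rng.standard_normal(length)
        p += s                   # pressure contribution
        v += np.outer(n, s)      # velocity points along the propagation direction
    return p, v


rng = np.random.default_rng(0)
plane = simulate_field(np.array([[1.0, 0.0, 0.0]]), 10_000, rng)
dirs = rng.standard_normal((200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # uniform random unit vectors
diffuse = simulate_field(dirs, 10_000, rng)
print(diffuseness(*plane), diffuseness(*diffuse))
```

Because ‖⟨p v⟩‖ ≤ ⟨|p| ‖v‖⟩ ≤ ⟨(p² + ‖v‖²)/2⟩, the estimate always lies between 0, obtained for a single plane wave, and 1. With a finite number of incident waves and finite averaging, the simulated diffuse field lands somewhat below 1. A time-frequency variant would apply the same formula per frequency band.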
We propose a spatio-spectro-temporal (SST) sound field analysis approach based on a sparse plane-wave decomposition of sound fields captured with a higher-order Ambisonics microphone. Its advantage is that it tracks how the diffuseness of the sound field evolves in both the temporal and spatial dimensions. We introduce several derivative metrics that assess temporal, spectro-temporal, and spatio-temporal characteristics of the diffuse field, including sparsity, diversity, and isotropy. Specifically, we define the room sparsity profile (RSP), room sparsity relief (RSR), and room sparsity profile diversity (RSPD) as temporal, spectro-temporal, and spatio-temporal measures of diffuse fields, respectively. We discuss the relationship of this new approach to existing diffuseness measures and support it with experimental comparisons using 4th- and 6th-order acoustic impulse responses, which demonstrate that the new derivative measures depend on measurement position. We conclude by considering the limitations and applicability of the proposed approach.