Friday, May 23
 

1:45pm CEST

A Testbed for Detecting DeepFake Audio
Friday May 23, 2025 1:45pm - 3:45pm CEST
The rapid advancement of generative artificial intelligence has created highly realistic DeepFake multimedia content, posing significant challenges for digital security and authenticity verification. This paper presents the development of a comprehensive testbed designed to detect counterfeit audio content generated by DeepFake techniques. The proposed framework integrates forensic spectral analysis, numerical and statistical modeling, and machine learning-based detection to assess the authenticity of multimedia samples. Our study evaluates various detection methodologies, including spectrogram comparison, Euclidean distance-based analysis, pitch modulation assessment, and spectral flatness deviations. The results demonstrate that cloned and synthetic voices exhibit distinctive acoustic anomalies, with forensic markers such as pitch mean absolute error and power spectral density variations serving as effective indicators of manipulation. By systematically analyzing human, cloned, and synthesized voices, this research provides a foundation for advancing DeepFake detection strategies. The proposed testbed offers a scalable and adaptable solution for forensic audio verification, contributing to the broader effort of safeguarding multimedia integrity in digital environments.
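The abstract names spectral flatness deviation as one of its forensic markers but gives no implementation details. As an illustration only, a minimal NumPy sketch of the measure, run on synthetic signals standing in for tonal (synthetic-like) and broadband (more natural) audio:

```python
import numpy as np

def spectral_flatness(signal, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.
    Values near 1.0 indicate noise-like spectra; values near 0.0, tonal ones."""
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps
    return np.exp(np.mean(np.log(power))) / np.mean(power)

sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 220 * t)            # strongly tonal signal
noisy = tone + 0.3 * rng.standard_normal(sr)  # broadband, noise-like signal

# Flatness rises as spectral energy spreads across frequency.
print(spectral_flatness(tone) < spectral_flatness(noisy))
```

Overly tonal or spectrally sparse output is one plausible artifact of voice synthesis, which is why a deviation in flatness relative to reference recordings can serve as a coarse screening feature before deeper forensic analysis.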
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

An audio quality metrics toolbox for media assets management, content exchange, and dataset alignment
Friday May 23, 2025 1:45pm - 3:45pm CEST
Content exchange and collaboration serve as catalysts for repository creation that supports creative industries and fuels model development in machine learning and AI. Despite numerous repositories, challenges persist in discoverability, rights preservation, and efficient reuse of audiovisual assets. To address these issues, the SCENE (Searchable multi-dimensional Data Lakes supporting Cognitive Film Production & Distribution for the Promotion of the European Cultural Heritage) project introduces an automated audio quality assessment toolkit integrated within its Media Assets Management (MAM) platform. This toolkit comprises a suite of advanced metrics, such as artifact detection, bandwidth estimation, compression history analysis, noise profiling, speech intelligibility, environmental sound recognition, and reverberation characterization. The metrics are extracted using dedicated Flask-based web services that interface with a data lake architecture. By streamlining the inspection of large-scale audio repositories, the proposed solution benefits both high-end film productions and smaller-scale collaborations. The pilot phase of the toolkit will involve professional filmmakers who will provide feedback to refine post-production workflows. This paper presents the motivation, design, and implementation details of the toolkit, highlighting its potential to support content quality management and contribute to more efficient content exchange in the creative industries.
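The toolkit's metric implementations are not published in the abstract. As an illustration of one metric class it names, bandwidth estimation, a minimal NumPy sketch under the assumption that effective bandwidth is read off the cumulative spectral energy (the threshold and method here are hypothetical, not SCENE's):

```python
import numpy as np

def estimated_bandwidth(signal, sr, energy_frac=0.99):
    """Estimate effective bandwidth: the frequency below which
    `energy_frac` of the total spectral energy lies. A low value on
    nominally full-band audio can reveal prior low-bitrate coding."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    cumulative = np.cumsum(power) / np.sum(power)
    return freqs[np.searchsorted(cumulative, energy_frac)]

sr = 48000
t = np.arange(sr) / sr
rng = np.random.default_rng(1)
full_band = rng.standard_normal(sr)       # white noise spans the spectrum
band_limited = np.sin(2 * np.pi * 1000 * t)  # energy confined near 1 kHz

print(estimated_bandwidth(band_limited, sr) < estimated_bandwidth(full_band, sr))
```

In a MAM pipeline, a metric like this would typically run server-side (here, behind the Flask services the abstract mentions) and be stored as asset metadata for later querying.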
Speakers

Nikolaos Vryzas

Aristotle University of Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production...

Iordanis Thoidis

Aristotle University of Thessaloniki

Lazaros Vrysis

Aristotle University of Thessaloniki
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

Application for Binaural Audio Plays: Development of Auditory Perception and Spatial Orientation
Friday May 23, 2025 1:45pm - 3:45pm CEST
When navigating the environment, we primarily rely on sight. However, in its absence, individuals must develop precise spatial awareness using other senses. A blind person can recognize their immediate surroundings through touch, but assessing larger spaces requires auditory perception.
This project presents a method for auditory training in children with visual disabilities through structured audio plays designed to teach spatial pronouns and enhance spatial orientation via auditory stimuli. The format and structure of these audio plays allow for both guided learning with a mentor and independent exploration. Binaural recordings serve as the core component of the training exercises. The developed audio plays and their analyses are available on the YouTube platform in the form of videos and interactive exercises.
The next step of this project involves developing an application that enables students to create individual accounts and track their progress. Responses collected during exercises will help assess the impact of the audio plays on students, facilitating improvements and modifications to the training materials.
Additionally, linking vision-related questions with responses to auditory exercises will, over time, provide insights into the correlation between these senses. The application can serve multiple purposes: collecting research data, offering spatial recognition and auditory perception training, and creating a comprehensive, structured environment for auditory skill development.
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

Exploring the Process of Interconnected Procedurally Generated Visual and Audial Content
Friday May 23, 2025 1:45pm - 3:45pm CEST
This paper investigates the synthesis of procedurally generated visual and auditory content using Artificial Intelligence (AI) tools, focusing specifically on Generative Pre-Trained Transformer (GPT) networks.
This research explores the process of procedurally generating audiovisual representations of semantic context by generating images, artificially providing motion, and generating corresponding multilayered sound. The process enables the generation of stop-motion audiovisual representations of concepts.
This approach not only highlights the capacity for Generative AI to produce cohesive and semantically rich audiovisual media but also delves into the interconnections between visual art, music, sonification, and computational creativity. By examining the synergy between generated imagery and corresponding soundscapes, this research paper aims to uncover new insights into the aesthetic and technical implications of the use of AI in art.
This research embodies a direct application of AI technology across multiple disciplines, creating intermodal media. The findings propose a novel framework for understanding and advancing the use of AI in creative processes, suggesting potential pathways for future interdisciplinary research and artistic expression.
This study contributes to the broader discourse on the role of AI in enhancing creative practices, offering perspectives on how various modes of semantic representation can be interleaved using state-of-the-art technology.
Hall F ATM Studio Warsaw, Poland

1:45pm CEST

G.A.D.A.: Guitar Audio Dataset for AI - An Open-Source Multi-Class Guitar Corpus
Friday May 23, 2025 1:45pm - 3:45pm CEST
We present G.A.D.A. (Guitar Audio Dataset for AI), a novel open-source dataset designed for advancing research in guitar audio analysis, signal processing, and machine learning applications. This comprehensive corpus comprises recordings from three main guitar categories: electric, acoustic, and bass guitars, featuring multiple instruments within each category to ensure dataset diversity and robustness.

The recording methodology employs two distinct approaches based on instrument type. Electric and bass guitars were recorded using direct recording techniques via DI boxes, providing clean, unprocessed signals ideal for further digital processing and manipulation. For acoustic guitars, where direct recording was not feasible, we utilized multiple microphone configurations at various positions to capture the complete acoustic properties of the instruments. Both recording approaches prioritize signal quality while maintaining maximum flexibility for subsequent processing and analysis.

The dataset includes standardized recordings of major and minor chords played in multiple positions and voicings across all instruments. Each recording is accompanied by detailed metadata, including instrument specifications, recording equipment details, microphone configurations (for acoustic guitars), and chord information. The clean signals from electric instruments enable various post-processing applications, including virtual amplifier modeling, effects processing, impulse response convolution, and room acoustics simulation.

To evaluate G.A.D.A.'s effectiveness in machine learning applications, we propose a comprehensive testing framework using established algorithms including k-Nearest Neighbors, Support Vector Machines, Convolutional Neural Networks, and Feed-Forward Neural Networks. These experiments will focus on instrument classification tasks using both traditional audio features and deep learning approaches.

G.A.D.A. will be freely available for academic and research purposes, complete with documentation, preprocessing scripts, example code, and usage guidelines. This resource aims to facilitate research in musical instrument classification, audio signal processing, deep learning applications in music technology, computer-aided music education, and automated music transcription systems.

The combination of standardized recording methodologies, comprehensive metadata, and the inclusion of both direct-recorded and multi-microphone captured audio makes G.A.D.A. a valuable resource for comparative studies and reproducible research in music information retrieval and audio processing.
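The proposed evaluation framework is described but not published. As a toy sketch of the k-Nearest Neighbors classification step it names, here is a NumPy-only classifier over a single hypothetical brightness feature; the synthetic tones and class labels stand in for real guitar recordings and are not part of G.A.D.A.:

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Power-weighted mean frequency: a coarse 'brightness' feature."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return np.sum(freqs * power) / np.sum(power)

def knn_predict(train_x, train_y, x, k=3):
    """Plain k-nearest-neighbours majority vote over 1-D features."""
    order = np.argsort(np.abs(train_x - x))[:k]
    labels, counts = np.unique(train_y[order], return_counts=True)
    return labels[np.argmax(counts)]

sr = 16000
t = np.arange(sr) / sr
def tone(f0):
    # Crude harmonic tone as a stand-in for a recorded note.
    return sum(np.sin(2 * np.pi * f0 * h * t) / h for h in (1, 2, 3))

# Hypothetical classes: 0 = low-register ("bass"-like), 1 = higher-register.
train_x = np.array([spectral_centroid(tone(f), sr) for f in (60, 80, 100, 300, 400, 500)])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_x, train_y, spectral_centroid(tone(90), sr)))   # low-register query
print(knn_predict(train_x, train_y, spectral_centroid(tone(350), sr)))  # higher-register query
```

A real experiment on the dataset would replace the single centroid with a fuller feature set (e.g., MFCCs or learned embeddings) and evaluate across the electric, acoustic, and bass categories.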
Hall F ATM Studio Warsaw, Poland
 

