The rapid advancement of generative artificial intelligence has created highly realistic DeepFake multimedia content, posing significant challenges for digital security and authenticity verification. This paper presents the development of a comprehensive testbed designed to detect counterfeit audio content generated by DeepFake techniques. The proposed framework integrates forensic spectral analysis, numerical and statistical modeling, and machine learning-based detection to assess the authenticity of multimedia samples. Our study evaluates various detection methodologies, including spectrogram comparison, Euclidean distance-based analysis, pitch modulation assessment, and spectral flatness deviations. The results demonstrate that cloned and synthetic voices exhibit distinctive acoustic anomalies, with forensic markers such as pitch mean absolute error and power spectral density variations serving as effective indicators of manipulation. By systematically analyzing human, cloned, and synthesized voices, this research provides a foundation for advancing DeepFake detection strategies. The proposed testbed offers a scalable and adaptable solution for forensic audio verification, contributing to the broader effort of safeguarding multimedia integrity in digital environments.
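One of the forensic markers above, spectral flatness, can be illustrated with a short sketch (a toy example under assumed signals, not the paper's actual detection pipeline): a harmonic, voice-like tone yields a flatness near 0, while broadband noise — which synthesis artifacts often resemble — yields a markedly higher value.

```python
import numpy as np

def spectral_flatness(signal: np.ndarray) -> float:
    """Ratio of geometric to arithmetic mean of the power spectrum (0 = tonal, 1 = flat)."""
    psd = np.abs(np.fft.rfft(signal)) ** 2
    psd = psd[psd > 0]                         # drop empty bins before taking logs
    geometric = np.exp(np.mean(np.log(psd)))
    arithmetic = np.mean(psd)
    return float(geometric / arithmetic)

sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)                # harmonic, voice-like signal
noise = np.random.default_rng(0).normal(size=sr)  # broadband noise

print(spectral_flatness(tone))   # very low: energy concentrated in one bin
print(spectral_flatness(noise))  # much higher: energy spread across the spectrum
```

Deviations of such a statistic from the range typical of natural speech are the kind of anomaly the testbed aggregates alongside pitch mean absolute error and power spectral density variations.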
Content exchange and collaboration serve as catalysts for repository creation that supports creative industries and fuels model development in machine learning and AI. Despite numerous repositories, challenges persist in discoverability, rights preservation, and efficient reuse of audiovisual assets. To address these issues, the SCENE (Searchable multi-dimensional Data Lakes supporting Cognitive Film Production & Distribution for the Promotion of the European Cultural Heritage) project introduces an automated audio quality assessment toolkit integrated within its Media Assets Management (MAM) platform. This toolkit comprises a suite of advanced metrics, such as artifact detection, bandwidth estimation, compression history analysis, noise profiling, speech intelligibility, environmental sound recognition, and reverberation characterization. The metrics are extracted using dedicated Flask-based web services that interface with a data lake architecture. By streamlining the inspection of large-scale audio repositories, the proposed solution benefits both high-end film productions and smaller-scale collaborations. The pilot phase of the toolkit will involve professional filmmakers who will provide feedback to refine post-production workflows. This paper presents the motivation, design, and implementation details of the toolkit, highlighting its potential to support content quality management and contribute to more efficient content exchange in the creative industries.
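As a rough illustration of how one such Flask-based metric service might be structured (the endpoint name, payload format, and the particular metrics below are hypothetical sketches, not the SCENE API), a service could accept audio samples over HTTP and return a crude bandwidth estimate and RMS level:

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/metrics", methods=["POST"])
def metrics():
    body = request.get_json()
    x = np.asarray(body["samples"], dtype=float)
    sr = int(body["sample_rate"])

    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    # Crude bandwidth estimate: highest frequency within 40 dB of the spectral peak.
    threshold = spectrum.max() * 10 ** (-40 / 20)
    bandwidth_hz = float(freqs[np.flatnonzero(spectrum >= threshold)[-1]])

    rms_db = float(20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12))
    return jsonify({"bandwidth_hz": bandwidth_hz, "rms_db": rms_db})

if __name__ == "__main__":
    app.run(port=5000)
```

In a deployment against the data lake, the services would more plausibly accept asset references rather than inline sample arrays, but the request/response pattern would be similar.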
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production.
When navigating the environment, we primarily rely on sight. However, in its absence, individuals must develop precise spatial awareness using other senses. A blind person can recognize their immediate surroundings through touch, but assessing larger spaces requires auditory perception. This project presents a method for auditory training in children with visual disabilities through structured audio plays designed to teach spatial pronouns and enhance spatial orientation via auditory stimuli. The format and structure of these audio plays allow for both guided learning with a mentor and independent exploration. Binaural recordings serve as the core component of the training exercises. The developed audio plays and their analyses are available on the YouTube platform in the form of videos and interactive exercises. The next step of this project involves developing an application that enables students to create individual accounts and track their progress. Responses collected during exercises will help assess the impact of the audio plays on students, facilitating improvements and modifications to the training materials. Additionally, linking vision-related questions with responses to auditory exercises will, over time, provide insights into the correlation between these senses. The application can serve multiple purposes: collecting research data, offering spatial recognition and auditory perception training, and creating a comprehensive, structured environment for auditory skill development.
This paper investigates the innovative synthesis of procedurally generated visual and auditory content through the use of Artificial Intelligence (AI) tools, specifically focusing on Generative Pre-Trained Transformer (GPT) networks. This research explores the process of procedurally generating audiovisual representations of semantic context by generating images, artificially providing motion, and generating corresponding multilayered sound. The process enables the generation of stop-motion audiovisual representations of concepts. This approach not only highlights the capacity for Generative AI to produce cohesive and semantically rich audiovisual media but also delves into the interconnections between visual art, music, sonification, and computational creativity. By examining the synergy between generated imagery and corresponding soundscapes, this paper aims to uncover new insights into the aesthetic and technical implications of the use of AI in art. This research embodies a direct application of AI technology across multiple disciplines, creating intermodal media. Research findings propose a novel framework for understanding and advancing the use of AI in creative processes, suggesting potential pathways for future interdisciplinary research and artistic expression. Through this work, this study contributes to the broader discourse on the role of AI in enhancing creative practices, offering perspectives on how various modes of semantic representation can be interleaved using state-of-the-art technology.
We present G.A.D.A. (Guitar Audio Dataset for AI), a novel open-source dataset designed for advancing research in guitar audio analysis, signal processing, and machine learning applications. This comprehensive corpus comprises recordings from three main guitar categories: electric, acoustic, and bass guitars, featuring multiple instruments within each category to ensure dataset diversity and robustness.
The recording methodology employs two distinct approaches based on instrument type. Electric and bass guitars were recorded using direct recording techniques via DI boxes, providing clean, unprocessed signals ideal for further digital processing and manipulation. For acoustic guitars, where direct recording was not feasible, we utilized multiple microphone configurations at various positions to capture the complete acoustic properties of the instruments. Both recording approaches prioritize signal quality while maintaining maximum flexibility for subsequent processing and analysis.
The dataset includes standardized recordings of major and minor chords played in multiple positions and voicings across all instruments. Each recording is accompanied by detailed metadata, including instrument specifications, recording equipment details, microphone configurations (for acoustic guitars), and chord information. The clean signals from electric instruments enable various post-processing applications, including virtual amplifier modeling, effects processing, impulse response convolution, and room acoustics simulation.
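To make the metadata concrete, a per-recording record might look like the following (the field names and the example filename are illustrative; the dataset's published schema may differ):

```python
import json

# Hypothetical metadata record for a single direct-recorded electric guitar take.
record = {
    "file": "electric_gtr01_A_major_pos1.wav",   # illustrative filename
    "instrument": {"category": "electric", "model": "unspecified"},
    "recording": {"method": "DI box", "sample_rate": 48000, "bit_depth": 24},
    "microphones": None,                         # populated only for acoustic takes
    "chord": {"root": "A", "quality": "major", "position": 1, "voicing": "open"},
}

print(json.dumps(record, indent=2))
```

Keeping such records in a machine-readable format lets preprocessing scripts filter recordings by instrument category, chord, or capture method without inspecting the audio itself.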
To evaluate G.A.D.A.'s effectiveness in machine learning applications, we propose a comprehensive testing framework using established algorithms including k-Nearest Neighbors, Support Vector Machines, Convolutional Neural Networks, and Feed-Forward Neural Networks. These experiments will focus on instrument classification tasks using both traditional audio features and deep learning approaches.
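A minimal version of the k-Nearest Neighbors baseline can be sketched with NumPy over toy two-dimensional features (standing in, say, for spectral centroid and zero-crossing rate; the actual experiments would use richer features extracted from the G.A.D.A. recordings):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test vector by majority vote among its k nearest training vectors."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy feature clusters standing in for bass / electric / acoustic recordings.
rng = np.random.default_rng(1)
centers = np.array([[0.2, 0.1], [0.5, 0.4], [0.8, 0.7]])
X_train = np.vstack([c + 0.03 * rng.normal(size=(20, 2)) for c in centers])
y_train = np.repeat(np.array(["bass", "electric", "acoustic"]), 20)

X_test = centers + 0.01  # one probe near each cluster center
print(knn_predict(X_train, y_train, X_test))
```

The SVM, CNN, and feed-forward baselines would plug into the same train/test split, differing only in the feature representation (handcrafted features versus learned spectrogram embeddings).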
G.A.D.A. will be freely available for academic and research purposes, complete with documentation, preprocessing scripts, example code, and usage guidelines. This resource aims to facilitate research in musical instrument classification, audio signal processing, deep learning applications in music technology, computer-aided music education, and automated music transcription systems.
The combination of standardized recording methodologies, comprehensive metadata, and the inclusion of both direct-recorded and multi-microphone captured audio makes G.A.D.A. a valuable resource for comparative studies and reproducible research in music information retrieval and audio processing.