Conference Papers, Year: 2023

Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning

Abstract

This paper revisits single-channel audio source separation based on a probabilistic generative model of a mixture signal defined in the continuous time domain. We assume that each source signal follows a non-stationary Gaussian process (GP), i.e., any finite set of sampled points follows a zero-mean multivariate Gaussian distribution whose covariance matrix is governed by a kernel function over time-varying latent variables. The mixture signal composed of such source signals thus follows a GP whose covariance matrix is given by the sum of the source covariance matrices. To estimate the latent variables from the mixture signal, we use a deep neural network with an encoder-separator-decoder architecture (e.g., Conv-TasNet) that separates the latent variables in a pseudo-time-frequency space. The key feature of our method is to feed the latent variables into the kernel function for estimating the source covariance matrices, instead of using the decoder for directly estimating the time-domain source signals. This enables the decomposition of a mixture signal into the source signals with a classical yet powerful Wiener filter that considers the full covariance structure over all samples. The kernel function and the network are trained jointly in the maximum likelihood framework. Comparative experiments using two-speech mixtures under clean, noisy, and noisy-reverberant conditions from the WSJ0-2mix, WHAM!, and WHAMR! benchmark datasets demonstrated that the proposed method performed well and outperformed the baseline method under noisy and noisy-reverberant conditions.
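The separation step described in the abstract (source covariance matrices built from a kernel, a mixture covariance equal to their sum, and a Wiener filter over all samples) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the kernel form, the hand-picked gains and frequencies, the 8 kHz sampling rate, the jitter term, and the names `kernel_matrix` and `wiener_separate` are all assumptions made for illustration. In the paper, the kernel and its time-varying latent inputs are learned jointly with a neural network rather than fixed by hand.

```python
import numpy as np


def kernel_matrix(t, gain, freq_hz, length_scale):
    """Covariance matrix of a toy non-stationary GP (hypothetical kernel).

    k(t, t') = g(t) g(t') * exp(-(t - t')^2 / (2 l^2)) * cos(2 pi f (t - t'))
    The time-varying gain g(t) stands in for learned latent variables.
    """
    dt = t[:, None] - t[None, :]
    stationary = np.exp(-0.5 * (dt / length_scale) ** 2) * np.cos(2.0 * np.pi * freq_hz * dt)
    return np.outer(gain, gain) * stationary


def wiener_separate(mixture, covariances, jitter=1e-6):
    """Posterior mean of each source given the mixture: K_n (sum_m K_m)^{-1} x."""
    # The mixture covariance is the sum of the source covariances;
    # the jitter is an assumed regularizer for numerical stability.
    k_mix = sum(covariances) + jitter * np.eye(mixture.size)
    weights = np.linalg.solve(k_mix, mixture)
    return [k_n @ weights for k_n in covariances]


rng = np.random.default_rng(0)
t = np.arange(200) / 8000.0  # 200 samples at an assumed 8 kHz rate

# Two toy sources: one fading out around 300 Hz, one fading in around 1200 Hz.
covariances = [
    kernel_matrix(t, np.linspace(1.0, 0.2, t.size), 300.0, 0.004),
    kernel_matrix(t, np.linspace(0.2, 1.0, t.size), 1200.0, 0.002),
]

# Draw each source from its zero-mean GP and mix them by summation.
sources = [rng.multivariate_normal(np.zeros(t.size), k) for k in covariances]
mixture = sources[0] + sources[1]

estimates = wiener_separate(mixture, covariances)

for s, s_hat in zip(sources, estimates):
    print("relative error:", np.linalg.norm(s_hat - s) / np.linalg.norm(s))
```

Because the filter uses the full covariance matrices rather than per-frame or per-bin variances, it accounts for correlations across all samples. The cost is a linear solve that scales cubically with the number of samples, which is why this sketch uses a short segment.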
Main file
_WASPAA_23__Time_Domain_Audio_Source_Separation_Based_on_Gaussian_Processes_with_Deep_Kernel_Learning-1.pdf (922.8 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04172863, version 1 (28-07-2023)

Identifiers

  • HAL Id: hal-04172863, version 1

Cite

Aditya Arie Nugraha, Diego Di Carlo, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii. Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning. WASPAA, Oct 2023, New Paltz, United States. ⟨hal-04172863⟩