A LIGHTWEIGHT DUAL-STAGE FRAMEWORK FOR PERSONALIZED SPEECH ENHANCEMENT BASED ON DEEPFILTERNET2 - Equipe Signal, Statistique et Apprentissage
Conference paper, Year: 2024

Abstract

Isolating a desired speaker’s voice from multiple speakers in a noisy acoustic environment is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker’s voice. Recent research has yielded promising PSE models, but these often rely on computationally intensive architectures that are unsuitable for resource-constrained embedded devices. In this paper, we introduce a novel method to personalize a lightweight dual-stage speech enhancement (SE) model and implement it within DeepFilterNet2, an SE model renowned for its state-of-the-art performance. We seek an optimal integration of speaker information within the model by exploring different positions at which to inject the speaker embeddings into the dual-stage enhancement architecture. We also investigate a tailored training strategy for adapting DeepFilterNet2 to a PSE task. We show that our personalization method greatly improves the performance of DeepFilterNet2 while keeping the computational overhead minimal.
Main file: main.pdf (263.26 KB)
Origin: Files produced by the author(s)
Licence: Copyright

Dates and versions

hal-04541350, version 1 (10-04-2024)

Identifiers

  • HAL Id: hal-04541350, version 1

Cite

Thomas Serre, Mathieu Fontaine, Éric Benhaim, Geoffroy Dutour, Slim Essid. A LIGHTWEIGHT DUAL-STAGE FRAMEWORK FOR PERSONALIZED SPEECH ENHANCEMENT BASED ON DEEPFILTERNET2. ICASSP, Apr 2024, Seoul, South Korea. ⟨hal-04541350⟩