Singer Identity Representation Learning using Self-Supervised Techniques

Bernardo Torres; Stefan Lattner; Gael Richard

Conference paper, Year: 2023

Bernardo Torres

Role: Author
PersonId: 1278938
IdHAL: bernardo-torres
ORCID: 0009-0005-7051-6736

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Laboratoire Traitement et Communication de l'Information

Stefan Lattner

Role: Author

Sony Computer Science Laboratories Paris

Gael Richard

Role: Author
PersonId: 14146
IdHAL: gael-richard
IdRef: 094977208

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Laboratoire Traitement et Communication de l'Information

Abstract

Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we propose a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks and apply data augmentations during training to ensure that the representations are invariant to pitch and content variations. We evaluate the quality of the resulting representations on singer similarity and identification tasks across multiple datasets, with a particular emphasis on out-of-domain generalization. Our proposed framework produces high-quality embeddings that outperform both speaker verification and wav2vec 2.0 pre-trained baselines on singing voice while operating at 44.1 kHz. We release our code and trained models to facilitate further research on singing voice and related areas.
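The abstract does not say which self-supervised objectives are explored. As one illustrative assumption, a contrastive (InfoNCE-style) objective is a common way to make two augmented views of the same singer map to nearby embeddings. The sketch below uses toy vectors in place of encoder outputs; the function names `cosine` and `info_nce` and the temperature value are hypothetical and are not taken from the authors' released code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two embeddings."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative, not the paper's loss).

    anchors[i] and positives[i] are assumed to be embeddings of two
    augmented views (for example, pitch-shifted clips) of the same singer;
    every other positives[j] acts as a negative for anchors[i].
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need two non-empty batches of equal size")
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings standing in for encoder outputs (assumed values).
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce(views_a, views_b)
mismatched_loss = info_nce(views_a, list(reversed(views_b)))
```

With these toy inputs, the loss is lower when each anchor is paired with the view of the same singer than when the pairs are swapped, which is the behavior such an objective rewards during training.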

Domains

Signal and Image Processing [eess.SP]

Main file

ISMIR_singer_id (32).pdf (226.52 KB)

Origin: Files produced by the author(s)

https://telecom-paris.hal.science/hal-04186048

Submitted on: Wednesday, August 23, 2023, 13:46:56

Last modified on: Monday, October 9, 2023, 12:49:43

Dates and versions

hal-04186048, version 1 (23-08-2023)

Identifiers

HAL Id: hal-04186048, version 1

Cite

Bernardo Torres, Stefan Lattner, Gael Richard. Singer Identity Representation Learning using Self-Supervised Techniques. International Society for Music Information Retrieval Conference (ISMIR 2023), Nov 2023, Milan, Italy. ⟨hal-04186048⟩

Collections

INSTITUT-TELECOM LTCI IDS S2A IP_PARIS
