Conference Papers Year : 2024

Adapting Pitch-Based Self Supervised Learning Models for Tempo Estimation

Abstract

Tempo estimation is the task of estimating the periodicity of the dominant rhythmic pulse of a music audio signal. It is therefore closely related to dominant pitch estimation. Recently, both tasks have been addressed in a self-supervised learning (SSL) fashion so as to leverage unlabelled data for training. In this work, we study the applicability of two successful pitch-based SSL models, SPICE and PESTO, to tempo estimation. Both exploit Siamese networks whose two branches receive views generated by pitch shifting. To apply these models to tempo estimation, we represent the audio signal by the constant-Q transform (CQT) of its onset-strength function and adapt their view generation to use time stretching instead of pitch shifting, which is efficiently implemented by shifting the CQT. In a large-scale experiment, we show that simply adapting PESTO in this way yields better results than the previous SSL approach to tempo estimation on most datasets of the reference benchmark. Further, since PESTO is lightweight and requires only little training data, we study a new learning scheme in which the downstream datasets are processed directly in an SSL fashion (without access to labels), and show that this is an interesting alternative that further improves performance on some datasets.
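The claim that time stretching can be implemented by shifting the CQT follows from the log-spaced frequency axis: multiplying every periodicity by a rate r moves it by a constant number of bins, B · log2(r), where B is the number of bins per octave. The sketch below illustrates this on a toy one-frame spectrum. It is not the authors' implementation; the names `tempo_to_bin` and `shift_bins`, the 30 BPM lowest bin, and the 24 bins per octave are assumptions chosen only for illustration.

```python
import numpy as np

FMIN_BPM = 30.0        # assumed tempo of the lowest bin (illustrative)
BINS_PER_OCTAVE = 24   # assumed log-frequency resolution (illustrative)


def tempo_to_bin(tempo_bpm, fmin_bpm=FMIN_BPM, bins_per_octave=BINS_PER_OCTAVE):
    """Fractional bin index of a tempo on a log-spaced (CQT-like) axis."""
    return bins_per_octave * np.log2(tempo_bpm / fmin_bpm)


def shift_bins(frame, k):
    """Translate a 1-D spectrum by k bins (k > 0 means faster), zero-filling."""
    out = np.zeros_like(frame)
    if k > 0:
        out[k:] = frame[:-k]
    elif k < 0:
        out[:k] = frame[-k:]
    else:
        out[:] = frame
    return out


# Toy frame with a single peak at 60 BPM.
n_bins = 96
frame = np.zeros(n_bins)
src_bin = int(round(tempo_to_bin(60.0)))
frame[src_bin] = 1.0

# Shifting by k bins emulates a time stretch of rate 2 ** (k / B).
k = 12
rate = 2.0 ** (k / BINS_PER_OCTAVE)
shifted = shift_bins(frame, k)

new_bin = int(np.argmax(shifted))
new_tempo = FMIN_BPM * 2.0 ** (new_bin / BINS_PER_OCTAVE)
print(f"rate={rate:.4f}, tempo 60.0 -> {new_tempo:.2f} BPM")
```

Because the shift is a plain array translation, two views of the same excerpt that differ by a known tempo ratio can be produced without resynthesizing audio, which is the property that Siamese training with a known transformation relies on.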
Origin: Files produced by the author(s)

Dates and versions

hal-04544157 , version 1 (26-04-2024)

Cite

Antonin Gagneré, Slim Essid, Geoffroy Peeters. Adapting Pitch-Based Self Supervised Learning Models for Tempo Estimation. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024, Seoul, South Korea. pp.956-960, ⟨10.1109/ICASSP48485.2024.10447129⟩. ⟨hal-04544157⟩