Learning from Few Positives: a Provably Accurate Metric Learning Algorithm to deal with Imbalanced Data

Rémi Viola; Rémi Emonet; Amaury Habrard; Guillaume Metzler; Marc Sebban

Conference paper, Year: 2020

Rémi Viola

Role: Author
PersonId : 184811
IdHAL : remiviola

Laboratoire Hubert Curien

Direction Générale des Finances Publiques

Ministère de l'économie et des finances

Rémi Emonet

Role: Author
PersonId : 3876
IdHAL : remi-emonet
ORCID : 0000-0002-1870-1329
IdRef : 139072578

Laboratoire Hubert Curien

Amaury Habrard

Role: Author
PersonId : 439
IdHAL : amaury-habrard
ORCID : 0000-0003-3038-9347
IdRef : 084103655

Laboratoire Hubert Curien

Guillaume Metzler

Role: Author
PersonId : 740506
IdHAL : guillaume-metzler

Laboratoire Hubert Curien

Marc Sebban

Role: Author
PersonId : 5203
IdHAL : marc-sebban
ORCID : 0000-0001-6851-169X
IdRef : 050802623

Laboratoire Hubert Curien

Abstract

Learning from imbalanced data, where the positive examples are very scarce, remains a challenging task from both a theoretical and algorithmic perspective. In this paper, we address this problem using a metric learning strategy. Unlike the state-of-the-art methods, our algorithm MLFP, for Metric Learning from Few Positives, learns a new representation that is used only when a test query is compared to a minority training example. From a geometric perspective, it artificially brings positive examples closer to the query without changing the distances to the negative (majority class) data. This strategy allows us to expand the decision boundaries around the positives, yielding a better F-Measure, a criterion which is suited to deal with imbalanced scenarios. Beyond the algorithmic contribution provided by MLFP, our paper presents generalization guarantees on the false positive and false negative rates. Extensive experiments conducted on several imbalanced datasets show the effectiveness of our method.
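The geometric idea in the abstract can be illustrated with a small sketch. It is not the MLFP algorithm itself: the abstract does not say how the representation is learned or which classifier uses it, so this sketch assumes a nearest-neighbor decision rule and a fixed, hand-picked linear map `L` (both assumptions). All function names are hypothetical. The only point shown is that distances to positive training examples are measured through `L`, while distances to negatives stay Euclidean.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def apply_linear_map(L, v):
    """Multiply matrix L (list of rows) by vector v."""
    return [sum(l_ij * v_j for l_ij, v_j in zip(row, v)) for row in L]


def asymmetric_distance(query, example, label, L):
    """Distance used to compare a query with one training example.

    Assumption: positives (label 1) are compared through the linear map L,
    negatives (label 0) with the unchanged Euclidean distance.
    """
    if label == 1:
        diff = [q - e for q, e in zip(query, example)]
        return math.sqrt(sum(c * c for c in apply_linear_map(L, diff)))
    return euclidean(query, example)


def knn_predict(query, X, y, L, k=1):
    """Majority vote among the k nearest training examples."""
    ranked = sorted(
        zip(X, y),
        key=lambda pair: asymmetric_distance(query, pair[0], pair[1], L),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: one positive (minority) example and two negatives.
X = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0)]
y = [1, 0, 0]
query = (2.5, 0.0)

identity = [[1.0, 0.0], [0.0, 1.0]]  # no change: ordinary nearest neighbor
shrink = [[0.5, 0.0], [0.0, 0.5]]    # hypothetical map halving distances to positives

pred_plain = knn_predict(query, X, y, identity, k=1)
pred_shrunk = knn_predict(query, X, y, shrink, k=1)
print(pred_plain, pred_shrunk)
```

With the identity map the query is closest to a negative example; with the shrinking map the positive example becomes the nearest neighbor, which is the sense in which the decision region around the positives expands. In the paper the map is learned from data rather than chosen by hand.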

Keywords

Metric Learning; Imbalanced Classification; F-measure; Generalization Guarantees

Domains

Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

MLFP.pdf (1.5 MB)

Origin: Files produced by the author(s)

Contributor: Guillaume Metzler

https://hal.science/hal-02611586

Submitted on: Monday, May 18, 2020, 14:25:50

Last modified on: Thursday, September 28, 2023, 11:01:47

Dates and versions

hal-02611586 , version 1 (18-05-2020)

Identifiers

HAL Id : hal-02611586 , version 1

Cite

Rémi Viola, Rémi Emonet, Amaury Habrard, Guillaume Metzler, Marc Sebban. Learning from Few Positives: a Provably Accurate Metric Learning Algorithm to deal with Imbalanced Data. IJCAI 2020, the 29th International Joint Conference on Artificial Intelligence, Jul 2020, Yokohama, Japan. ⟨hal-02611586⟩

Collections

UNIV-ST-ETIENNE IOGS CNRS PARISTECH UDL

153 views

183 downloads
