
Construction de Représentation de Données Adaptées dans le Cadre de Peu d'Exemples Étiquetés

Abstract : Machine learning consists in the study and design of algorithms that build models able to handle nontrivial tasks as well as or better than humans, and hopefully at a lower cost. These models are typically trained from a dataset where each example describes an instance of the same task and is represented by a set of characteristics and an expected outcome, or label, which we usually want to predict. An element required for the success of any machine learning algorithm is the quality of the set of characteristics describing the data, also referred to as the data representation or features. In supervised learning, the more the features describing the examples are correlated with the label, the more effective the model will be. There exist three main families of features: the "observable", the "handcrafted", and the "latent" features, the last of which are usually learned automatically from the training data. The contributions of this thesis fall within this last category. More precisely, we are interested in the specific setting of learning a discriminative representation when the amount of data of interest is limited. A lack of data of interest arises in different scenarios. First, we tackle the problem of imbalanced learning with a class of interest composed of few examples by learning a metric that induces a new representation space where the learned models do not favor the majority examples. Second, we propose to handle a scenario with few available examples by learning at the same time a relevant data representation and a model that generalizes well, through boosting models that use kernels as base learners, approximated by random Fourier features. Finally, to address the domain adaptation scenario where the target set contains no labels while the source examples are acquired under different conditions, we propose to reduce the discrepancy between the two domains by keeping only the most similar features that optimize the solution of an optimal transport problem between the two domains.
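As an illustration of the random Fourier features mentioned in the abstract, the sketch below approximates an RBF kernel with a random cosine feature map. It is a generic textbook construction, not the boosting method developed in the thesis; the function names, the `gamma` bandwidth parameter, and the choice of the RBF kernel are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Exact RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def random_fourier_features(X, n_components, gamma, rng):
    """Map X to features Z such that Z @ Z.T approximates the RBF kernel.

    Hypothetical helper for illustration: frequencies are drawn from the
    Fourier transform of the RBF kernel, i.e. a Gaussian with variance
    2 * gamma, and phases are drawn uniformly from [0, 2*pi).
    """
    n_features = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    Z = random_fourier_features(X, n_components=10000, gamma=0.5, rng=rng)
    error = np.abs(Z @ Z.T - rbf_kernel(X, X, gamma=0.5)).max()
    print(f"max absolute approximation error: {error:.4f}")
```

The approximation error shrinks as `n_components` grows, which is what makes such features usable as inexpensive stand-ins for kernel evaluations when only a few examples are available.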
Contributor : Abes Star
Submitted on : Monday, May 10, 2021 - 11:49:20 AM
Last modification on : Tuesday, May 11, 2021 - 3:23:04 AM
Long-term archiving on : Wednesday, August 11, 2021 - 7:04:01 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03222471, version 1


Léo Gautheron. Construction de Représentation de Données Adaptées dans le Cadre de Peu d'Exemples Étiquetés. Machine Learning [cs.LG]. Université de Lyon, 2020. French. ⟨NNT : 2020LYSES044⟩. ⟨tel-03222471⟩


