Active learning for manifold approximation (Apprentissage actif pour l'approximation de variétés)

Benoît Gandar

Abstract

Statistical learning aims to model a functional relationship between two variables X and Y from a random sample of realizations of (X, Y). When Y takes only two values, the task is called classification (discrimination, in French terminology), and learning the functional relationship amounts to learning the boundary of a manifold in the space of X. In this thesis we work in the active learning setting: we assume that the training sample is no longer random and that we can, through an oracle, generate the points on which the manifold is learned. When Y is continuous (regression), previous work shows that low discrepancy is an adequate criterion for generating the initial training points. We show, surprisingly, that these results do not carry over to classification. We therefore propose dispersion as the criterion for classification. Because this criterion is difficult to apply in practice, we propose a new algorithm for generating a low-dispersion experimental design in the unit square. After a first approximation of the manifold, successive approximations can be made to refine it. Two sampling methods are then possible: "selective sampling", which chooses the points to present to an oracle from a finite set of candidates, and "adaptive sampling", which allows any point in the space of X to be chosen. The second can be seen as a limiting case of the first. In practice, however, it is not reasonable to use this method. We then propose a new algorithm based on the dispersion criterion, which carries out exploitation and exploration simultaneously, to approximate a manifold.
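
To make the dispersion criterion named in the abstract concrete, the sketch below estimates the dispersion of a point set in the unit square, using the standard definition: the largest distance from any location in the square to its nearest sample point. This is not the thesis's algorithm, which the abstract does not describe. The function names `dispersion` and `centered_grid`, the Euclidean norm, and the evaluation-grid resolution are all assumptions made for illustration.

```python
import math
import random


def dispersion(points, grid=50):
    """Grid estimate of sup_x min_p ||x - p|| over the unit square.

    Assumption: Euclidean norm; the supremum is approximated on a
    (grid + 1) x (grid + 1) lattice, so the result is a lower bound.
    """
    worst = 0.0
    for i in range(grid + 1):
        for j in range(grid + 1):
            x, y = i / grid, j / grid
            # Distance from this location to its nearest sample point.
            nearest = min(math.hypot(x - px, y - py) for px, py in points)
            worst = max(worst, nearest)
    return worst


def centered_grid(k):
    """k x k design with one point at the center of each cell (illustrative)."""
    return [((i + 0.5) / k, (j + 0.5) / k) for i in range(k) for j in range(k)]


# Compare a regular 16-point design with 16 uniformly random points.
rng = random.Random(0)
random_points = [(rng.random(), rng.random()) for _ in range(16)]
grid_points = centered_grid(4)

print("centered grid:", dispersion(grid_points))
print("random sample:", dispersion(random_points))
```

For the 4 x 4 centered grid the largest gap is the half-diagonal of a cell, 0.125·√2, reached at the corners of the square. A lower dispersion means no region of the square is left far from every sample, which is the property the abstract argues matters when the goal is to locate a class boundary.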

