Positive and unlabeled learning in categorical data

Dino Ienco; Ruggero G. Pensa

doi:10.1016/j.neucom.2016.01.089

Journal article, Neurocomputing, Year: 2016

Dino Ienco

Role: Author
PersonId: 6226
IdHAL: dino-ienco
ORCID: 0000-0002-8736-3132
IdRef: 172688183

Territoires, Environnement, Télédétection et Information Spatiale

ADVanced Analytics for data SciencE

Ruggero G. Pensa

Role: Author

Università degli studi di Torino = University of Turin

Abstract

In common binary classification scenarios, both positive and negative examples must be present in the training data to build an efficient classifier. Unfortunately, in many domains this requirement is not satisfied and examples of only one class are available. To cope with this setting, classification algorithms have been introduced that learn from Positive and Unlabeled (PU) data. Originally, these approaches were applied to document classification. Only a few works address the PU problem for categorical datasets, and the available algorithms are mainly based on Naive Bayes classifiers. In this work we present Pulce, a new distance-based PU learning approach for categorical data. Our framework takes advantage of the intrinsic relationships between attribute values and goes beyond the independence assumption made by Naive Bayes. Specifically, Pulce leverages the statistical properties of the data to learn a distance metric that is then employed during the classification task. We extensively validate our approach on real-world datasets and demonstrate that our strategy obtains statistically significant improvements with respect to state-of-the-art competitors.
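
To make the general idea of distance-based PU learning on categorical records concrete, here is a minimal sketch. It is not Pulce: the abstract does not specify the learned metric, so a plain Hamming distance is swapped in, and the two-step heuristic (treat the unlabeled records farthest from the positives as likely negatives, then classify by k-nearest neighbours) is an assumption made purely for illustration. The names `hamming`, `mean_distance`, `pu_classify`, `negative_fraction` and `k` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Fraction of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distance(record, group):
    """Average Hamming distance from one record to every record in a group."""
    return sum(hamming(record, g) for g in group) / len(group)


def pu_classify(positives, unlabeled, negative_fraction=0.3, k=3):
    """Return a predict(record) function built from positive and unlabeled data.

    Illustrative assumption, not the method of the paper:
    1. the unlabeled records farthest from the positives are treated as
       likely negatives;
    2. new records are labelled by majority vote among their k nearest
       neighbours (1 = positive, 0 = negative).
    """
    # Step 1: rank unlabeled records from farthest to closest to the positives.
    ranked = sorted(unlabeled,
                    key=lambda r: mean_distance(r, positives),
                    reverse=True)
    n_neg = max(1, int(len(ranked) * negative_fraction))
    negatives = ranked[:n_neg]

    # Step 2: k-NN over the positives and the selected likely negatives.
    labeled = [(p, 1) for p in positives] + [(n, 0) for n in negatives]

    def predict(record):
        nearest = sorted(labeled, key=lambda item: hamming(record, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (made up for this sketch): records are (colour, shape, size).
positives = [("red", "round", "small"),
             ("red", "round", "medium"),
             ("red", "oval", "small")]
unlabeled = [("red", "round", "small"),
             ("blue", "square", "large"),
             ("green", "square", "large"),
             ("blue", "flat", "large")]

predict = pu_classify(positives, unlabeled, negative_fraction=0.5, k=3)
```

A fixed Hamming distance treats every attribute mismatch as equally important, which is exactly the limitation a learned metric is meant to address; in this sketch a data-driven distance would simply replace `hamming`.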

Keywords

Categorical data; Semi-supervised learning; Positive unlabeled learning

Domains

Information retrieval [cs.IR]; Machine learning [cs.LG]; Databases [cs.DB]

Main file

pulceneurocom.pdf (1.41 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-01374450

Submitted on: Monday, 3 October 2016, 13:51:29

Last modified on: Tuesday, 12 March 2024, 10:44:09

Long-term archiving on: Wednesday, 4 January 2017, 12:42:26

Dates and versions

hal-01374450, version 1 (03-10-2016)

Identifiers

HAL Id: hal-01374450, version 1
DOI: 10.1016/j.neucom.2016.01.089
IRSTEA: PUB00047400

Cite

Dino Ienco, Ruggero G. Pensa. Positive and unlabeled learning in categorical data. Neurocomputing, 2016, 196 (July), pp. 113-124. ⟨10.1016/j.neucom.2016.01.089⟩. ⟨hal-01374450⟩

Collections

CIRAD AGROPARISTECH CNRS IRSTEA ADVANSE LIRMM AGROPOLIS TETIS MIPS UNIV-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM
