Knowledge-Based Representation for Transductive Multilingual Document Classification - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference paper, Year: 2015

Knowledge-Based Representation for Transductive Multilingual Document Classification

Représentation à base de connaissance pour une méthode de classification transductive de document multilangue

Abstract

Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, which compounds the known difficulty of obtaining high-quality labeled datasets. To overcome these issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to model documents written in different languages in a common conceptual space, without requiring any translation process. We then apply a state-of-the-art transductive learner to classify the documents. Results on two real-world multilingual corpora highlight the effectiveness of the proposed document model compared with the document representations typically used in multilingual and cross-lingual analysis, as well as the robustness of the transductive setting for multilingual document classification.
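
The pipeline described in the abstract (mapping documents in different languages into a shared concept space, then classifying labeled and unlabeled documents together) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `TOY_LEXICON` is a hypothetical hand-written stand-in for a BabelNet lookup, its concept IDs are invented, and `propagate_labels` is a simple similarity-weighted label propagation that stands in for the transductive learner, which the abstract does not name.

```python
from collections import Counter
import math

# Hypothetical toy lexicon: surface word -> language-independent concept ID.
# These IDs are invented for illustration; they are not real BabelNet synset IDs.
TOY_LEXICON = {
    "dog": "c:dog", "chien": "c:dog", "cane": "c:dog",
    "cat": "c:cat", "chat": "c:cat", "gatto": "c:cat",
    "bank": "c:bank", "banque": "c:bank", "banca": "c:bank",
    "money": "c:money", "argent": "c:money", "denaro": "c:money",
}

def bag_of_concepts(text):
    """Map a document to concept counts, ignoring words missing from the lexicon."""
    tokens = text.lower().split()
    return Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)

def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def propagate_labels(docs, seed_labels, iterations=10):
    """Similarity-weighted label propagation (a stand-in transductive learner).

    docs: list of texts; seed_labels: dict mapping document index -> known label.
    Returns one predicted label per document.
    """
    vectors = [bag_of_concepts(d) for d in docs]
    classes = sorted(set(seed_labels.values()))
    # Labeled documents are clamped to one-hot scores; the others start at zero.
    scores = [{c: 1.0 if seed_labels.get(i) == c else 0.0 for c in classes}
              for i in range(len(docs))]
    for _ in range(iterations):
        new_scores = []
        for i in range(len(docs)):
            if i in seed_labels:
                new_scores.append(scores[i])  # keep known labels fixed
                continue
            acc = {c: 0.0 for c in classes}
            for j in range(len(docs)):
                if j != i:
                    w = cosine(vectors[i], vectors[j])
                    for c in classes:
                        acc[c] += w * scores[j][c]
            new_scores.append(acc)
        scores = new_scores
    return [max(classes, key=lambda c: s[c]) for s in scores]

docs = [
    "dog cat dog",      # English, labeled "animals"
    "bank money bank",  # English, labeled "finance"
    "chien chat",       # French, unlabeled
    "banca denaro",     # Italian, unlabeled
]
labels = propagate_labels(docs, {0: "animals", 1: "finance"})
print(labels)
```

Because the French and Italian documents share concept IDs with the labeled English ones, they can be compared directly without any translation step, which is the property the abstract attributes to the knowledge-based representation.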
Main file
paper_169.pdf (377.65 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-01239095, version 1 (07-12-2015)

Cite

Salvatore Romeo, Dino Ienco, Andrea Tagarelli. Knowledge-Based Representation for Transductive Multilingual Document Classification. 37th European Conference on Information Retrieval (ECIR), Mar 2015, Vienna, Austria. pp.92-103, ⟨10.1007/978-3-319-16354-3_11⟩. ⟨lirmm-01239095⟩