Conference paper. Year: 2010

XML Document Classification using SVM

Abstract

This paper describes a representation of XML documents for the purpose of classifying them. Document classification depends on document representation: the more relevant the representation, the more relevant the classification will be. We propose a representation model that exploits both the structure and the content of the document. Our approach is based on the vector space model: a document is represented by a vector of weighted features, where each feature is a (tag, term) pair. We extend tf*idf to compute each feature's weight according to the term's structural level in the document. An SVM is used as the learning algorithm. Experiments on the Reuters collection show that our proposal improves classification performance compared with the standard classification model based on term vectors.
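The representation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the function names `extract_features` and `tfidf_vectors`, the `depth_weight` parameter, its default of 1 / (1 + depth), and the plain log(N / df) form of idf. It is not the authors' implementation.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def _terms(text):
    """Lowercase whitespace tokenisation (an assumed, deliberately naive tokenizer)."""
    return (text or "").lower().split()


def extract_features(xml_text, depth_weight=lambda depth: 1.0 / (1 + depth)):
    """Map each (tag, term) pair to a term frequency scaled by structural depth.

    depth_weight is a hypothetical choice: the abstract only says that the
    weight depends on the term's structural level, not how.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    stack = [(root, 0)]
    while stack:
        elem, depth = stack.pop()
        # Text directly inside this element, before its first child.
        for term in _terms(elem.text):
            counts[(elem.tag, term)] += depth_weight(depth)
        for child in elem:
            # Text following a child still belongs to the current element.
            for term in _terms(child.tail):
                counts[(elem.tag, term)] += depth_weight(depth)
            stack.append((child, depth + 1))
    return counts


def tfidf_vectors(docs):
    """Turn a list of XML strings into sparse {(tag, term): weight} vectors."""
    features = [extract_features(doc) for doc in docs]
    # Document frequency: in how many documents each (tag, term) pair occurs.
    doc_freq = Counter()
    for feats in features:
        doc_freq.update(feats.keys())
    n_docs = len(docs)
    return [
        {key: tf * math.log(n_docs / doc_freq[key]) for key, tf in feats.items()}
        for feats in features
    ]
```

With this sketch, the term "prices" inside a `title` element and the same term inside a `body` element are separate features, which is the point of pairing tags with terms. The resulting sparse vectors could then be passed to any SVM implementation; that step is omitted here.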
Main file
CF2010-PUB00029029.pdf (35.75 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00585914, version 1 (14-04-2011)

Cite

Samaneh Chagheri, Catherine Roussey, Sylvie Calabretto, Cyril Dumoulin. XML Document Classification using SVM. SFC'2010 (Société Francophone de Classification), Jun 2010, Saint Denis de la Réunion, France. pp.71-74. ⟨hal-00585914⟩