A Practical Outlier Detection Approach for Mixed-Attribute Data

Bouguessa, Mohamed (2015). "A Practical Outlier Detection Approach for Mixed-Attribute Data". Expert Systems with Applications, 42(22), pp. 8637-8649.


Abstract

Outlier detection in mixed-attribute space is a challenging problem for which only a few approaches have been proposed. However, existing methods lack an automatic mechanism to formally discriminate between outliers and inliers. In fact, a common approach to outlier identification is to estimate an outlier score for each object and then provide a ranked list of points, expecting outliers to come first. A major problem with such an approach is deciding where to stop reading the ranked list: how many points should be chosen as outliers? Other methods, instead of ranking outliers, implement various strategies that depend on user-specified thresholds to discriminate outliers from inliers. Ad hoc threshold values are often used. With such an unprincipled approach it is impossible to be objective or consistent. To alleviate these problems, we propose a principled approach based on the bivariate beta mixture model to identify outliers in mixed-attribute data. The proposed approach automatically discriminates outliers from inliers, and it can be applied to both mixed-type attribute data and single-type (numerical or categorical) attribute data without any feature transformation. Our experimental study demonstrates the suitability of the proposed approach in comparison to mainstream methods.
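
The ranked-list problem described in the abstract can be illustrated with a minimal Python sketch. It does not implement the paper's bivariate beta mixture model. The scoring rule (absolute deviation from the median, scaled by the median absolute deviation), the function names `outlier_scores`, `ranked_list` and `top_k`, and the toy data are all assumptions introduced here for illustration only.

```python
from statistics import median


def outlier_scores(values):
    """Illustrative score (an assumption, not the paper's method):
    absolute deviation from the median, scaled by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        mad = 1.0  # avoid division by zero on constant data
    return [abs(v - med) / mad for v in values]


def ranked_list(values):
    """Indices of the points, ordered from highest to lowest score."""
    scores = outlier_scores(values)
    return sorted(range(len(values)), key=lambda i: scores[i], reverse=True)


def top_k(values, k):
    """Report the first k entries of the ranked list as outliers.
    The cut-off k is chosen by the user, which is the ad hoc step."""
    return ranked_list(values)[:k]


# Hypothetical toy data: five values near 10, plus 13.0 and 25.0.
data = [10.0, 10.2, 9.9, 10.1, 9.8, 13.0, 25.0]

for k in (1, 2, 3):
    print(k, top_k(data, k))
```

The ranking is the same in every case, but the set of reported outliers changes with `k`: the value 13.0 (index 5) is flagged only when `k` is at least 2, and the ordinary value 9.8 (index 4) is flagged when `k` is 3. Nothing in the ranking itself says which cut-off is right, which is the gap an automatic discrimination mechanism is meant to close.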

Type: Journal article
Keywords or Subjects: Data Mining, Outlier detection, Mixed-attribute data, Mixture model, Bivariate beta.
Affiliated unit: Faculty of Science > Department of Computer Science
Deposited by: Mohamed Bouguessa
Date deposited: 10 Feb. 2016 14:53
Last modified: 20 Apr. 2016 19:24
URL: http://www.archipel.uqam.ca/id/eprint/7776
