On evaluation of outlier rankings and outlier scores.
Schubert, E., Wojdanowski, R., Zimek, A., & Kriegel, H. P. (2012, April).
In Proceedings of the 2012 SIAM International Conference on Data Mining (pp. 1047-1058). Society for Industrial and Applied Mathematics.
In this publication, we do not "just normalize" outlier scores; we also suggest an unsupervised ensemble member selection strategy called the "greedy ensemble".
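To give a flavor of what such a greedy member selection can look like, here is a heavily simplified sketch. It only illustrates the general idea (build a consensus target from the normalized scores of all detectors, then greedily add the member that most improves agreement with that target); the exact procedure in the paper differs in detail.

```python
import numpy as np

def greedy_select(scores, target, k):
    """Simplified greedy forward selection of ensemble members.

    scores : (m, n) array of m detectors' normalized scores on n objects
    target : (n,) consensus vector, e.g. the mean of all normalized scores
    k      : number of members to select

    Illustration of the general idea only; not the exact procedure
    from the SDM 2012 paper.
    """
    m = scores.shape[0]
    k = min(k, m)
    # start with the detector most correlated with the consensus target
    corr_to_target = [np.corrcoef(s, target)[0, 1] for s in scores]
    selected = [int(np.argmax(corr_to_target))]
    while len(selected) < k:
        ensemble = scores[selected].mean(axis=0)
        base = np.corrcoef(ensemble, target)[0, 1]
        best, best_gain = None, -np.inf
        for i in range(m):
            if i in selected:
                continue
            candidate = scores[selected + [i]].mean(axis=0)
            # gain: how much adding detector i improves agreement with the target
            gain = np.corrcoef(candidate, target)[0, 1] - base
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected
```

Note that combining members by averaging, as in this sketch, only makes sense if the individual scores are comparable in the first place.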
However, normalization is crucial, and difficult. We published some of our earlier progress on score normalization as
Interpreting and unifying outlier scores.
Kriegel, H. P., Kröger, P., Schubert, E., & Zimek, A. (2011, April).
In Proceedings of the 2011 SIAM International Conference on Data Mining (pp. 13-24). Society for Industrial and Applied Mathematics.
If you don't normalize your scores (and min-max scaling is not enough), you will usually not be able to combine them in a meaningful way, except under very strong preconditions. Even two different subspaces will usually yield incomparable values, because they have different numbers of features and different feature scales.
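For illustration, one of the simpler normalizations discussed in that line of work is Gaussian scaling: standardize each detector's scores and push them through the Gaussian error function, so every detector yields values in [0, 1] that can be read roughly as "how outlying is this object". This is only a sketch of one variant, not the full framework of the paper:

```python
import numpy as np
from scipy.special import erf

def gaussian_scaling(scores):
    """Map raw outlier scores to [0, 1] via Gaussian scaling.

    Standardize the scores, then apply the Gaussian error function;
    the inlier half is clipped to 0. One of the simpler normalizations
    in the spirit of the SDM 2011 paper.
    """
    mu, sigma = np.mean(scores), np.std(scores)
    if sigma == 0:
        return np.zeros_like(scores, dtype=float)
    z = (scores - mu) / (sigma * np.sqrt(2.0))
    return np.maximum(0.0, erf(z))

# Scores from two detectors (e.g., run in different subspaces) become
# comparable after scaling and can then be combined, for example:
# combined = 0.5 * (gaussian_scaling(knn_scores) + gaussian_scaling(lof_scores))
```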
There is also some work on semi-supervised ensembles, e.g.
Learning Outlier Ensembles: The Best of Both Worlds—Supervised and Unsupervised.
Micenková, B., McWilliams, B., & Assent, I. (2014).
In Proceedings of the ACM SIGKDD 2014 Workshop on Outlier Detection and Description under Data Diversity (ODD2). New York, NY, USA (pp. 51-54).
Also beware of overfitting. It's quite easy to arrive at a single good result by tweaking parameters and evaluating repeatedly. But this leaks evaluation information into your experiment, i.e. you tend to overfit. Performing well across a large range of parameters and data sets is very hard. One of the key observations of the following study was that for every algorithm you'll find at least one data set and parameter setting where it 'outperforms' the others; but if you change the parameters a little, or use a different data set, the benefits of the "superior" new method are not reproducible.
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study.
Campos, G. O., Zimek, A., Sander, J., Campello, R. J., Micenková, B., Schubert, E., ... & Houle, M. E. (2016).
Data Mining and Knowledge Discovery, 30(4), 891-927.
So you will have to work really hard to do a reliable evaluation. Be careful about how you choose parameters.
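As a concrete illustration: instead of reporting only the best result found after tuning, evaluate over a whole grid of parameter values (and, ideally, several data sets) and report the distribution. A minimal sketch using scikit-learn's LocalOutlierFactor, assuming binary ground-truth labels y that are used for evaluation only:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

def auc_over_parameter_range(X, y, ks=range(5, 101, 5)):
    """Evaluate LOF over a range of neighborhood sizes k.

    X : (n, d) data matrix
    y : (n,) binary ground-truth labels (1 = outlier), evaluation only

    Returns a dict mapping k to ROC AUC. Report the whole curve (or its
    mean and spread), not just max(aucs.values()) -- picking the best k
    after the fact is exactly the kind of leakage discussed above.
    """
    aucs = {}
    for k in ks:
        lof = LocalOutlierFactor(n_neighbors=k)
        lof.fit(X)
        # negative_outlier_factor_ is higher for inliers,
        # so negate it to obtain an "outlierness" score
        scores = -lof.negative_outlier_factor_
        aucs[k] = roc_auc_score(y, scores)
    return aucs
```

If a method only looks good for one or two values of k, that is exactly the kind of fragile "superiority" described above.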