Label Propagation in sklearn is classifying every vector as 1

Question

I have 2000 labelled samples (7 different labels) and about 100K unlabeled samples, and I am trying to use `sklearn.semi_supervised.LabelPropagation`. The data has 1024 dimensions. My problem is that the classifier labels everything as 1. My code looks like this:

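    import numpy as np
    from sklearn.semi_supervised import LabelPropagation

    # Use a 10,000-row subset of the unlabeled data.
    X_unlabeled = X_unlabeled[:10000, :]
    X_both = np.vstack((X_train, X_unlabeled))
    # Unlabeled points are marked with the label -1.
    y_both = np.append(y_train, -np.ones((X_unlabeled.shape[0],)))
    clf = LabelPropagation(max_iter=100).fit(X_both, y_both)
    y_pred = clf.predict(X_test)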

`y_pred` is all ones. Also, `X_train` is 2000x1024 and `X_unlabeled` is a 10000x1024 subset of the unlabeled data.

I also get this warning when calling `fit` on the classifier:

    /usr/local/lib/python2.7/site-packages/sklearn/semi_supervised/label_propagation.py:255: RuntimeWarning: invalid value encountered in divide
      self.label_distributions_ /= normalizer

I ran into a similar problem. It turned out that my `X` was set up the wrong way, with one row repeated for the entire matrix. I suggest checking the elements of `X_unlabeled`, `X_both`, etc. – Finn Årup Nielsen, Mar 08 '17 at 18:14
@Andrew Danks: This post is already quite old but how did you solve the problem? – Quasar, May 20 '18 at 16:43
See this question: https://stackoverflow.com/questions/52057836/labelpropagation-how-to-avoid-division-by-zero – politinsa, Jan 22 '20 at 18:53

chloe · Answer 1 · 2014-02-25T10:45:53.870

2

Have you tried different values for the `gamma` parameter? The graph is built by computing an RBF kernel, which involves an exponential, and Python's exponential functions return exactly 0 when the argument is a sufficiently large negative number (see http://computer-programming-forum.com/56-python/ef71e144330ffbc2.htm). With 1024-dimensional data the distances can easily be large enough for that to happen. If the graph is filled with zeros, normalizing `label_distributions_` divides by zero, the array fills with `nan`, and the warning appears. Be careful: in the scikit-learn implementation `gamma` multiplies the squared Euclidean distance, i.e. the kernel is exp(-gamma * ||x - y||^2), which is not the same parameterization as in the Zhu paper.
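To make the underflow concrete, here is a minimal sketch using NumPy only. It runs on synthetic Gaussian data standing in for the 1024-dimensional features (not the asker's data), and the helper names `rbf_affinity` and `off_diagonal_zero_fraction` are made up for illustration. The value 20 is the default `gamma` of `LabelPropagation`; scaling `gamma` by the median squared distance is just one common heuristic, assumed here and not something scikit-learn does for you.

```python
import numpy as np

def rbf_affinity(X, gamma):
    """Dense RBF affinities exp(-gamma * ||x_i - x_j||^2), NumPy only."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    np.fill_diagonal(sq_dists, 0.0)          # distance of a point to itself
    return np.exp(-gamma * sq_dists), sq_dists

def off_diagonal_zero_fraction(W):
    """Fraction of pairwise affinities (self-pairs excluded) that are exactly 0."""
    off_diag = ~np.eye(W.shape[0], dtype=bool)
    return float(np.mean(W[off_diag] == 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1024))  # synthetic stand-in, NOT the data from the question

# gamma=20 is the LabelPropagation default for the RBF kernel.
W_default, sq_dists = rbf_affinity(X, gamma=20.0)
frac_default = off_diagonal_zero_fraction(W_default)

# Assumed heuristic: pick gamma so that a typical pair gets affinity exp(-1).
off_diag = ~np.eye(X.shape[0], dtype=bool)
gamma_small = 1.0 / np.median(sq_dists[off_diag])
W_small, _ = rbf_affinity(X, gamma=gamma_small)
frac_small = off_diagonal_zero_fraction(W_small)

print("zero fraction with gamma=20:", frac_default)
print("zero fraction with gamma=%.2e:" % gamma_small, frac_small)

# Normalizing an all-zero row is a 0/0 division, which yields nan.
row = np.zeros(7)
with np.errstate(invalid="ignore"):
    normalized = row / row.sum()
all_nan = bool(np.isnan(normalized).all())
print("normalized all-zero row is nan:", all_nan)
```

On real data, the same check can be run on a few hundred rows of `X_both`: if nearly all off-diagonal affinities are exactly 0 with the current `gamma`, a much smaller value (or rescaled features) is probably worth trying before calling `fit`.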

edited Feb 25 '14 at 10:45

answered Feb 24 '14 at 14:28


No problem with your English, but it feels more like a comment. However, I overlooked that for now due to the length of your post. Others may disagree with leniency. :) Please strive for answers in answers. You need to gain enough reputation before commenting, and you can do that via several means. – László Papp Feb 24 '14 at 14:46
I completely agree. This is discussion of the problem, not an answer. Stack Overflow has a question-and-answer format, not a discussion thread format. "Answer" means you're actually providing an answer to the question. It's not the same as "Reply" in a discussion forum. Once you have at least 50 rep, you can [post comments](http://stackoverflow.com/help/privileges/comment). Please read the [About](http://www.stackoverflow.com/about) page to get a better understanding of the site's format. – Adi Inbar Feb 24 '14 at 16:42
I'm sorry, I don't know whether these comments are private or whether I could ask you for information in private. I agree with you, but I worked on this problem for some time and found no answer on the web. I do not really trust myself, so I didn't just say "change the gamma value" (or the implementation of exp() in numpy). But I feel like this could help someone. Thank you for your leniency, but maybe I can shorten my "answer" and then you could post it as a comment? Thank you, and please delete this comment. – chloe Feb 24 '14 at 17:47

score 0 · Answer 2 · answered Jul 05 '17 at 13:54


`LabelPropagation` will finally be fixed in scikit-learn version 0.19.


joeln
