
I have 2,000 labelled samples (7 different labels) and about 100K unlabeled samples, and I am trying to use sklearn.semi_supervised.LabelPropagation. The data has 1024 dimensions. My problem is that the classifier labels everything as 1. My code looks like this:

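import numpy as np
from sklearn.semi_supervised import LabelPropagation

X_unlabeled = X_unlabeled[:10000, :]  # use a 10,000-row subset of the unlabeled data
X_both = np.vstack((X_train, X_unlabeled))
# -1 marks unlabeled samples for sklearn's semi-supervised estimators
y_both = np.append(y_train, -np.ones((X_unlabeled.shape[0],)))
clf = LabelPropagation(max_iter=100).fit(X_both, y_both)
y_pred = clf.predict(X_test)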

y_pred is all ones. X_train is 2000x1024, and X_unlabeled is a 10000x1024 subset of the unlabeled data.

I also get this warning when calling fit on the classifier:

/usr/local/lib/python2.7/site-packages/sklearn/semi_supervised/label_propagation.py:255: RuntimeWarning: invalid value encountered in divide self.label_distributions_ /= normalizer

Andrew Danks
  • I ran into a similar problem. It turned out that my `X` was set up the wrong way, with one row repeated for the entire matrix. I suggest checking the elements of `X_unlabeled`, `X_both`, etc. – Finn Årup Nielsen Mar 08 '17 at 18:14
  • @Andrew Danks: This post is already quite old but how did you solve the problem? – Quasar May 20 '18 at 16:43
  • See this question: https://stackoverflow.com/questions/52057836/labelpropagation-how-to-avoid-division-by-zero – politinsa Jan 22 '20 at 18:53
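Following Finn's suggestion, here is a minimal sketch of a sanity check you could run on `X_unlabeled`, `X_both`, etc. before fitting. The helper name `describe_rows` and the toy arrays are hypothetical; the check only counts distinct rows (via `np.unique(..., axis=0)`) and flags non-finite values, which would catch the "one row repeated for the entire matrix" mistake.

```python
import numpy as np


def describe_rows(X):
    """Hypothetical helper: summarise how many distinct rows X has."""
    n_unique = np.unique(X, axis=0).shape[0]  # number of distinct rows
    return {
        "rows": X.shape[0],
        "unique_rows": n_unique,
        "all_rows_identical": n_unique == 1,
        "all_finite": bool(np.isfinite(X).all()),  # no nan/inf values
    }


# Toy example of the bug described in the comment: one row repeated.
X_bad = np.tile(np.arange(4, dtype=float), (5, 1))
X_good = np.arange(20, dtype=float).reshape(5, 4)
print(describe_rows(X_bad))  # {'rows': 5, 'unique_rows': 1, 'all_rows_identical': True, 'all_finite': True}
print(describe_rows(X_good))
```

On real data you would call `describe_rows(X_both)` and expect `unique_rows` to be close to `rows`.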

2 Answers


Have you tried different values for the gamma parameter? The graph is built by computing an RBF kernel, which involves an exponential, and Python's exponential functions return 0 when the argument is a large enough negative number (see http://computer-programming-forum.com/56-python/ef71e144330ffbc2.htm). If the graph is filled with zeros, label_distributions_ ends up filled with "nan" (the normalization divides 0 by 0), and that warning appears. (Be careful: in the scikit-learn implementation, gamma multiplies the squared Euclidean distance, so it is not the same parameter as in the Zhu paper.)
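To illustrate the failure mode described above, here is a minimal sketch on synthetic data (three well-separated clusters in 1024 dimensions, a hypothetical stand-in for the question's data). It shows that with `gamma=20` (LabelPropagation's default) every off-diagonal RBF weight underflows to exactly 0, that normalizing an all-zero row gives `nan`, and that a smaller gamma restores the graph. The "median heuristic" used to pick gamma (1 / median squared distance) is an assumption of this sketch, not something the answer prescribes; treat it as a starting point for tuning.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 separated clusters of 50 points in 1024-D.
n_per_class, n_features = 50, 1024
centers = rng.normal(scale=10.0, size=(3, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y_true = np.repeat(np.arange(3), n_per_class)

# Keep 10 labels per class; -1 marks unlabeled samples, as in the question.
y = np.full_like(y_true, -1)
for k in range(3):
    y[np.flatnonzero(y_true == k)[:10]] = k

off_diag = ~np.eye(len(X), dtype=bool)

# gamma=20: exp(-gamma * ||x - y||^2) underflows to 0.0 for every pair of
# distinct points, so the graph has no edges between samples.
K_default = rbf_kernel(X, gamma=20)
print("non-zero off-diagonal weights:", np.count_nonzero(K_default[off_diag]))

# Normalizing an all-zero row of label mass divides 0 by 0 and yields nan,
# which is where "invalid value encountered in divide" comes from.
with np.errstate(invalid="ignore"):
    empty = np.zeros((1, 3))
    normalised = empty / empty.sum(axis=1, keepdims=True)
print(normalised)  # [[nan nan nan]]

# Median heuristic (assumption): map a typical squared distance to exp(-1).
d2 = euclidean_distances(X, squared=True)
gamma = 1.0 / np.median(d2[off_diag])
K_scaled = rbf_kernel(X, gamma=gamma)
print("smallest off-diagonal weight:", K_scaled[off_diag].min())

clf = LabelPropagation(kernel="rbf", gamma=gamma, max_iter=1000).fit(X, y)
unlabeled = y == -1
accuracy = np.mean(clf.transduction_[unlabeled] == y_true[unlabeled])
print("accuracy on unlabeled points:", accuracy)
```

On real data, it is worth trying several gamma values (or rescaling the features) and checking that the kernel matrix actually has non-zero off-diagonal entries before trusting the predictions.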

chloe
  • No problem with your English, but it feels more like a comment. However, I overlooked that for now due to the length of your post. Others may disagree with leniency. :) Please strive for answers in answers. You need to gain enough reputation before commenting, and you can do that via several means. – László Papp Feb 24 '14 at 14:46
  • I completely agree. This is discussion of the problem, not an answer. Stack Overflow has a question-and-answer format, not a discussion thread format. "Answer" means you're actually providing an answer to the question. It's not the same as "Reply" in a discussion forum. Once you have at least 50 rep, you can [post comments](http://stackoverflow.com/help/privileges/comment). Please read the [About](http://www.stackoverflow.com/about) page to get a better understanding of the site's format. – Adi Inbar Feb 24 '14 at 16:42
  • I'm sorry, I don't know whether these comments are private or whether I could contact you privately. I agree with you, but I worked on this problem for some time and found no answer on the web. I wasn't confident enough to just say "change the gamma value" (or the implementation of exp() in numpy), but I feel this could help someone. Thank you for your leniency; maybe I can shorten my "answer" and you could turn it into a comment? Thank you, and please delete this comment. – chloe Feb 24 '14 at 17:47

LabelPropagation will finally be fixed in scikit-learn version 0.19.

joeln