scikit-learn's LabelEncoder is showing some puzzling behavior in my Jupyter notebook. For example:
from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. This seems odd: shouldn't it print ['zero' 'zero' 'zero' 'one' 'one' 'one']? Then I tried
le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print(le3.inverse_transform([0, 0, 0, 1, 1, 1]))
which also prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. Perhaps alphabetical ordering was at play? To test that, I tried
le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print(le4.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['nil' 'nil' 'nil' 'one' 'one' 'one']!
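One more check that seems relevant is inspecting the fitted classes_ attribute and the forward transform (a sketch; the commented outputs are what I would expect if fit() sorts the labels alphabetically):
le5 = LabelEncoder()
le5.fit(['zero', 'one'])
# classes_ holds the unique labels seen during fit; if they are sorted,
# this should print ['one' 'zero'], since 'one' < 'zero' alphabetically
print(le5.classes_)
# under that ordering, 'one' maps to 0 and 'zero' maps to 1
print(le5.transform(['zero', 'one']))  # expecting [1 0]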
I've spent several hours on this. FWIW, the example in the documentation works as expected, so I suspect the flaw is in how I expect inverse_transform to work. Part of my research included this and this.
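My current guess, which I have not verified against the implementation, is that inverse_transform simply indexes into the sorted classes_ array, roughly like this:
# if inverse_transform is essentially classes_[y], then with
# classes_ == ['one' 'zero'] the observed output follows directly
print(le2.classes_[[0, 0, 0, 1, 1, 1]])  # ['one' 'one' 'one' 'zero' 'zero' 'zero']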
In case it is relevant, I'm using IPython 7.7.0, NumPy 1.17.3, and scikit-learn 0.21.3.