scikit-learn's LabelEncoder is showing some puzzling behavior in my Jupyter notebook. For example:
from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. This seems odd: shouldn't it print ['zero' 'zero' 'zero' 'one' 'one' 'one']? Then I tried
le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print(le3.inverse_transform([0, 0, 0, 1, 1, 1]))
which also prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. Perhaps alphabetical ordering was at play? To test that, I tried
le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print(le4.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['nil' 'nil' 'nil' 'one' 'one' 'one']!
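One more check that seems relevant is inspecting the fitted classes_ attribute and the forward transform (a sketch; the commented outputs are what I would expect if fit() sorts the labels alphabetically):
le5 = LabelEncoder()
le5.fit(['zero', 'one'])
# classes_ holds the unique labels seen during fit; if they are sorted,
# this should print ['one' 'zero'], since 'one' < 'zero' alphabetically
print(le5.classes_)
# under that ordering, 'one' maps to 0 and 'zero' maps to 1
print(le5.transform(['zero', 'one']))  # expecting [1 0]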
I've spent several hours on this. FWIW, the example in the documentation works as expected, so I suspect the flaw is in how I expect inverse_transform to work. Part of my research included this and this.
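My current guess, which I have not verified against the implementation, is that inverse_transform simply indexes into the sorted classes_ array, roughly like this:
# if inverse_transform is essentially classes_[y], then with
# classes_ == ['one' 'zero'] the observed output follows directly
print(le2.classes_[[0, 0, 0, 1, 1, 1]])  # ['one' 'one' 'one' 'zero' 'zero' 'zero']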
In case it is relevant, I'm using IPython 7.7.0, NumPy 1.17.3, and scikit-learn 0.21.3.