
I have a numpy array that looks like the following:

array([[0],[1],[1]])

And I want it to be represented as the one-hot encoded equivalent:

array([[1,0],[0,1],[0,1]])

Does anybody have any ideas? I tried using sklearn.preprocessing.LabelBinarizer, but this just reproduces the input.

Thanks.

EDIT

As requested, here is the code using LabelBinarizer:

import numpy as np
from sklearn.preprocessing import LabelBinarizer

train_y = np.array([[0],[1],[1]])
lb = LabelBinarizer()
lb.fit(train_y)
label_vecs = lb.transform(train_y)

Output:

array([[0],[1],[1]])

Note that the documentation does state: 'Binary targets transform to a column vector'.

P-Gn
user1753640
  • How are you using the LabelBinarizer? It is supposed to get it right. Post your code and current output (you said it just reproduces the input) – Vivek Kumar Apr 20 '17 at 10:51
  • updated as requested – user1753640 Apr 20 '17 at 11:05
  • Ok. For your specified output (`array([[1,0],[0,1],[0,1]])`), you can use MultiLabelBinarizer. See its usage in my other answer - http://stackoverflow.com/a/42392689/3374996. – Vivek Kumar Apr 20 '17 at 11:17
  • But this is very confusing. Why do you want to one-hot encode your target (`train_y`)? Is this a multi-label classification problem? If not, then you should stick to LabelBinarizer, and the output is correct – Vivek Kumar Apr 20 '17 at 11:19
  • I'm trying to build a neural net using tensorflow where I am predicting 1 or 0. As such, the labels used for the network must be representative of this. I have just found a method in tflearn that does what I want: [tflearn.data_utils.to_categorical](http://tflearn.org/data_utils/#to_categorical) – user1753640 Apr 20 '17 at 13:38
  • Is OHE necessary? Can't we just leave it as 0, 1? – haneulkim Sep 09 '21 at 08:33

1 Answer


To do this with sklearn, it seems we could use OneHotEncoder, like so -

import numpy as np
from sklearn.preprocessing import OneHotEncoder

train_y = np.array([[0],[1],[1]]) # Input

enc = OneHotEncoder()
enc.fit(train_y)
out = enc.transform(train_y).toarray()

Sample inputs, outputs -

In [314]: train_y
Out[314]: 
array([[0],
       [1],
       [1]])

In [315]: out
Out[315]: 
array([[ 1.,  0.],
       [ 0.,  1.],
       [ 0.,  1.]])

In [320]: train_y
Out[320]: 
array([[9],
       [4],
       [1],
       [6],
       [2]])

In [321]: out
Out[321]: 
array([[ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  0.,  0.]])
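
To see why the label 9 ends up in the last column of the second sample, here is a small NumPy-only sketch. It assumes the columns follow the sorted distinct labels, which is consistent with the output shown above; the names `categories` and `inverse` are just illustrative.

```python
import numpy as np

train_y = np.array([[9], [4], [1], [6], [2]])

# np.unique returns the distinct labels in sorted order; under the
# assumption above, one-hot column j corresponds to categories[j].
categories, inverse = np.unique(train_y, return_inverse=True)

# Flatten, because the shape of the inverse array for 2D input
# differs between NumPy versions.
inverse = inverse.ravel()

print(categories)  # [1 2 4 6 9]
print(inverse)     # [4 2 0 3 1]
```

Each entry of `inverse` is the column that holds the 1 for that row, so the first row (label 9) gets its 1 in column 4.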

Another approach with initialization -

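def initialization_based(A): # A is Input array
    # Index of each label among the sorted unique labels, flattened to 1D
    a = np.unique(A, return_inverse=True)[1].ravel()
    out = np.zeros((a.shape[0], a.max()+1), dtype=int)
    # Set one entry per row: row i, column a[i]
    out[np.arange(out.shape[0]), a] = 1
    return out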
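
The key step in the function above is the paired-index assignment, which can be looked at in isolation. This is only a sketch on the small input from the question, with the label indices written out by hand as `codes` (an illustrative name):

```python
import numpy as np

codes = np.array([0, 1, 1])       # label index for each row
rows = np.arange(codes.shape[0])  # row numbers 0, 1, 2

out = np.zeros((codes.shape[0], codes.max() + 1), dtype=int)

# Paired integer indexing: this sets out[0, 0], out[1, 1] and out[2, 1],
# i.e. exactly one element per row.
out[rows, codes] = 1

print(out)
```

Because `rows` and `codes` have the same length, NumPy pairs them element by element instead of selecting a whole block of rows and columns.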

Another with broadcasting -

def broadcasting_based(A):  # A is Input array
    a = np.unique(A, return_inverse=True)[1].ravel()
    # Compare a column of label indices against a row of class indices
    return (a[:,None] == np.arange(a.max()+1)).astype(int)
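
For reference, here is a rough sketch of the shapes involved in the broadcasting comparison, again on the question's small input. The `np.eye` line is an alternative I am adding for comparison, not part of the approaches above, and the variable names are illustrative.

```python
import numpy as np

codes = np.array([0, 1, 1])   # label index for each row
n_classes = codes.max() + 1

# A (3, 1) column compared with a (2,) row broadcasts to a (3, 2)
# boolean mask that is True where the row's label equals the column index.
mask = codes[:, None] == np.arange(n_classes)
one_hot = mask.astype(int)

# Alternative: pick rows of the identity matrix by label index.
via_eye = np.eye(n_classes, dtype=int)[codes]

print(one_hot)
print(np.array_equal(one_hot, via_eye))
```

The identity-matrix variant only applies once the labels are already indices 0..K-1, which is what the `np.unique(..., return_inverse=True)` step provides.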
Divakar