How to convert one-hot encodings into integers?

Question

I have a numpy array data set with shape (100, 10). Each row is a one-hot encoding. I want to convert it into an nd-array with shape (100,) in which each row vector is replaced by an integer denoting the index of its nonzero entry. Is there a quick way of doing this using numpy or tensorflow?

So you're trying to decode each row vector? Are you looking for something like `np.argmax()`? It would be helpful if you described the purpose of the decoding. — rosendin, Feb 27 '17 at 23:03

Franck Dernoncourt · Answer 1 · 2017-02-27T23:46:20.503

49

You can use numpy.argmax or tf.argmax. Example:

import numpy as np
a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])
print('np.argmax(a, axis=1): {0}'.format(np.argmax(a, axis=1)))

output:

np.argmax(a, axis=1): [1 0 3]

You may also want to look at sklearn.preprocessing.LabelBinarizer.inverse_transform.
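A minimal sketch of the LabelBinarizer route, assuming scikit-learn is installed and that the class labels are simply the column positions 0 to 3 (that labelling is an assumption made for illustration, not something stated in the question):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])

# Assumption: one class label per one-hot column, numbered 0..3.
lb = LabelBinarizer()
lb.fit(np.arange(a.shape[1]))

# inverse_transform maps each one-hot row back to its class label.
labels = lb.inverse_transform(a)
print(labels)  # [1 0 3]
```

This is mainly useful when the labels are not plain integers (for example strings), since inverse_transform returns the original labels rather than column indices.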

edited Feb 27 '17 at 23:46

answered Feb 27 '17 at 23:35

Franck Dernoncourt

4

Argmax works for this example because in a one-hot, there's only a single 1 and the rest are zeros. For the general case of finding a particular value in an np n-darray, OP can use np.where. Cheers! – JawguyChooser Feb 27 '17 at 23:45
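To illustrate the caveat in this comment: np.argmax returns 0 for an all-zero row and the first maximum for a row with several 1s, so it will not complain about malformed input. A hedged sketch of a check that could run before decoding (decode_one_hot is a hypothetical helper name, not part of numpy):

```python
import numpy as np

def decode_one_hot(one_hots):
    """Return the column index of the 1 in each row (hypothetical helper)."""
    one_hots = np.asarray(one_hots)
    # A valid one-hot row contains only 0s and 1s and sums to exactly 1.
    is_binary = np.isin(one_hots, (0, 1)).all()
    if not is_binary or not (one_hots.sum(axis=1) == 1).all():
        raise ValueError("every row must contain exactly one 1 and otherwise 0s")
    return one_hots.argmax(axis=1)

a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])
decoded = decode_one_hot(a)
print(decoded)  # [1 0 3]
```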

score 35 · Accepted Answer · edited Feb 11 '20 at 21:03

As pointed out by Franck Dernoncourt, since a one-hot encoding has only a single 1 and the rest are zeros, you can use argmax for this particular example. In general, if you want to find a value in a numpy array, you'll probably want to consult numpy.where. See also this Stack Overflow question:

Is there a NumPy function to return the first index of something in an array?

Since a one-hot vector is a vector with all 0s and a single 1, you can do something like this:

>>> import numpy as np
>>> a = np.array([[0,1,0,0],[1,0,0,0],[0,0,0,1]])
>>> [np.where(r==1)[0][0] for r in a]
[1, 0, 3]

This builds a list containing, for each row, the index of the 1. The [0][0] indexing just strips the structure returned by np.where (a tuple containing an array), which is more than you asked for.

For any particular row, you just index into a. For example, in the zeroth row the 1 is found at index 1.

>>> np.where(a[0]==1)[0][0]
1

score 10 · Answer 3 · answered May 19 '20 at 12:06

10

Simply use np.argmax(x, axis=1)

Example:

import numpy as np
array = np.array([[0, 1, 0, 0], [0, 0, 0, 1]])
print(np.argmax(array, axis=1))
> [1 3]

answered May 19 '20 at 12:06

user9114146

2

This is the same as Franck's highly voted answer, which was posted three years earlier. – questionto42 Jul 15 '21 at 18:36

Martin Thoma · Answer 4 · 2018-07-23T08:22:40.587

1

While I strongly suggest using numpy for speed, mpu.ml.one_hot2indices(one_hots) shows how to do it without numpy. Simply run pip install mpu --user --upgrade.

Then you can do

>>> from mpu.ml import one_hot2indices
>>> one_hot2indices([[1, 0], [1, 0], [0, 1]])
[0, 0, 1]

edited Jul 23 '18 at 08:22

answered Jul 23 '18 at 07:50

Martin Thoma

score 1 · Answer 5 · answered Aug 25 '19 at 00:48

What I do in these cases is something like this. The idea is to use the one-hot matrix as a boolean mask into an array whose rows are all 0, 1, 2, 3, ..., so that the selected entries are the column indices.

# Define stuff
import numpy as np
one_hots = np.zeros([100,10])
for k in range(100):
    one_hots[k,:] = np.random.permutation([1,0,0,0,0,0,0,0,0,0])

# Finally, the trick
ramp = np.tile(np.arange(0,10),[100,1])
integers = ramp[one_hots==1].ravel()

I prefer this trick because I suspect np.argmax and the other suggested solutions may be slower than indexing, although indexing may consume more memory. I have not measured this.
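The speed claim is easy to check on your own machine. The sketch below is an illustration rather than a benchmark from the original answer: it builds random one-hot rows by picking rows of an identity matrix (an assumption made here for brevity), confirms that the mask trick and np.argmax agree, and then times both with timeit. No timings are quoted because they depend on hardware and numpy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Build 100 random one-hot rows of width 10 by picking rows of the identity matrix.
expected = rng.integers(0, 10, size=100)
one_hots = np.eye(10)[expected]

ramp = np.tile(np.arange(10), (100, 1))
by_mask = ramp[one_hots == 1]            # boolean-mask trick from the answer
by_argmax = np.argmax(one_hots, axis=1)  # argmax approach

# Both approaches should recover the indices used to build the data.
assert np.array_equal(by_mask, expected)
assert np.array_equal(by_argmax, expected)

# Time each approach; results vary by machine, so none are shown here.
t_mask = timeit.timeit(lambda: ramp[one_hots == 1], number=1000)
t_argmax = timeit.timeit(lambda: np.argmax(one_hots, axis=1), number=1000)
print(f"mask: {t_mask:.4f}s  argmax: {t_argmax:.4f}s")
```

Note that boolean indexing already returns a 1-D array, so the ravel() call in the answer is harmless but not required.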

score 0 · Answer 6 · answered Nov 20 '18 at 09:40

0

def int_to_onehot(n, n_classes):
    v = [0] * n_classes
    v[n] = 1
    return v

def onehot_to_int(v):
    return v.index(1)


>>> v = int_to_onehot(2, 5)
>>> v
[0, 0, 1, 0, 0]


>>> i = onehot_to_int(v)
>>> i
2

answered Nov 20 '18 at 09:40

Iván Sánchez

`onehot_to_int` only works for lists, not for multi-dimensional arrays. – Blade Dec 05 '19 at 00:12
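One way to address this limitation while staying in plain Python is to apply the same list.index call to every row. A small sketch (onehots_to_ints is a hypothetical name chosen here, not from the answer):

```python
def onehots_to_ints(rows):
    """Decode a list of one-hot lists (hypothetical helper, no numpy needed)."""
    # list.index(1) returns the position of the first 1 and raises
    # ValueError if the row contains no 1 at all.
    return [row.index(1) for row in rows]

result = onehots_to_ints([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])
print(result)  # [1, 0, 3]
```

For a numpy array, the rows would first need converting with a.tolist(), since numpy arrays have no index method; at that point np.argmax(a, axis=1) is the simpler choice.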

score 0 · Answer 7 · answered Jan 05 '19 at 14:18

0

You can use this simple code:

a = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]]
# Walk the first row and print the position of the 1.
for j, value in enumerate(a[0]):
    if value == 1:
        print(j)
        break

5

answered Jan 05 '19 at 14:18

Emre Tatbak

score 0 · Answer 8 · answered Jun 16 '22 at 01:01

0

from numpy import argmax  # assumed: the original snippet did not say where argmax comes from

def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]

answered Jun 16 '22 at 01:01

drorhun

1

Please include an explanation with your answer to help readers understand how this works, and solves the problem. You can click the edit button at the bottom of your answer to add an explanation. Additionally, you may find it beneficial reading [how to answer](https://stackoverflow.com/help/how-to-answer) – Freddy Mcloughlan Jun 16 '22 at 04:30
