Convert one-hot encoded data-frame columns into one column

Question

In my pandas data frame, the one-hot encoded vectors are stored as columns, i.e.:

Rows   A  B  C  D  E

0      0  0  0  1  0
1      0  0  1  0  0
2      0  1  0  0  0
3      0  0  0  1  0
4      1  0  0  0  0
4      0  0  0  0  1

How can I convert these columns into a single data-frame column by label encoding them in Python?

I also need a suggestion for rows that contain multiple 1s: how should those rows be handled, given that a row can have only one category at a time?

score 6 · Answer 1 · answered Jul 31 '20 at 17:52

Try with argmax

#df=df.set_index('Rows')

# argmax(1) is the 0-based position of the first maximum in each row; +1 makes it 1-based
df['New'] = df.values.argmax(1) + 1
df
Out[231]: 
      A  B  C  D  E  New
Rows                    
0     0  0  0  1  0    4
1     0  0  1  0  0    3
2     0  1  0  0  0    2
3     0  0  0  1  0    4
4     1  0  0  0  0    1
4     0  0  0  0  1    5
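
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the frame from the question, rebuilt by hand with a plain 0..5 index; the variable name `labels` is only illustrative.

```python
import pandas as pd

# Rebuild the one-hot frame from the question (plain 0..5 index for simplicity).
df = pd.DataFrame(
    [[0, 0, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1]],
    columns=list("ABCDE"),
)

# Position of the first maximum in each row, shifted to 1-based labels.
# Computed before the new column is added so that only A..E take part.
labels = df.to_numpy().argmax(axis=1) + 1
df["New"] = labels
print(df)
```

Note that argmax returns the first maximum, so a row with several 1s silently gets the label of its leftmost 1, and an all-zero row gets label 1. If that matters, check the row sums first.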

score 6 · Answer 2 · edited Aug 01 '20 at 00:19

argmax is the way to go; here is another way using idxmax and get_indexer:

df['New'] = df.columns.get_indexer(df.idxmax(1))+1
# equivalent alternative: df.idxmax(1).map(df.columns.get_loc)+1
print(df)

      A  B  C  D  E  New
Rows                    
0     0  0  0  1  0    4
1     0  0  1  0  0    3
2     0  1  0  0  0    2
3     0  0  0  1  0    4
4     1  0  0  0  0    1
5     0  0  0  0  1    5

edited Aug 01 '20 at 00:19 by Eisha Tir Raazia
answered Jul 31 '20 at 17:54 by anky

Thanks! Can you also tell how would we write it if I have other float columns i.e: X, A...D, F, G, H too in the data-frame? – Eisha Tir Raazia Jul 31 '20 at 18:47
@EishaMazhar unselect the columns having non dummy values. – anky Aug 01 '20 at 05:08
This solved my case as well: https://stackoverflow.com/questions/67977970/pandas-map-a-column-of-values-one-hot-encoding-into-a-single-columns-with-mult – Ali H. Kudeir Jun 16 '21 at 01:51
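
To illustrate the "unselect the non-dummy columns" advice from the comments, here is a rough sketch. The float column `X`, the list `dummy_cols`, and the extra `Label` column are made up for the example and are not from the original thread.

```python
import pandas as pd

df = pd.DataFrame({
    "X": [0.5, 1.2, 3.3],  # hypothetical non-dummy float column
    "A": [0, 0, 1],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
})

# Assumed: you know (or can list) which columns are the dummies.
dummy_cols = ["A", "B", "C"]
dummies = df[dummy_cols]

# Column name of the first maximum per row, restricted to the dummy columns.
names = dummies.idxmax(axis=1)

# 1-based position of that name within the dummy columns.
df["New"] = dummies.columns.get_indexer(names) + 1
# Keeping the name itself is often more readable than a number.
df["Label"] = names
print(df)
```

Restricting to `dummy_cols` keeps the float column out of the row-wise maximum, which would otherwise win whenever its value exceeds 1.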

score 3 · Answer 3 · edited Jul 31 '20 at 19:14

Also need suggestion on this that some rows have multiple 1s, how to handle those rows because we can have only one category at a time.

In this case you can take the dot product of your DataFrame of dummies with an array of powers of 2 (one power per column). Each column then contributes a distinct bit, so every unique combination of dummies (A, A+B, A+B+C, B+C, ...) gets its own category label. (I added a few rows at the bottom to illustrate the unique labels.)

import numpy as np
df['Category'] = df.dot(2**np.arange(df.shape[1]))

      A  B  C  D  E  Category
Rows                         
0     0  0  0  1  0         8
1     0  0  1  0  0         4
2     0  1  0  0  0         2
3     0  0  0  1  0         8
4     1  0  0  0  0         1
5     0  0  0  0  1        16
6     1  0  0  0  1        17
7     0  1  0  0  1        18
8     1  1  0  0  1        19
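
Since each column maps to one bit, the label can also be decoded back into the set of active columns. The sketch below is one possible way to do that; the helper `decode` and the columns `Members` and `n_active` are hypothetical names introduced here, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [1, 1, 0, 0, 1]],
    columns=list("ABCDE"),
)
cols = list(df.columns)  # capture the dummy columns before adding new ones

# One power of two per dummy column: A -> 1, B -> 2, C -> 4, D -> 8, E -> 16.
weights = 2 ** np.arange(len(cols))
df["Category"] = df[cols].dot(weights)


def decode(code, names=cols):
    """Return the column names whose bit is set in `code` (hypothetical helper)."""
    return [name for i, name in enumerate(names) if (code >> i) & 1]


df["Members"] = df["Category"].apply(decode)
# Number of active dummies per row; anything other than 1 is not a pure one-hot row.
df["n_active"] = df[cols].sum(axis=1)
print(df)
```

Rows where `n_active` is greater than 1 are the multi-label rows from the question, so this also gives a simple way to find them before deciding how to treat them.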

edited Jul 31 '20 at 19:14 by Eisha Tir Raazia
answered Jul 31 '20 at 18:11 by ALollz

Thanks, can you also tell the syntax if I have other float columns i.e: X, A...D, F, G, H too in the data-frame? – Eisha Tir Raazia Jul 31 '20 at 18:40
@EishaMazhar You won't need to change anything if those are all dummy columns. Otherwise I'd make a list of all of your dummy columns `dummy_cols = ['X', 'A', 'B', ...]` then you can do `df[dummy_cols].dot(2**np.arange(len(dummy_cols)))` – ALollz Jul 31 '20 at 19:15

score 3 · Answer 4 · answered Jul 31 '20 at 20:34

Another readable solution, on top of the other great solutions provided, that works for any type of variables in your dataframe:

df['variables'] = np.where(df.values)[1]+1

output:

   A  B  C  D  E  variables
0  0  0  0  1  0          4
1  0  0  1  0  0          3
2  0  1  0  0  0          2
3  0  0  0  1  0          4
4  1  0  0  0  0          1
5  0  0  0  0  1          5
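
One caveat worth spelling out: np.where returns one (row, column) pair per non-zero cell, so its output only lines up with the rows when every row has exactly one non-zero value. A minimal sketch with an explicit guard follows; the guard and the variable names are my own additions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [1, 0, 0, 0, 0]],
    columns=list("ABCDE"),
)

values = df.to_numpy()

# Guard: with zero or several non-zero cells in a row, np.where would return
# an array whose length differs from the number of rows.
nonzero_per_row = np.count_nonzero(values, axis=1)
if not (nonzero_per_row == 1).all():
    raise ValueError("each row must contain exactly one non-zero value")

# Results come back in row-major order, so col_positions[i] belongs to row i.
rows, col_positions = np.where(values)
df["variables"] = col_positions + 1
print(df)
```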
