Map DataFrame column name to appropriate cell

Question

This is my first post here.

I don't exactly know how to formulate this question without an example, so it's hard to search for an answer. Anyway, I have a DataFrame that looks like this (with more columns and thousands of rows):

df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 0]})

I would like to create an additional column, e.g. "Type", where the value in each row is the name of the column that contains 1 in that row. For example:

df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 0], "Type": ["A", "C", "B"]})

I hope this makes sense.

Thanks, Chris

Welcome to stackoverflow. The representation you have is called "one-hot encoding" and you need to translate it into categorical values. This has been asked before, e.g., [here](https://stackoverflow.com/questions/38334296/reversing-one-hot-encoding-in-pandas), but I guess it is indeed quite hard to find without the right keywords! — fsimonjetz, Aug 02 '21 at 18:45
Are these one-hot encoded columns or dummies? That can matter for the answer. — ALollz, Aug 02 '21 at 18:52
Thanks a lot fsimonjetz, it was the same case! ALollz - these are encoded. I used MultiLabelBinarizer on one of my columns and that's the output — Krzysztof, Aug 02 '21 at 18:55

score 2 · Answer 1 · SeaBean · 2021-08-02T19:00:53.947

You can check where df equals 1 using .eq, and then use idxmax(axis=1) to get the label of the column that holds the 1 in each row, as follows:

df['Type'] = df.eq(1).idxmax(axis=1)

or simplify it for this case, where the values are only 0s and 1s (thanks to @wwii):

df['Type'] = df.idxmax(axis=1)
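One caveat with idxmax: it returns the first column holding the row maximum, so a row with no 1 at all would still be labelled with the first column. A minimal sketch of one way to guard against that, assuming such rows should end up with a missing value (the extra all-zero row and the name onehot are made up for illustration):

```python
import pandas as pd

# Same frame as in the question, plus a hypothetical all-zero last row
onehot = pd.DataFrame({"A": [1, 0, 0, 0], "B": [0, 0, 1, 0], "C": [0, 1, 0, 0]})

mask = onehot.eq(1)               # True where the cell equals 1
labels = mask.idxmax(axis=1)      # first column that is True in each row

# idxmax falls back to the first column when no cell is True,
# so blank out the rows that contain no 1 at all
labels = labels.where(mask.any(axis=1))

print(labels.tolist())
```

Whether a missing value is the right fallback depends on the data; dropping or flagging such rows would work just as well.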

Alternatively, you can also use df.dot, as follows:

df['Type'] = df.dot(df.columns)

Result:

print(df)

   A  B  C Type
0  1  0  0    A
1  0  0  1    C
2  0  1  0    B
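The dot version works because multiplying a string by 0 gives an empty string and multiplying by 1 gives the string itself, so the row-wise "sum" is just the concatenation of the matching column names. That also means it can cope with rows holding more than one 1, which may matter for MultiLabelBinarizer output. A rough sketch, assuming a comma-separated label is acceptable (the multi-label row and the name onehot are invented for illustration):

```python
import pandas as pd

# Hypothetical multi-label frame: row 1 has two columns set to 1
onehot = pd.DataFrame({"A": [1, 0, 0], "B": [0, 1, 1], "C": [0, 1, 0]})

# Append a separator to every column name, concatenate the names of the
# columns holding a 1, then strip the trailing separator
types = onehot.dot(onehot.columns + ",").str.rstrip(",")

print(types.tolist())
```

With idxmax, the same row would only report its first matching column.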

edited Aug 02 '21 at 19:00

answered Aug 02 '21 at 18:45


idxmax should operate on zeros and ones the same way it does on Trues and Falses. – wwii Aug 02 '21 at 18:55

@wwii You are right! I used .eq to handle the general case where the value to match is not 1, but yes, we can simplify it in this case. Thanks! – SeaBean Aug 02 '21 at 18:59

score 1 · Answer 2 · answered Aug 02 '21 at 18:50

Using apply:

import pandas as pd

df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 0]})

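def getColName(row):
    # Boolean mask of the cells equal to 1 in this row; using the row
    # directly avoids relying on the index being 0, 1, 2, ...
    x = (row == 1)
    # argmax gives the position of the first True; map it to its column label
    return x.index[x.argmax()]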

df['Type'] = df.apply(getColName, axis=1)

print(df)

   A  B  C Type
0  1  0  0    A
1  0  0  1    C
2  0  1  0    B
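As a quick sanity check, the approaches from both answers can be run side by side on the frame from the question. This is only a sketch; the variable names are made up here, and the apply version uses a lambda in place of a named function:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 0]})

via_idxmax = df.eq(1).idxmax(axis=1)
via_dot = df.dot(df.columns)
via_apply = df.apply(lambda row: row.index[(row == 1).argmax()], axis=1)

# All three should agree on the example data
print(via_idxmax.tolist(), via_dot.tolist(), via_apply.tolist())
```

On thousands of rows the vectorised idxmax and dot versions will generally be faster than apply, which calls a Python function once per row.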
