2

This is my first post here.

I don't exactly know how to formulate this question without an example, so it's hard to search for an answer. Anyway, I have a DataFrame that looks like this (with more columns and thousands of rows):

df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 0]})

I would like to create an additional column, e.g. "Type", where the value in each row is the name of the column that contains 1 in that row. For example:

df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 0], "Type": ["A", "C", "B"]})

I hope this makes sense.

Thanks, Chris

ALollz
Krzysztof
  • Welcome to Stack Overflow. The representation you have is called "one-hot encoding" and you need to translate it into categorical values. This has been asked before, e.g., [here](https://stackoverflow.com/questions/38334296/reversing-one-hot-encoding-in-pandas), but I guess it is indeed quite hard to find without the right keywords! – fsimonjetz Aug 02 '21 at 18:45
  • Are these one-hot encoded columns or dummies? That can matter for the answer. – ALollz Aug 02 '21 at 18:52
  • Thanks a lot fsimonjetz, it was the same case! ALollz - these are encoded. I used MultiLabelBinarizer on one of my columns and that's the output – Krzysztof Aug 02 '21 at 18:55

2 Answers

2

You can compare df against 1 with .eq, then use idxmax(axis=1) to get the label of the column that holds the 1 in each row:

df['Type'] = df.eq(1).idxmax(axis=1)

or, since the values here are only 0s and 1s, simplify it (thanks to @wwii):

df['Type'] = df.idxmax(axis=1)
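One caveat worth sketching: idxmax returns the first column label when a row contains no 1 at all, so such rows would be silently labelled "A". The frame below is a hypothetical variant of the question's data with an all-zero last row, and the names mask and labels are only illustrative.

```python
import pandas as pd

# Hypothetical frame: the last row contains no 1 at all
df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 0], "C": [0, 1, 0]})

mask = df.eq(1)                   # boolean frame: True where the value is 1
labels = mask.idxmax(axis=1)      # first column holding True in each row

# For a row with no True, idxmax still returns the first column,
# so replace the label with a missing value for those rows
labels = labels.where(mask.any(axis=1))

print(labels)
```

Whether a missing value is the right placeholder depends on the data; this is just one way to make all-zero rows visible.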

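Alternatively, you can use df.dot. Multiplying each 0/1 value by its column name and summing across the row leaves only the name of the column that holds the 1:

df['Type'] = df.dot(df.columns)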

Result:

print(df)

   A  B  C Type
0  1  0  0    A
1  0  0  1    C
2  0  1  0    B
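Since the question's comments mention MultiLabelBinarizer, a row may hold more than one 1. In that case idxmax keeps only the first matching column, whereas the df.dot approach concatenates all matching names. A minimal sketch, assuming a hypothetical frame with two labels in one row and a comma as an arbitrary separator:

```python
import pandas as pd

# Hypothetical multi-label frame: row 1 has a 1 in both "B" and "C"
df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 1, 1], "C": [0, 1, 0]})

# 1 * "A," gives "A," and 0 * "A," gives "", so each row sums to the
# comma-terminated names of the columns that hold a 1
types = df.dot(df.columns + ",").str.rstrip(",")

print(types)
```

Compute this before adding the "Type" column, because a non-numeric column in df would break the dot product.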
SeaBean
  • idxmax should operate on zeros and ones the same as on Trues and Falses. – wwii Aug 02 '21 at 18:55
  • @wwii You are right! I used .eq to handle the general case where the value to match is not 1, but yes, we can simplify it in this case. Thanks! – SeaBean Aug 02 '21 at 18:59
1

Using apply:

import pandas as pd

df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 1], "C": [0, 1, 0]})

def getColName(row):
    # Use the row that apply passes in, so this does not rely on the
    # global df or on a default integer index
    return (row == 1).idxmax()

df['Type'] = df.apply(getColName, axis=1)

print(df)

   A  B  C Type
0  1  0  0    A
1  0  0  1    C
2  0  1  0    B
MDR