Converting one-hot labels to categorical

Question

I have a CSV file that contains one-hot labels for 4 classes, one row per id. For example:

ids   A  B  C  D
1     0  1  0  0
2     0  0  1  0
3     1  0  0  0
.
.
.
10000

I am trying to add another column, 'pos', that stores the corresponding category. For the example above:

ids pos
 1   B
 2   C
 3   A

I am using pandas to do that, but I am getting the error **KeyError: 'pos'**. I am new to Python. Can someone help me find what I am doing wrong?

import pandas as pd

df=pd.read_csv("ABC.csv")

cols=['A','B','C','D']
df['arr']=df[cols].values.tolist()
print(df.head())

for ind in df.head().index:
    print(df['arr'][ind].index(1)+1)
    df['pos'][ind]=df['arr'][ind].index(1)+1
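
A likely cause of the error, sketched under assumptions: `df['pos'][ind] = ...` first reads `df['pos']`, and that column has not been created yet, so pandas raises a KeyError before any assignment happens. The small frame below is a hypothetical stand-in for ABC.csv built from the three rows shown above. Note also that `.index(1)+1` yields a number, while the desired output uses the letters, so this sketch looks up the column name instead.

```python
import pandas as pd

# Hypothetical stand-in for ABC.csv, using the three rows from the question.
df = pd.DataFrame({
    "ids": [1, 2, 3],
    "A": [0, 0, 1],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
    "D": [0, 0, 0],
})
cols = ["A", "B", "C", "D"]

# Reading a column that was never created raises KeyError.
try:
    df["pos"]
    raised = False
except KeyError:
    raised = True

# One way to keep the loop: collect the labels in a list,
# then create the column with a single assignment.
labels = []
for ind in df.index:
    row = df.loc[ind, cols].tolist()
    labels.append(cols[row.index(1)])  # column name where the 1 sits
df["pos"] = labels

print(df[["ids", "pos"]])
```

This keeps the original loop structure only to show where the error comes from; it is still slow on large files.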

`df[["A", "B", "C", "D"]].idxmax(axis="columns")` gets the index of the maximum value per row (hence `idxmax`), i.e., the column name for each row. Since the data are 1s and 0s, it will pick the position of the 1. — Mustafa Aydın, Jan 15 '23 at 14:38
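
A minimal sketch of the `idxmax` suggestion, assuming a small in-memory frame in place of the CSV file (the data is just the three example rows from the question):

```python
import pandas as pd

# Hypothetical stand-in for the CSV, using the question's example rows.
df = pd.DataFrame({
    "ids": [1, 2, 3],
    "A": [0, 0, 1],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
    "D": [0, 0, 0],
})
cols = ["A", "B", "C", "D"]

# For each row, take the name of the column holding the largest value.
# With one-hot rows, that is the column containing the 1.
df["pos"] = df[cols].idxmax(axis="columns")

print(df[["ids", "pos"]])
```

One caveat worth checking on real data: `idxmax` returns the first maximum, so a row of all zeros would be labelled with the first column rather than flagged as missing.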
`df["pos"] = df.apply(lambda row: row.drop("ids").idxmax(), axis=1)` This will do the trick. — Shaida Muhammad, Jan 15 '23 at 15:04
@ShaidaMuhammad That shouldn't do the trick, actually. If you are using idxmax, why not use it on the entire dataframe instead of row by row? Going row by row makes it very slow: on a (72000, 5) dataframe it takes ~30 seconds, whereas the whole-frame call takes ~90 milliseconds. Readability suffers too. The same comment goes for the approach in the question; the explicit for loop makes it slow, unfortunately, and not really readable. That's why the library provides these methods. — Mustafa Aydın, Jan 15 '23 at 15:35
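
To make the comparison in the comments concrete, here is a small sketch that runs both variants on the question's example rows and checks that they agree. The frame is a hypothetical stand-in for the CSV, and no timings are claimed here since they depend on the machine and data size.

```python
import pandas as pd

# Hypothetical stand-in for the CSV, using the question's example rows.
df = pd.DataFrame({
    "ids": [1, 2, 3],
    "A": [0, 0, 1],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
    "D": [0, 0, 0],
})
cols = ["A", "B", "C", "D"]

# Row-by-row variant: a Python function is called once per row.
row_wise = df.apply(lambda row: row.drop("ids").idxmax(), axis=1)

# Whole-frame variant: a single call over the label columns.
vectorized = df[cols].idxmax(axis="columns")

# Both produce the same labels; only the cost differs.
same = row_wise.tolist() == vectorized.tolist()
print(same, vectorized.tolist())
```

Since the results match, the whole-frame call is the simpler choice; the standard-library `timeit` module could be used to measure the difference on a larger frame.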
