I have a CSV file that maps each id to one of 4 classes. For example:

ids   A  B  C  D
1     0  1  0  0
2     0  0  1  0
3     1  0  0  0
.
.
.
10000

I am trying to add another column, 'pos', which stores the corresponding category for each id. For the example above:

ids pos
 1   B
 2   C
 3   A

I am using pandas to do this, but I am getting `KeyError: 'pos'`. I am new to Python. Can someone help me find what I am doing wrong?

import pandas as pd

df=pd.read_csv("ABC.csv")

cols=['A','B','C','D']
df['arr']=df[cols].values.tolist()
print(df.head())

for ind in df.head().index:
    print(df['arr'][ind].index(1)+1)
    df['pos'][ind]=df['arr'][ind].index(1)+1
  • `df[["A", "B", "C", "D"]].idxmax(axis="columns")` gets the *i*n*d*e*x* of the *max*imum value per row (i.e., the column name per row). Since the data are 1s and 0s, it will pick the positions of the 1s. – Mustafa Aydın Jan 15 '23 at 14:38
  • `df["pos"] = df.apply(lambda row: row.drop("ids").idxmax(), axis=1)` This will do the trick. – Shaida Muhammad Jan 15 '23 at 15:04
  • @ShaidaMuhammad That shouldn't do the trick, actually. If you are using idxmax, why not use it on the entire dataframe instead of row by row? Applying it row by row is very slow: on a (72000, 5) dataframe it takes ~30 seconds, whereas the whole-frame call takes ~90 milliseconds. Readability suffers too. The same goes for the approach in the question: the explicit for loop is slow and not very readable. That is why the library provides these methods. – Mustafa Aydın Jan 15 '23 at 15:35
  • Thanks everyone. @MustafaAydın your approach worked – sheldon cooper Jan 16 '23 at 05:28
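
For later readers, here is a minimal sketch of the `idxmax` approach from the comments. The inline dataframe is a hypothetical stand-in for `ABC.csv`, built from the three sample rows in the question. As for the error itself: `df['pos'][ind] = ...` first evaluates `df['pos']`, and since no column named 'pos' exists yet, that lookup is what raises `KeyError: 'pos'`.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("ABC.csv"), using the sample rows from the question
df = pd.DataFrame({
    "ids": [1, 2, 3],
    "A": [0, 0, 1],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
    "D": [0, 0, 0],
})

cols = ["A", "B", "C", "D"]

# For each row, idxmax returns the label of the column holding the row maximum.
# With exactly one 1 per row, that is the class name. Assigning the whole Series
# at once creates the 'pos' column, so no per-row loop is needed.
df["pos"] = df[cols].idxmax(axis="columns")

print(df[["ids", "pos"]])
```

One caveat worth checking against the real data: `idxmax` returns the first column on ties, so a row of all zeros would be labelled 'A' rather than flagged as missing.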
