1

I have the below dataset:

device_id   A   B   C   Current Class   
1           70  35  40     C                
2           45  90  34     B

Now each device has a score within each class( A,B,C) and it is currently a part of a certain class. Based on the class for which it has the highest score , a class change will either be recommended or not.

For example, device 1 is in class C but it's highest score is in class A and hence it's recommended class will be A.

Expected output:

device_id   A   B   C   Current Class   Class Change    Recommended
1           70  35  40  C                   Yes             A
2           45  90  34  B                   No              B

Can someone please help me with this??

Shuvayan Das
  • 1,198
  • 3
  • 20
  • 40
  • 1
    Possible duplicate of [Find the column name which has the maximum value for each row](https://stackoverflow.com/questions/29919306/find-the-column-name-which-has-the-maximum-value-for-each-row) – Vaishali Jan 02 '18 at 14:49
  • Also, df['Class Change'] = (df['Current Class']==df[['A', 'B', 'C']].idxmax(1)).apply(lambda x: 'No' if x else 'Yes') – DrTRD Jan 02 '18 at 14:52
  • I was this close to posting my answer until I saw that ^ – cs95 Jan 02 '18 at 14:52
  • Also, neither answer is taking into account the fact that you could have any number of classes. I'd recommend smartly `df = df.set_index('device_id')` and then `df.iloc[:, :-1].idxmax(1)`. – cs95 Jan 02 '18 at 14:53

3 Answers3

1

I think you need idxmax with numpy.where:

a = df[['A','B','C']].idxmax(axis=1)
#more general solution is select all columns without first and last
#a = df.iloc[:, 1:-1].idxmax(axis=1)
print (df.iloc[:, 1:-1])
    A   B   C
0  70  35  40
1  45  90  34

df['Class Change'] = np.where(df['Current Class'] == a, 'No', 'Yes')
df['Recommended'] = a
print (df)
   device_id   A   B   C Current Class Class Change Recommended
0          1  70  35  40             C          Yes           A
1          2  45  90  34             B           No           B

Detail:

print (a)
0    A
1    B
dtype: object

If order of new columns is not important and should be swapped:

df['Recommended'] = df[['A','B','C']].idxmax(1)
df['Class Change'] = np.where(df['Current Class'] == df['Recommended'], 'No', 'Yes')
print (df)
   device_id   A   B   C Current Class Recommended Class Change
0          1  70  35  40             C           A          Yes
1          2  45  90  34             B           B           No
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

I would first find the column with the max to get the Recommended row, and then check if that matches the Current Class to get the Class Change row, like this:

devices = pd.DataFrame({'A':[70, 45],
                       'B':[35, 90],
                       'C':[40, 34],
                       'Current Class':['C','B']})

devices['Recommended'] = devices[['A', 'B', 'C']].idxmax(1)

devices['Class Change'] = devices['Current Class'] == devices['Recommended']

print(devices)

output:

    A   B   C Current Class Recommended  Class Change
0  70  35  40             C           A         False
1  45  90  34             B           B          True
Kewl
  • 3,327
  • 5
  • 26
  • 45
1

numpy solution : -)

df['Recommended']=np.array(list('ABC'))[np.argmax(df[list('ABC')].values,1)]
df
Out[172]: 
   device_id   A   B   C CurrentClass Recommended
0          1  70  35  40            C           A
1          2  45  90  34            B           B
(df.CurrentClass==df.Recommended).map({False:'no',True:'yes'})
Out[173]: 
0     no
1    yes
dtype: object
df['Class Change']=(df.CurrentClass==df.Recommended).map({False:'no',True:'yes'})
df
Out[175]: 
   device_id   A   B   C CurrentClass Recommended Class Change
0          1  70  35  40            C           A           no
1          2  45  90  34            B           B          yes
BENY
  • 317,841
  • 20
  • 164
  • 234