Here's my dataset; call it df:

Id  Name   Math    Physics   Biology   Chemistry
1   Andy   A       B         A         B
2   Bert   Other   C         Other     A
3   Candy  Other   Other     A         B
4   Dony   B       A         C         B

The expected output adds a column called 'Grade' that holds, for each row, the first value (reading the subject columns left to right) that is not 'Other':

Id  Name   Math    Physics   Biology   Chemistry  Grade
1   Andy   A       B         A         B          A   
2   Bert   Other   C         Other     A          C
3   Candy  Other   Other     A         B          A
4   Dony   B       A         C         B          B
jpp
Nabih Bawazir
  • What have you tried so far that doesn't work? Can you list at least one of your approaches? – meW Jan 08 '19 at 11:16
  • If you replace 'Other' with np.nan, then `df['Grade'] = df.iloc[:, 2:].bfill(axis=1).iloc[:, 0]` works. – anky Jan 08 '19 at 11:24

4 Answers

mask + bfill

You can mask the subject columns with a Boolean dataframe (turning every 'Other' into NaN), then backfill along each row and take the first column:

df['Grade'] = df.iloc[:, 2:].mask(df.iloc[:, 2:].eq('Other')).bfill(axis=1).iloc[:, 0]
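As a self-contained sketch of this approach, the snippet below rebuilds the question's df by hand (the construction is an assumption based on the table shown) and spells the axis out as a keyword:

```python
import pandas as pd

# Hand-built copy of the question's data (assumed from the table shown).
df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Name': ['Andy', 'Bert', 'Candy', 'Dony'],
    'Math': ['A', 'Other', 'Other', 'B'],
    'Physics': ['B', 'C', 'Other', 'A'],
    'Biology': ['A', 'Other', 'A', 'C'],
    'Chemistry': ['B', 'A', 'B', 'B'],
})

subjects = df.iloc[:, 2:]                      # Math .. Chemistry
masked = subjects.mask(subjects.eq('Other'))   # 'Other' becomes NaN
# Backfilling along each row pulls the first non-NaN value into column 0.
df['Grade'] = masked.bfill(axis=1).iloc[:, 0]
print(df)
```

The original subject columns are left untouched, because mask returns a new dataframe.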
jpp

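Here's a solution using justify, a helper function defined in another Stack Overflow answer (it is not part of pandas or NumPy) that pushes the valid entries of each row to the left: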

df['Grade'] = justify(df.iloc[:,2:].values, invalid_val='Other')[:,0]

    Id   Name   Math Physics Biology Chemistry Grade
0   1   Andy      A       B       A         B     A
1   2   Bert  Other       C   Other         A     C
2   3  Candy  Other   Other       A         B     A
3   4   Dony      B       A       C         B     B
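Since justify is not defined in this answer, here is a minimal sketch of what such a helper could look like, assuming it only needs to left-justify the rows of a 2-D array. The name justify_left and its signature are hypothetical, and this is not the original helper:

```python
import numpy as np
import pandas as pd

def justify_left(a, invalid_val):
    """Hypothetical helper: move valid entries of each row to the left,
    padding the right-hand side with invalid_val."""
    mask = a != invalid_val
    # Sorting Booleans puts False first; flipping moves the True slots left.
    justified_mask = np.flip(np.sort(mask, axis=1), axis=1)
    out = np.full(a.shape, invalid_val, dtype=object)
    # Each row has the same number of True cells in both masks,
    # so the valid values keep their left-to-right order.
    out[justified_mask] = a[mask]
    return out

# Hand-built copy of the question's data (assumed from the table shown).
df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Name': ['Andy', 'Bert', 'Candy', 'Dony'],
    'Math': ['A', 'Other', 'Other', 'B'],
    'Physics': ['B', 'C', 'Other', 'A'],
    'Biology': ['A', 'Other', 'A', 'C'],
    'Chemistry': ['B', 'A', 'B', 'B'],
})

justified = justify_left(df.iloc[:, 2:].to_numpy(dtype=object), 'Other')
df['Grade'] = justified[:, 0]
print(df)
```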
yatu

Use idxmax + lookup (DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this needs an older pandas):

df['Grade'] = df.lookup(df.index, (df.iloc[:, 2:] != 'Other').idxmax(axis=1))
print(df)

Output

   Id   Name   Math Physics Biology Chemistry Grade
0   1   Andy      A       B       A         B     A
1   2   Bert  Other       C   Other         A     C
2   3  Candy  Other   Other       A         B     A
3   4   Dony      B       A       C         B     B

idxmax returns, for each row, the label of the first column whose value is not 'Other': True is the maximum of a Boolean row, and idxmax reports its first occurrence. lookup then fetches the value at each (row, column) pair.
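On pandas versions that no longer ship DataFrame.lookup, the same idea can be sketched with plain NumPy indexing. This is an assumed substitute, not part of the original answer, and the df construction is copied by hand from the question:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's data (assumed from the table shown).
df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Name': ['Andy', 'Bert', 'Candy', 'Dony'],
    'Math': ['A', 'Other', 'Other', 'B'],
    'Physics': ['B', 'C', 'Other', 'A'],
    'Biology': ['A', 'Other', 'A', 'C'],
    'Chemistry': ['B', 'A', 'B', 'B'],
})

subjects = df.iloc[:, 2:]
# Label of the first non-'Other' column in each row.
first_valid = (subjects != 'Other').idxmax(axis=1)
# Translate those labels into positions within `subjects`.
col_pos = subjects.columns.get_indexer(first_valid)
# Pick one cell per row: (row i, column col_pos[i]).
df['Grade'] = subjects.to_numpy()[np.arange(len(df)), col_pos]
print(df)
```

One thing to watch: if a row contains only 'Other', idxmax falls back to the first column, so that row's Grade would be 'Other'.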

Dani Mesejo

Replace 'Other' with np.nan (this assumes import numpy as np):

df.replace('Other', np.nan, inplace=True)

Then backfill along each row and take the first column:

df['Grade'] = df.iloc[:, 2:].bfill(axis=1).iloc[:, 0]

Finally, restore 'Other' in place of np.nan:

df.replace(np.nan, 'Other', inplace=True)
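A variation on these steps, sketched here as an assumption rather than as part of the original answer, does the replacement on a temporary copy of the subject columns, so df never has to be restored afterwards:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's data (assumed from the table shown).
df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Name': ['Andy', 'Bert', 'Candy', 'Dony'],
    'Math': ['A', 'Other', 'Other', 'B'],
    'Physics': ['B', 'C', 'Other', 'A'],
    'Biology': ['A', 'Other', 'A', 'C'],
    'Chemistry': ['B', 'A', 'B', 'B'],
})

# replace without inplace=True returns a new dataframe; df keeps its 'Other' values.
grades = df.iloc[:, 2:].replace('Other', np.nan)
df['Grade'] = grades.bfill(axis=1).iloc[:, 0]
print(df)
```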
anky