4

I have a dataframe: df-

    A   B   C   D   E
0   V   10  5   18  20
1   W   9   18  11  13
2   X   8   7   12  5
3   Y   7   9   7   8
4   Z   6   5   3   90

I want to add a column 'Result' which should return 1 if the value in column 'E' is greater than the values in B, C & D columns else return 0.

Output should be:

    A   B   C   D   E   Result
0   V   10  5   18  20  1
1   W   9   18  11  13  0
2   X   8   7   12  5   0
3   Y   7   9   7   8   0
4   Z   6   5   3   90  1

For few columns, i would use logic like : if(and(E>B,E>C,E>D),1,0), But I have to compare around 20 columns (from B to U) with column name 'V'. Additionally, the dataframe has around 100 thousand rows.

I am using

df['Result']=np.where((df.ix[:,1:20])<df['V']).all(1),1,0)

And it gives a Memory error.

RSNboim
  • 130
  • 1
  • 8
piyush bansal
  • 39
  • 1
  • 1
  • 7

1 Answers1

11

One possible solution is compare in numpy and last convert boolean mask to ints:

df['Result'] = (df.iloc[:, 1:4].values < df[['E']].values).all(axis=1).astype(int)
print (df)
   A   B   C   D   E  Result
0  V  10   5  18  20       1
1  W   9  18  11  13       0
2  X   8   7  12   5       0
3  Y   7   9   7   8       0
4  Z   6   5   3  90       1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Thank you so much Jezrael. It worked. I am just wondering why is my code is resulting in memory issue. – piyush bansal May 23 '18 at 11:56
  • 1
    @piyushbansal - Hard question, mainly because big data :) – jezrael May 23 '18 at 11:57
  • @jezrael how to do for the same data, compare B,C,D,E columns and update in result as the count of >0 Ex: for B, C, D, E with values 1,0,5,3 to be Result 3. Could you help? – panda Dec 03 '18 at 09:02
  • 3
    @panda - use `df['Result'] = df[['B','C','D','E']].gt(0).sum(axis=1).astype(int)`, `gt` is `>`, [`gt`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.gt.html) – jezrael Dec 03 '18 at 09:07
  • @jezrael it works. Thank you so much how to make it for greater than or equal to 0 – panda Dec 03 '18 at 09:13
  • 2
    @panda - use `ge` (greater or equal) like `df['Result'] = df[['B','C','D','E']].ge(0).sum(axis=1).astype(int)` – jezrael Dec 03 '18 at 09:13
  • 1
    @jezrael Thank you. – panda Dec 03 '18 at 09:14
  • I wonder if there is anyway to bypass the following error: TypeError: '<' not supported between instances of 'int' and 'str' – Victor Hugo Calegari de Toledo Oct 14 '22 at 20:43
  • 1
    @VictorHugoCalegarideToledo - It means `df['E']` has strings, check [this](https://stackoverflow.com/questions/15891038/change-column-type-in-pandas) for convert column to numeric. – jezrael Oct 17 '22 at 06:22