compare multiple columns of pandas dataframe with one column

Question

I have a dataframe: df-

    A   B   C   D   E
0   V   10  5   18  20
1   W   9   18  11  13
2   X   8   7   12  5
3   Y   7   9   7   8
4   Z   6   5   3   90

I want to add a column 'Result' which should return 1 if the value in column 'E' is greater than the values in B, C & D columns else return 0.

Output should be:

    A   B   C   D   E   Result
0   V   10  5   18  20  1
1   W   9   18  11  13  0
2   X   8   7   12  5   0
3   Y   7   9   7   8   0
4   Z   6   5   3   90  1

For few columns, i would use logic like : if(and(E>B,E>C,E>D),1,0), But I have to compare around 20 columns (from B to U) with column name 'V'. Additionally, the dataframe has around 100 thousand rows.

I am using

df['Result']=np.where((df.ix[:,1:20])<df['V']).all(1),1,0)

And it gives a Memory error.

I am really curious if my solution is working, can you test it? — jezrael, May 23 '18 at 10:37

score 11 · Accepted Answer · answered May 23 '18 at 10:23

11

One possible solution is compare in numpy and last convert boolean mask to ints:

df['Result'] = (df.iloc[:, 1:4].values < df[['E']].values).all(axis=1).astype(int)
print (df)
   A   B   C   D   E  Result
0  V  10   5  18  20       1
1  W   9  18  11  13       0
2  X   8   7  12   5       0
3  Y   7   9   7   8       0
4  Z   6   5   3  90       1

answered May 23 '18 at 10:23

jezrael

822,522
95
1,334
1,252

1

Thank you so much Jezrael. It worked. I am just wondering why is my code is resulting in memory issue. – piyush bansal May 23 '18 at 11:56
1

@piyushbansal - Hard question, mainly because big data :) – jezrael May 23 '18 at 11:57
@jezrael how to do for the same data, compare B,C,D,E columns and update in result as the count of >0 Ex: for B, C, D, E with values 1,0,5,3 to be Result 3. Could you help? – panda Dec 03 '18 at 09:02
3

@panda - use `df['Result'] = df[['B','C','D','E']].gt(0).sum(axis=1).astype(int)`, `gt` is `>`, [`gt`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.gt.html) – jezrael Dec 03 '18 at 09:07
@jezrael it works. Thank you so much how to make it for greater than or equal to 0 – panda Dec 03 '18 at 09:13
2

@panda - use `ge` (greater or equal) like `df['Result'] = df[['B','C','D','E']].ge(0).sum(axis=1).astype(int)` – jezrael Dec 03 '18 at 09:13
1

@jezrael Thank you. – panda Dec 03 '18 at 09:14
I wonder if there is anyway to bypass the following error: TypeError: '<' not supported between instances of 'int' and 'str' – Victor Hugo Calegari de Toledo Oct 14 '22 at 20:43
1

@VictorHugoCalegarideToledo - It means `df['E']` has strings, check [this](https://stackoverflow.com/questions/15891038/change-column-type-in-pandas) for convert column to numeric. – jezrael Oct 17 '22 at 06:22

compare multiple columns of pandas dataframe with one column

1 Answers1