How to populate a new column from conditions based on two existing columns in Pandas?

Question

I am trying to create a new column based on conditions on two existing columns, but I get an error when using "np.where". Is there any other way to achieve this?

Input:

change1 change2
yes     yes
yes     no
no      yes
no      yes

Expected Output:

change1 change2 change3
yes      yes      ok
yes      no       not ok
no       yes      not ok
no       yes      not ok

Code:

import pandas as pd
import numpy as np

df1 = pd.read_csv('test2.txt', sep='\t')
df1['change1'] = df1['change1'].astype(str)
df1['change2'] = df1['change2'].astype(str)

# this line raises the error shown below
df1['change3'] = np.where(df1['change1']=='yes' & df1['change2'] == 'yes', 'ok', 'not ok')

print(df1)

Error:

cannot compare a dtyped [object] array with a scalar of type [bool]

You are missing the parentheses. This is a matter of [operator precedence](https://stackoverflow.com/questions/3328355/python-operator-precedence): `&` binds more tightly than `==`, so each comparison has to be wrapped in parentheses: `np.where((df1['change1']=='yes') & (df1['change2'] == 'yes'), 'ok', 'not ok')` – anky Mar 31 '20 at 14:57
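A minimal sketch of the precedence problem, assuming the sample data is built in memory (with `dtype=object`, to mirror the object columns in the question's error) instead of being read from `test2.txt`. Without parentheses, Python evaluates `'yes' & df1['change2']` first and treats the rest as a chained comparison, which fails; with parentheses each comparison yields a boolean Series before `&` combines them.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built in memory instead of read from test2.txt
df1 = pd.DataFrame(
    {"change1": ["yes", "yes", "no", "no"], "change2": ["yes", "no", "yes", "yes"]},
    dtype=object,
)

# Unparenthesized: parsed as df1['change1'] == ('yes' & df1['change2']) == 'yes'
try:
    np.where(df1["change1"] == "yes" & df1["change2"] == "yes", "ok", "not ok")
    broken_raised = False
except (TypeError, ValueError) as exc:
    # The exact exception type and message depend on the pandas version
    broken_raised = True
    print(type(exc).__name__, exc)

# Parenthesized: each comparison yields a boolean Series, then & combines them
df1["change3"] = np.where(
    (df1["change1"] == "yes") & (df1["change2"] == "yes"), "ok", "not ok"
)
print(df1)
```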

ansev · Answer 1 · 2020-03-31T15:16:27.833

Use DataFrame.eq and DataFrame.all. This simplifies the syntax and avoids the operator-precedence error.

df['change3'] = np.where(df.eq('yes').all(axis=1), 'ok', 'not ok')
# if you need to select columns:
# df['change3'] = np.where(df[['change1', 'change2']].eq('yes').all(axis=1),
#                          'ok', 'not ok')
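A small sketch of why selecting the columns can matter, assuming a frame `df` that already holds an extra `change3` column (for example, after one of the other variants has run). `df.eq('yes').all(axis=1)` would then also require `change3 == 'yes'`, which never holds, while restricting the check to the two input columns keeps the result correct.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"change1": ["yes", "yes", "no", "no"], "change2": ["yes", "no", "yes", "yes"]}
)
df["change3"] = "placeholder"  # hypothetical extra column present before the check

# Checking every column: change3 is never 'yes', so every row comes out False
all_cols = np.where(df.eq("yes").all(axis=1), "ok", "not ok")

# Checking only the input columns gives the intended result
cols = ["change1", "change2"]
df["change3"] = np.where(df[cols].eq("yes").all(axis=1), "ok", "not ok")

print(all_cols)
print(df)
```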

Without DataFrame.all:

df['change3'] = np.where((df['change1'] == 'yes') & (df['change2'] == 'yes'),
                         'ok', 'not ok')

Or:

df['change3'] = np.where(df['change1'].eq('yes') & df['change2'].eq('yes'),
                         'ok', 'not ok')

You can also use Series.map or Series.replace:

df['change3'] = df.eq('yes').all(axis=1).map({True: 'ok', False: 'not ok'})
# df['change3'] = df.eq('yes').all(axis=1).replace({True: 'ok', False: 'not ok'})

print(df)

#   change1 change2 change3
# 0     yes     yes      ok
# 1     yes      no  not ok
# 2      no     yes  not ok
# 3      no     yes  not ok
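The `map`/`replace` variant turns the boolean row mask straight into labels. A short sketch, assuming the same sample frame: both calls give the same result here because the mask only ever contains `True` and `False`. In general, `Series.map` returns `NaN` for values missing from the dict, while `Series.replace` leaves such values unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    {"change1": ["yes", "yes", "no", "no"], "change2": ["yes", "no", "yes", "yes"]}
)

# One boolean per row: True only when both columns equal 'yes'
mask = df[["change1", "change2"]].eq("yes").all(axis=1)

labels = {True: "ok", False: "not ok"}
mapped = mask.map(labels)        # every key is present, so no NaN appears
replaced = mask.replace(labels)  # same labels for a purely boolean mask

df["change3"] = mapped
print(df)
```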

score 3 · Answer 2 · answered Mar 31 '20 at 14:58

Use DataFrame.replace to convert the values to 1/0, then check whether every value in each row is truthy:

df1['change3'] = np.where(df1.replace({'yes': 1, 'no': 0}).all(axis=1), 
                          'ok', 
                          'not ok')

Or with replace and sum:

df1['change3'] = np.where(df1.replace({'yes': 1, 'no': 0}).sum(axis=1).gt(1), 
                          'ok', 
                          'not ok')

  change1 change2 change3
0     yes     yes      ok
1     yes      no  not ok
2      no     yes  not ok
3      no     yes  not ok
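Both versions here rely on the same idea: with two columns, `sum(axis=1).gt(1)` is true only when both values are `yes`, which is equivalent to `all`. A sketch that generalizes the threshold to any number of columns, assuming every selected column must be `yes`; it counts matches with `eq` instead of `replace`, which is a swapped-in choice rather than the answer's code.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"change1": ["yes", "yes", "no", "no"], "change2": ["yes", "no", "yes", "yes"]}
)
cols = ["change1", "change2"]

# Number of 'yes' values in each row (True counts as 1 when summed)
yes_count = df1[cols].eq("yes").sum(axis=1)

# "All columns are yes" is the same as "count equals the number of columns"
via_sum = np.where(yes_count.eq(len(cols)), "ok", "not ok")
via_all = np.where(df1[cols].eq("yes").all(axis=1), "ok", "not ok")

df1["change3"] = via_sum
print(df1)
```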

I like your example with replace and sum, +1 – kederrac Mar 31 '20 at 15:28

kederrac · Answer 3 · 2020-03-31T17:12:13.760


You can use:

df['change3'] = df.apply(lambda x: 'ok' if x['change1'] == 'yes' and x['change2'] == 'yes' else 'not ok', axis=1)

Output:

  change1 change2 change3
0     yes     yes      ok
1     yes      no  not ok
2      no     yes  not ok
3      no     yes  not ok

answered Mar 31 '20 at 14:58, edited Mar 31 '20 at 17:12

https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – ansev Mar 31 '20 at 14:58
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html – kederrac Mar 31 '20 at 15:00
Why the downvote? – kederrac Mar 31 '20 at 15:16
Agree with ansev: we shouldn't post `apply` solutions unless they are absolutely necessary. Some of my early pandas code was massively bottlenecked by answers that used `.apply` where it wasn't needed. Perhaps I should have known more about vectorisation, but it's something we should steer beginners away from. – Umar.H Mar 31 '20 at 15:17

Whoever downvoted this: the downvote is unnecessary. This answer is correct, although `apply` is not preferred. I upvoted. – Erfan Mar 31 '20 at 15:22
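The comments contrast `apply` with vectorized code. A rough sketch for checking this on your own data, assuming a larger synthetic frame built by repeating the sample rows (the repeat count is arbitrary). It confirms that both approaches produce the same labels and times each with `timeit`; timings vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {"change1": ["yes", "yes", "no", "no"], "change2": ["yes", "no", "yes", "yes"]}
)
# Hypothetical larger frame: the sample rows repeated 2,500 times (10,000 rows)
big = pd.concat([sample] * 2_500, ignore_index=True)


def with_apply(df):
    # Calls a Python function once per row
    return df.apply(
        lambda x: "ok" if x["change1"] == "yes" and x["change2"] == "yes" else "not ok",
        axis=1,
    )


def vectorized(df):
    # Whole-column comparisons combined with &, then one np.where call
    mask = (df["change1"] == "yes") & (df["change2"] == "yes")
    return pd.Series(np.where(mask, "ok", "not ok"), index=df.index)


same = with_apply(big).tolist() == vectorized(big).tolist()
t_apply = timeit.timeit(lambda: with_apply(big), number=3)
t_vec = timeit.timeit(lambda: vectorized(big), number=3)
print(f"same result: {same}, apply: {t_apply:.3f}s, vectorized: {t_vec:.3f}s")
```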
