
I am trying to create a new column based on conditions on two existing columns, but I get an error when using `np.where`. Is there another way to achieve this?

Input:

change1 change2
yes     yes
yes     no
no      yes
no      yes

Expected Output:

change1 change2 change3
yes      yes      ok
yes      no       not ok
no       yes      not ok
no       yes      not ok

Code:

import pandas as pd
import numpy as np



df1=pd.read_csv('test2.txt',sep='\t')
df1['change1'] = df1['change1'].astype(str)
df1['change2'] = df1['change2'].astype(str)


df['change3'] = np.where(df1['change1']=='yes' & df1['change2'] == 'yes', 'ok', 'not ok')

print(df1)

Error:

cannot compare a dtyped [object] array with a scalar of type [bool]
  • You are missing parentheses. Because of [operator precedence](https://stackoverflow.com/questions/3328355/python-operator-precedence), `&` is evaluated before `==`, so each comparison must be wrapped: `np.where((df['change1']=='yes') & (df['change2'] == 'yes'), 'ok', 'not ok')` – anky Mar 31 '20 at 14:57
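
To make the precedence point in the comment above concrete, here is a minimal sketch in plain Python. The integers `a` and `b` are stand-ins for the two columns (an assumption for illustration only, not part of the original code). In the pandas version the same parsing means `'yes' & df1['change2']` is attempted first, and that string-with-array operation is the one that fails.

```python
# Plain integers stand in for the two Series (hypothetical values for illustration).
a, b = 1, 3

# Without parentheses, `&` is applied first, so this is the chained comparison
# a == (1 & b) == 1, not "a equals 1 and b equals 1".
unparenthesized = a == 1 & b == 1

# With parentheses, each comparison is evaluated before `&` combines the results.
parenthesized = (a == 1) & (b == 1)

print(unparenthesized)  # True, because 1 & 3 is 1
print(parenthesized)    # False, because b is not 1
```

The two expressions disagree even on plain numbers, which is why each comparison needs its own parentheses before combining with `&`.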

3 Answers


Use DataFrame.eq and DataFrame.all. This makes the code more concise and avoids the operator-precedence mistake, since no `&` is needed.

df['change3'] = np.where(df.eq('yes').all(axis=1), 'ok' , 'not ok')
# If you need to select specific columns:
#df['change3'] = np.where(df[['change1', 'change2']].eq('yes').all(axis=1),
#                         'ok' , 'not ok')

Without DataFrame.all:

df['change3'] = np.where((df['change1']=='yes') & (df['change2'] == 'yes'),
                         'ok', 'not ok')

or

df['change3'] = np.where(df['change1'].eq('yes') & df['change2'].eq('yes'),
                         'ok', 'not ok')

You can also use Series.map / Series.replace:

df['change3'] = df.eq('yes').all(axis=1).map({True : 'ok' , False : 'not ok'})
#df['change3'] = df.eq('yes').all(axis=1).replace({True : 'ok' , False : 'not ok'})

print(df)

#   change1 change2 change3
# 0     yes     yes      ok
# 1     yes      no  not ok
# 2      no     yes  not ok
# 3      no     yes  not ok
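
As a self-contained check of the variants above, the following sketch builds the question's sample data inline (an assumption, since `test2.txt` is not available here) and computes the new column three ways. The names `cols`, `mask`, `via_where`, `via_and` and `via_map` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline instead of reading test2.txt.
df = pd.DataFrame({
    "change1": ["yes", "yes", "no", "no"],
    "change2": ["yes", "no", "yes", "yes"],
})
cols = ["change1", "change2"]

# Row-wise mask: True only where every selected column equals 'yes'.
mask = df[cols].eq("yes").all(axis=1)

# Three equivalent ways to turn the condition into labels.
via_where = np.where(mask, "ok", "not ok")
via_and = np.where((df["change1"] == "yes") & (df["change2"] == "yes"), "ok", "not ok")
via_map = mask.map({True: "ok", False: "not ok"})

df["change3"] = via_where
print(df)
```

Selecting `cols` explicitly keeps the mask unaffected if the frame later gains other columns, such as `change3` itself.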
ansev

Using DataFrame.replace to convert to binary, then checking all per row:

df1['change3'] = np.where(df1.replace({'yes': 1, 'no': 0}).all(axis=1), 
                          'ok', 
                          'not ok')

Or with replace and sum:

df1['change3'] = np.where(df1.replace({'yes': 1, 'no': 0}).sum(axis=1).gt(1), 
                          'ok', 
                          'not ok')
  change1 change2 change3
0     yes     yes      ok
1     yes      no  not ok
2      no     yes  not ok
3      no     yes  not ok
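
The counting idea above can also be sketched without hard-coding the threshold. This variation swaps `replace` for `eq('yes')`, which is a substitution and not the answer's exact method, and compares the per-row count against the number of columns checked. The sample frame is built inline, and `cols` and `yes_count` are names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline instead of reading test2.txt.
df1 = pd.DataFrame({
    "change1": ["yes", "yes", "no", "no"],
    "change2": ["yes", "no", "yes", "yes"],
})
cols = ["change1", "change2"]

# Count how many of the selected columns equal 'yes' in each row.
yes_count = df1[cols].eq("yes").sum(axis=1)

# A row is 'ok' only when every selected column is 'yes'.
df1["change3"] = np.where(yes_count.eq(len(cols)), "ok", "not ok")
print(df1)
```

Comparing against `len(cols)` means the same line still works if a third column is added to the check.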
Erfan

You can use DataFrame.apply with a row-wise lambda:

df['change3'] = df.apply(lambda x: 'ok' if x['change1'] == x['change2'] else 'not ok', axis=1)


kederrac
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – ansev Mar 31 '20 at 14:58
  • https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html – kederrac Mar 31 '20 at 15:00
  • why downvoting? – kederrac Mar 31 '20 at 15:16
  • Agree with ansev; we shouldn't post cases of apply unless absolutely necessary. Some of my early pandas code was bottlenecked massively by answers with `.apply` where it wasn't needed. Granted, perhaps I should have known more about vectorisation, but it's something we should steer beginners away from. – Umar.H Mar 31 '20 at 15:17
  • Whoever downvoted this, it's unnecessary; this answer is correct, although `apply` is not preferred. I upvoted. – Erfan Mar 31 '20 at 15:22
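
To illustrate the `apply` versus vectorisation discussion above, here is a small sketch comparing a row-wise `apply` with a vectorised equivalent. The data is the question's sample plus one extra no/no row, which is a hypothetical addition to show that comparing the two columns with each other is not the same as checking that both are 'yes'. All variable names are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's four rows plus one extra no/no row.
df = pd.DataFrame({
    "change1": ["yes", "yes", "no", "no", "no"],
    "change2": ["yes", "no", "yes", "yes", "no"],
})

# Row-wise apply that checks both columns against 'yes' explicitly.
via_apply = df.apply(
    lambda row: "ok" if row["change1"] == "yes" and row["change2"] == "yes" else "not ok",
    axis=1,
)

# Vectorised equivalent of the same condition.
via_where = np.where(df.eq("yes").all(axis=1), "ok", "not ok")

# Comparing the columns with each other instead also labels the no/no row 'ok'.
via_equal = np.where(df["change1"] == df["change2"], "ok", "not ok")

print(via_apply.tolist())
print(list(via_where))
print(list(via_equal))
```

On the question's four rows all three give the same labels; they only diverge on the added no/no row.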