
I have a pandas DataFrame with two columns, as follows:

A      B
Yes    No
Yes    Yes
No     Yes
No     No
NA     Yes
NA     NA

I want to create a new column based on these values: if either column is Yes, the new column should be Yes; if both columns are No, the new column should be No; and if both columns are NA, the new column should also be NA. The expected output for the data above is:

C
Yes
Yes
Yes
No
Yes
NA
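To make the question reproducible, the sample data can be built directly in pandas. This is a minimal sketch that assumes "NA" means a real missing value (`np.nan`), not the literal string "NA"; `expected_c` is a hypothetical name for the desired column.

```python
import numpy as np
import pandas as pd

# Sample input from the question; "NA" is assumed to be a real missing value (np.nan)
df = pd.DataFrame({'A': ['Yes', 'Yes', 'No', 'No', np.nan, np.nan],
                   'B': ['No', 'Yes', 'Yes', 'No', 'Yes', np.nan]})

# Hypothetical name for the desired column C, with None standing in for NA
expected_c = ['Yes', 'Yes', 'Yes', 'No', 'Yes', None]
print(df)
```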

I wrote a loop over the rows of the DataFrame that checks each value to build the new column. However, it takes a long time for 10M records. Is there a faster, more Pythonic way to achieve this?

Haroon S.
  • This seems like extremely basic Pandas functionality, what is particular about this situation which means it cannot be resolved based on the vast amounts of information available on the subject? Please see "How do I ask a good question?" and "What topics can I ask about here?" in the help center. As an aside, are you really using the strings `"Yes"`/`"No"` instead of actual boolean values? – AMC May 01 '20 at 23:55
  • Never mind the fact this is basically a duplicate of https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column, of course. – AMC May 01 '20 at 23:58
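Picking up the comment about real boolean values: pandas' nullable `"boolean"` dtype uses three-valued (Kleene) logic, where `True | <NA>` is `True`, `False | <NA>` is `<NA>`, and `<NA> | <NA>` is `<NA>`. That lines up with the rules in the question, so a plain `|` may be enough. A minimal sketch, assuming "NA" is a real missing value; the `flags` and `c` names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Yes', 'Yes', 'No', 'No', np.nan, np.nan],
                   'B': ['No', 'Yes', 'Yes', 'No', 'Yes', np.nan]})

# Turn the Yes/No strings into the nullable "boolean" dtype;
# values that are neither 'Yes' nor 'No' (here NaN) become <NA>.
flags = df.apply(lambda col: col.map({'Yes': True, 'No': False})).astype('boolean')

# Kleene OR: True | <NA> -> True, False | <NA> -> <NA>, <NA> | <NA> -> <NA>
c = flags['A'] | flags['B']
print(c)
```

Rows with one No and one NA come out as `<NA>` under this logic; the question does not say what those rows should be.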

3 Answers


Something like

df.fillna('').max(axis=1)
Out[106]: 
0    Yes
1    Yes
2    Yes
3     No
4    Yes
5       
dtype: object
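This works because strings compare lexicographically and `'' < 'No' < 'Yes'`. After `fillna('')`, the row-wise maximum is 'Yes' if either column is 'Yes', 'No' if both are 'No', and '' if both were missing. A minimal sketch that also turns that '' back into NaN, assuming "NA" is a real missing value in the input:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Yes', 'Yes', 'No', 'No', np.nan, np.nan],
                   'B': ['No', 'Yes', 'Yes', 'No', 'Yes', np.nan]})

# Lexicographic order is what makes max() behave like "any Yes"
assert '' < 'No' < 'Yes'

# Row-wise max over the filled frame, then restore missing values
df['C'] = df.fillna('').max(axis=1).replace('', np.nan)
print(df)
```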
BENY

Try:

(df == 'Yes').eval('A | B').map({True: 'Yes', False: 'No'}).mask(df['A'].isna() & df['B'].isna())
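A self-contained sketch of this boolean-mask approach on the sample data, assuming "NA" is a real missing value. Note that `.astype(str)` on the boolean result would give the strings 'True'/'False', so this version maps the booleans to the question's 'Yes'/'No' labels instead; `either_yes` and `both_missing` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Yes', 'Yes', 'No', 'No', np.nan, np.nan],
                   'B': ['No', 'Yes', 'Yes', 'No', 'Yes', np.nan]})

# NaN == 'Yes' is False, so missing cells count as "not Yes" here
either_yes = (df == 'Yes').eval('A | B')

# Map booleans to labels, then set rows where both inputs are missing to NaN
both_missing = df['A'].isna() & df['B'].isna()
df['C'] = either_yes.map({True: 'Yes', False: 'No'}).mask(both_missing)
print(df)
```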
Scott Boston

Another way of doing it, though it is hard-coded:

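import numpy as np

conditions = ((df['A'] == 'Yes') | (df['B'] == 'Yes'),  # either column is Yes
              (df['A'] == 'No') & (df['B'] == 'No'),    # both columns are No
              df['A'].isna() & df['B'].isna())          # both values are missing (NaN)
choicelist = ('Yes', 'No', None)
df['C'] = np.select(conditions, choicelist, default=None)
df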
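A self-contained sketch of the `np.select` approach on the sample data, assuming "NA" is a real missing value. This variant uses `DataFrame.eq` with `any`/`all` across columns, so it is not tied to the names `A` and `B` and would extend to more columns; rows matching neither condition (here, both missing) fall through to `default=None`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Yes', 'Yes', 'No', 'No', np.nan, np.nan],
                   'B': ['No', 'Yes', 'Yes', 'No', 'Yes', np.nan]})

conditions = [
    df.eq('Yes').any(axis=1),  # at least one column is 'Yes'
    df.eq('No').all(axis=1),   # every column is 'No'
]
choices = ['Yes', 'No']

# Unmatched rows get None, which pandas treats as missing
df['C'] = np.select(conditions, choices, default=None)
print(df)
```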


wwnde