1

My pandas DataFrame has two columns, A and B, from which I want to create a third column, C, that is "True" if either A or B is filled.

I tried the following code. However, after running it, every value in column C is "No" (which would mean that all cells in columns A and B are empty, but that is not the case).

df['C'] = C
C = []

for index, row in df.iterrows():
    if df['a'].isnull() is False:
        c.append("Yes")
    elif df['b'].isnull() is False:
        c.append("Yes")
    else:
        c.append("No")

I'm new to Python (and StackOverflow, too), so if anyone has any suggestions they will be most appreciated.

Thank you!

MarianD
ADZ95
  • Why are you calling your dataframe `pd` when `pd` is the universally-accepted alias for the pandas module itself? – roganjosh Feb 08 '20 at 15:24
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve), and revise your question accordingly. These tips on how to ask a good question may also be useful. – yatu Feb 08 '20 at 15:27
  • Read the Pandas docs. – AMC Feb 08 '20 at 17:59
  • Also, why do you want to use the strings `'Yes'` and `'No'`, when boolean values are a thing? – AMC Feb 08 '20 at 18:00
  • Does this answer your question? [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – AMC Feb 08 '20 at 18:03

5 Answers

2

I would suggest using np.where with the following condition:

df['c'] = np.where((df['a'].isnull()) & (df['b'].isnull()),"No","Yes")

Of course, you would need to import numpy as np first.
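To make the one-liner above easy to try, here is a minimal self-contained sketch. The sample values are made up for illustration; only the column names a, b and c come from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real dataframe will have its own values.
df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [np.nan, "x", np.nan]})

# "No" only when both a and b are missing in a row, otherwise "Yes".
df["c"] = np.where(df["a"].isnull() & df["b"].isnull(), "No", "Yes")

print(df)
```

Because the condition is evaluated on whole columns at once, no explicit loop over the rows is needed.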

ansev
Celius Stingher
1

Use DataFrame.notnull with DataFrame.any to check, row by row, whether a or b is not null. The result is a boolean Series, which can then be mapped to Yes or No with Series.map or np.where:

df['c'] = df[['a','b']].notnull().any(axis=1).map({True: 'Yes', False: 'No'})

# alternative
#df['c'] = df[['a','b']].notnull().any(axis=1).replace({True: 'Yes', False: 'No'})

# inverse logic
#df['c'] = df[['a','b']].notnull().any(axis=1).map({True: 'No', False: 'Yes'})

or

import numpy as np
df['c'] = np.where(df[['a','b']].notnull().any(axis=1), 'Yes', 'No')
# note: the pd.np alias was deprecated in pandas 1.0 and removed in pandas 2.0, so import numpy directly
# inverse logic
#df['c'] = np.where(df[['a','b']].notnull().any(axis=1), 'No', 'Yes')
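As one of the comments on the question suggests, a boolean column may be more convenient than 'Yes'/'No' strings. A minimal sketch of that variant, assuming the same column names and some made-up sample data:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"a": [1.0, None, None], "b": [None, 2.0, None]})

# True when at least one of a, b holds a value in that row.
df["c"] = df[["a", "b"]].notna().any(axis=1)

# A boolean column can be used directly as a row filter.
filled_rows = df[df["c"]]
```

Here notna is used, which behaves the same as notnull.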
ansev
0
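import pandas as pd

# sample data for illustration; replace with the dataframe you are working with
df = pd.DataFrame({'a': [1, None, None], 'b': [None, 2, None]})

# using boolean indexing: True when at least one of a, b is filled
df['c'] = ~(df['a'].isnull() & df['b'].isnull())

# using apply: test the values of the current row, not the whole columns
df['c'] = df.apply(lambda row: False if pd.isna(row['a']) and pd.isna(row['b']) else True, axis=1)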
Ian
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – ansev Feb 08 '20 at 15:35
0
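import pandas as pd

def f(x):
    # x is one row holding the values of A and B;
    # return "Yes" as soon as one of them is filled
    if pd.notna(x.iloc[0]):
        return "Yes"
    elif pd.notna(x.iloc[1]):
        return "Yes"
    return "No"

Then try:

df['C'] = df[['A','B']].apply(f, axis=1)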
Amir.S
0
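C = []

# iterate row by row and test the scalar values of the current row
for index, row in df.iterrows():
    if pd.isnull(row['a']) and pd.isnull(row['b']):
        C.append("No")
    else:
        C.append("Yes")

# insert the list as a new column named "C" at position 2
df.insert(2, "C", C)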
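For context on why the loop in the question yields "No" for every row: df['a'].isnull() returns a whole Series rather than a single value, and `is False` tests object identity, so that comparison is never true and the else branch always runs. A small sketch with made-up data shows the difference between the two checks:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"a": [1.0, None], "b": [None, None]})

# isnull() on a column returns a boolean Series, one entry per row.
mask = df["a"].isnull()

# Identity test against the False singleton: a Series is never that object.
series_check = mask is False

# Testing the scalar taken from the current row works as intended.
row_checks = [not pd.isnull(row["a"]) for _, row in df.iterrows()]

print(series_check, row_checks)
```

This is why the working loop tests row['a'] and row['b'] instead of df['a'] and df['b'].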