In a dataframe, I want to append a column through an if statement as follows:

death_flag = []
for entry in range(len(demographics)):
    if pd.isnull(df['DOD'][entry]) == False:
        if [(df['DOD']-df['DOA'] > pd.Timedelta(days=365) == True)]:
            death_flag.append(1)

df is a dataframe whose 'DOD' and 'DOA' columns are in datetime format. I'm aware that each column of the dataframe is a Series. How do I solve this issue?

The error keeps showing "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" for this line: `if [(df['DOD']-df['DOA'] > pd.Timedelta(days=365) == True)]:`

Brian Wu

1 Answer

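Don't use loops in pandas when a much faster vectorized solution exists, as it does here. The error occurs because `if` needs a single True or False, while the comparison returns a whole Series of booleans. Instead, create a boolean mask for each condition, chain the masks with & (bitwise AND, applied element-wise), and use the combined mask in DataFrame.loc to create the new column: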

m1 = df['DOD'].notna()
m2 = (df['DOD']-df['DOA']) > pd.Timedelta(days=365)

df.loc[m1 & m2, 'new'] = 1

Rows that do not match the mask are left as NaN. If you need the new column filled with 0 and 1, convert the mask with Series.view:

df['new'] = (m1 & m2).view('i1')

Or cast it to integers (Series.view is deprecated in recent pandas releases, so this form is the safer choice):

df['new'] = (m1 & m2).astype('int')
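
To see the masks working end to end, here is a minimal sketch on made-up data. Only the column names DOD and DOA come from the question; the dates and the death_flag column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DOA": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01"]),
    "DOD": pd.to_datetime(["2019-06-01", "2018-03-01", None]),
})

# m1: rows where a DOD is present (not NaT).
m1 = df["DOD"].notna()
# m2: rows where DOD is more than 365 days after DOA.
m2 = (df["DOD"] - df["DOA"]) > pd.Timedelta(days=365)

# Element-wise AND of both masks, stored as 0/1 integers.
df["death_flag"] = (m1 & m2).astype(int)
print(df)
```

Each mask is a Series of booleans with one value per row, which is why they are combined with & rather than with `and` or an `if` statement.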
jezrael
  • Is there another way to accomplish this without using bitwise operators? I really don't get the relevance of bitwise operators in this question. – Brian Wu Jun 22 '20 at 13:29
  • @BrianWu - I don't understand, why? Do you need the slow loop solution, which is not recommended? [link](https://stackoverflow.com/a/55557758/2901002) – jezrael Jun 22 '20 at 13:32
  • The line `m2 = (df['DOD']-df['DOA']) > pd.Timedelta(days=365)` gives the error "unsupported operand type(s) for -: 'datetime.date' and 'str'". How should I fix this? – Brian Wu Jun 23 '20 at 01:52
  • @Brian Wu Use `df['DOD'] = pd.to_datetime(df['DOD'].astype(str))` and `df['DOA'] = pd.to_datetime(df['DOA'])` before my solution. – jezrael Jun 23 '20 at 04:22
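
The "unsupported operand" error in the comments suggests that DOD held `datetime.date` objects and DOA held strings. A rough sketch of the conversion step, on made-up data that assumes exactly that mix of types, could look like this. It passes the columns straight to `pd.to_datetime` rather than going through `astype(str)` as in the comment, which is a swapped-in variant for this sample data.

```python
import datetime
import pandas as pd

# Hypothetical data reproducing the mixed types implied by the error message:
# DOD as datetime.date objects (with a missing value), DOA as strings.
df = pd.DataFrame({
    "DOD": [datetime.date(2019, 6, 1), None],
    "DOA": ["2018-01-01", "2018-01-01"],
})

# Convert both columns to pandas datetimes; the missing value becomes NaT.
df["DOD"] = pd.to_datetime(df["DOD"])
df["DOA"] = pd.to_datetime(df["DOA"])

# The subtraction now works and yields a Series of Timedelta values.
delta = df["DOD"] - df["DOA"]
print(delta)
```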