I am trying to apply the following function, which takes values from two
datetime64 pandas DataFrame columns as arguments:
from datetime import datetime
import pandas as pd
def set_dif_months_na(start_date, end_date):
    if (pd.isnull(start_date) and pd.notnull(end_date)):
        return None
    elif (pd.notnull(start_date) and pd.isnull(end_date)):
        return None
    elif (pd.isnull(start_date) and pd.isnull(end_date)):
        return None
    else:
        start_date = datetime.strptime(start_date, "%d/%m/%Y")
        end_date = datetime.strptime(end_date, "%d/%m/%Y")
        return abs((end_date.year - start_date.year) * 12 + (end_date.month - start_date.month))
This function is intended to return the difference in months as an integer given two dates; if either date is missing, it should return None.
When I use it to build a new pandas DataFrame column like this:
df['new_col'] = [set_dif_months_na(date1, date2)
                 for date1, date2 in zip(df['date1'], df['date2'])]
The following error is raised:
TypeError: strptime() argument 1 must be str, not Timestamp
How can I adjust the function so that it works when building a new column from these datetime64 columns?
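
One possible adjustment, as a minimal sketch: it assumes both columns really are datetime64, so each value reaching the function is already a pandas Timestamp (or NaT) and does not need to go through strptime(). The sample DataFrame below is hypothetical and only there to make the snippet runnable.

```python
import pandas as pd

def set_dif_months_na(start_date, end_date):
    # Return None when either date is missing (NaT, NaN or None).
    if pd.isnull(start_date) or pd.isnull(end_date):
        return None
    # Values from a datetime64 column arrive as pandas Timestamps,
    # which already expose .year and .month, so no parsing is needed.
    return abs((end_date.year - start_date.year) * 12
               + (end_date.month - start_date.month))

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "date1": pd.to_datetime(["15/01/2020", None, "01/03/2021"], format="%d/%m/%Y"),
    "date2": pd.to_datetime(["20/04/2020", "01/01/2021", None], format="%d/%m/%Y"),
})

df["new_col"] = [set_dif_months_na(d1, d2)
                 for d1, d2 in zip(df["date1"], df["date2"])]
```

The three missing-value branches collapse into a single `or` check because they all return None. Note that when the resulting list mixes integers and None, pandas will typically store the missing entries as NaN, so the column may not keep an integer dtype.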