I'm trying to create a new column in a DataFrame that comes from a CSV file. What makes this a little bit tricky is that the values in the new column depend on conditions on other columns of the DataFrame.
The output column depends on the values of the following columns of this DataFrame: VaccineCode | Occurrence | VaccineN | firstVaccineDate
So if the condition is met for a specific vaccine, I have to add the corresponding number of days to the date in the ApplicationDate column to get the date of the second dose.
My code:
import pandas as pd
from datetime import timedelta, date, datetime
df = pd.read_csv(path_csv, engine='python', sep=';')
criteria_AstraZeneca = (df.VaccineCode == 85) & (df.Occurrence == 1) & (df.VaccineN == 1)
criteria_Pfizer = (df.VaccineCode == 86) & (df.Occurrence == 1) & (df.VaccineN == 1)
criteria_CoronaVac = (df.VaccineCode == 87) & (df.Occurrence == 1) & (df.VaccineN == 1)
days_pfizer = 56
days_coronaVac = 28
days_astraZeneca = 84
What I've tried so far:
df['New_Column'] = df[criteria_CoronaVac].firstVaccineDate + timedelta(days=days_coronaVac)
This works until I have to fill the same New_Column with the other results, like this:
df['New_Column'] = df[criteria_CoronaVac].firstVaccineDate + timedelta(days=days_coronaVac)
df['New_Column'] = df[criteria_Pfizer].firstVaccineDate + timedelta(days=days_pfizer)
df['New_Column'] = df[criteria_AstraZeneca].firstVaccineDate + timedelta(days=days_astraZeneca)
Naturally, the problem with this approach is that each statement overwrites the previous one, so I end up with New_Column filled only with the results of the last statement. I need a way to put all the results in the same column.
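One possible way to get all results into the same column in a single statement is to look up the interval per vaccine code and add it to the date in one vectorized step. The following is only a sketch: the sample DataFrame and the names days_by_code, eligible and offset are made up for illustration, and the intervals are the ones listed above.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV; column names follow the question.
df = pd.DataFrame({
    "VaccineCode": [85, 86, 87, 87],
    "Occurrence": [1, 1, 1, 2],
    "VaccineN": [1, 1, 1, 1],
    "firstVaccineDate": pd.to_datetime(["2021-03-01"] * 4),
})

# Interval in days per vaccine code (values taken from the question).
days_by_code = {85: 84, 86: 56, 87: 28}

# Rows that should receive a second-dose date.
eligible = (df.Occurrence == 1) & (df.VaccineN == 1)

# Map each code to its interval; codes missing from the dict become NaN, then NaT.
offset = pd.to_timedelta(df.VaccineCode.map(days_by_code), unit="D")

# where() keeps the computed date for eligible rows and leaves NaT elsewhere.
df["New_Column"] = (df.firstVaccineDate + offset).where(eligible)
```

Because the column is assigned once, nothing gets overwritten, and rows that match none of the criteria simply stay NaT.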
My last try was:
df['New_Column'] = df[criteria_CoronaVac].firstVaccineDate + timedelta(days=days_coronaVac)
df[criteria_Pfizer].loc[:,'New_Column'] = df[criteria_Pfizer].firstVaccineDate + timedelta(days=days_pfizer)
But it gives the following error:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
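As far as I understand the message, df[criteria_Pfizer] builds a new object first, so the following .loc assignment writes into that temporary object rather than into df. A minimal sketch of the single-step form the message suggests, df.loc[row_indexer, col_indexer] = value, is below. The sample DataFrame and the first_dose helper are hypothetical; the criteria and intervals are the ones from the code above.

```python
import pandas as pd
from datetime import timedelta

# Hypothetical sample data standing in for the CSV.
df = pd.DataFrame({
    "VaccineCode": [85, 86, 87],
    "Occurrence": [1, 1, 1],
    "VaccineN": [1, 1, 1],
    "firstVaccineDate": pd.to_datetime(["2021-03-01"] * 3),
})

first_dose = (df.Occurrence == 1) & (df.VaccineN == 1)
criteria_AstraZeneca = (df.VaccineCode == 85) & first_dose
criteria_Pfizer = (df.VaccineCode == 86) & first_dose
criteria_CoronaVac = (df.VaccineCode == 87) & first_dose

# Start with an empty datetime column so every rule fills only its own rows.
df["New_Column"] = pd.Series(pd.NaT, index=df.index, dtype="datetime64[ns]")

rules = [
    (criteria_AstraZeneca, 84),
    (criteria_Pfizer, 56),
    (criteria_CoronaVac, 28),
]
for mask, days in rules:
    # One .loc call on df itself: row mask and column label together.
    df.loc[mask, "New_Column"] = df.loc[mask, "firstVaccineDate"] + timedelta(days=days)
```

Each assignment touches only the rows selected by its mask, so earlier results are kept instead of being overwritten.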