0

I'm trying to create a new column in a DataFrame that comes from a CSV file. What makes a this little bit tricky is that the values from this new column depends on conditions from other columns from the DataFrame.

The output column depends on the values from the following columns from this dataframe: VaccineCode | Occurrence | VaccineN | firstVaccineDate

So if the condition is met for a specific vaccine, I have to sum the respective date from the ApplicationDate column, in order to tell the vaccine date of the second dose.

My code:

import pandas as pd
import datetime
from datetime import timedelta, date, datetime

df = pd.read_csv(path_csv, engine='python', sep=';')

criteria_Astrazeneca = (df.VaccineCode == 85) & (df.Occurrence == 1) & (df.VaccineN == 1)
criteria_Pfizer = (df.VaccineCode == 86) & (df.Occurrence == 1) & (df.VaccineN == 1)
criteria_CoronaVac = (df.VaccineCode == 87) & (df.Occurrence == 1) & (df.VaccineN == 1)

days_pfizer = 56
days_coronaVac = 28
days_astraZeneca = 84

What I've tried so far:

df['New_Column'] = df[criteria_CoronaVac].firstVaccineDate + timedelta(days=days_coronaVac)

This works until the point that I have to complete the same New_Column with the others results, like this:

df['New_Column'] = df[criteria_CoronaVac].firstVaccineDate + timedelta(days=days_coronaVac)
df['New_Column'] = df[criteria_Pfizer].firstVaccineDate + timedelta(days=days_pfizer)
df['New_Column'] = df[criteria_AstraZeneca].firstVaccineDate + timedelta(days=days_astraZeneca)

Naturally, the problem with this approach comes from the fact that the next statement overwrites those before, so I end up just with the New_Column filled with the results that came from the last statement. I need a way to put all results in the same column.

My last try was:

df['New_Column'] = df[criteria_CoronaVac].firstVaccineDate + timedelta(days=days_coronaVac)
df[criteria_Pfizer].loc[:,'New_Column'] = df[criteria_Pfizer].firstVaccineDate + timedelta(days=days_pfizer)

But it gives the following error:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
    self._setitem_single_column(ilocs[0], value, pi)
  • 1
    Does this answer your question? [How to deal with SettingWithCopyWarning in Pandas](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – ddejohn Nov 05 '21 at 19:55
  • 1
    https://stackoverflow.com/a/53954986/6298712 – ddejohn Nov 05 '21 at 19:55

2 Answers2

1

Thank you very much @ddejohn, the first link helped me to solve my problem as follows:

df['New_Column'] = df[criteria_CoronaVac].firstVaccineDate + timedelta(days=days_coronaVac)
df.loc[criteria_Pfizer,'New_Column'] = df[criteria_Pfizer].firstVaccineDate + timedelta(days=days_pfizer)
df.loc[criteria_Astrazeneca,'New_Column'] = df[criteria_Astrazeneca].firstVaccineDate + timedelta(days=days_astraZeneca)

That way, the first statement create the column and fill with the coronavac indexes and the next ones fill the same column just in the respective indexes. Problem solved, thanks again.

0

You could also use an data frame transform to create a new rule

David Miró
  • 2,694
  • 20
  • 20
Golden Lion
  • 3,840
  • 2
  • 26
  • 35