I have one data frame called patient_df that is made like this:
PATIENT_COLS = ['Origin', 'Status', 'Team', 'Bed', 'Admit_Time', 'First_Consult', 'Decant_Time', 'Ward_Time', 'Discharge_Order', 'Discharged'] # data to track for each patient
patient_df = pd.DataFrame(columns=PATIENT_COLS)
Then, at several points in my code, I access a row of this data frame and update its fields (the row at patient_ID doesn't exist until I create it in the first line):
patient_df.loc[patient_ID] = [None] * len(PATIENT_COLS)
record = patient_df.loc[patient_ID]
record.Origin = ORIGIN()
record.Admit_Time = sim_time
This code runs perfectly with no errors or warnings and the output is as expected (the actual data frame is updated).
However, I have another data frame called ip_df:
ip_df = pd.read_csv(PATH + 'Clean_IP.csv')
Now, when I try to access the rows in the same way (this time the rows already exist):
for patient in ALC_patients:
    record = ip_df.loc[patient]
    orig_end = record.IP_Discharge_DT
    record.IP_LOS = MAX_STAY
    record.IP_Discharge_DT = record.N_Left_DT + timedelta(days=MAX_STAY)
I get
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
Now, I realize that I'm actually working on a copy of the row, so the real data frame is not being updated, and that I can fix this by using
ip_df.loc[patient, 'IP_LOS'] = MAX_STAY
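Written out for the whole loop with the `.loc[row, column]` form, that fix would look roughly like the sketch below. The toy ip_df, its index labels, and the value of MAX_STAY are made up for illustration; my real data comes from Clean_IP.csv.

```python
from datetime import timedelta

import pandas as pd

MAX_STAY = 30  # hypothetical value, for illustration only

# Hypothetical stand-in for the data read from Clean_IP.csv
ip_df = pd.DataFrame(
    {
        "N_Left_DT": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
        "IP_Discharge_DT": pd.to_datetime(["2020-03-15", "2020-02-10", "2020-06-01"]),
        "IP_LOS": [74, 9, 92],
    },
    index=["p1", "p2", "p3"],
)
ALC_patients = ["p1", "p3"]  # hypothetical index labels

for patient in ALC_patients:
    # Read a single cell by row label and column name
    orig_end = ip_df.loc[patient, "IP_Discharge_DT"]
    # Write straight into ip_df, so no intermediate row object is involved
    ip_df.loc[patient, "IP_LOS"] = MAX_STAY
    ip_df.loc[patient, "IP_Discharge_DT"] = (
        ip_df.loc[patient, "N_Left_DT"] + timedelta(days=MAX_STAY)
    )
```

This runs without the warning, but every read and write repeats the row lookup, which is exactly what I would like to avoid.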
However, I find the first version much cleaner, and it doesn't make the data frame look up the row again on every access. Why does this work for patient_df but not for ip_df, and is there anything I can change so that I can write code like what I use for patient_df?