I am trying to update a set of pandas DataFrames with the results of some calculations, which are themselves returned as a DataFrame. I wrote the following loop to do this. The update appears to work inside the loop, but the original DataFrames are unchanged once the loop completes!

Can you please tell me where I am going wrong? I am using Python 3.7.1 and pandas 1.0.5 on a Windows 10 machine.

z_score_list = ['LVESV_i', 'LVEDV_i', 'LVSV_i', 'LV_mass_i', 'RVEDV_i', 'RVESV_i', 'RVSV_i'] # columns used for the calculation
df_list = [t1df, t1vsd_df, t1highshunt_df, t1preTVcases_df] # list of DataFrames to update
print('Before loop shape: ', t1df.shape)
for i, df in enumerate(df_list):
    print('before update =', df.shape)
    df_z = df[z_score_list]
    df_z = calc_Z_scores(df_z,merge=False) # function returns calculated Z-scores in a dataframe
    df = df.merge(df_z, on = df.index, how='inner') # here I merge them
    df.drop(columns = 'key_0', inplace=True) # drop the additional index
    # df.head()
    print('after update = ', df.shape)
    del(df_z)
    # df = df.copy(deep=True) - tried this, but does not work

print('After loop shape: ', t1df.shape)

Here is the output:

Before loop shape:  (63, 55)
before = (63, 55)
after =  (63, 62)
before = (8, 55)
after =  (8, 62)
before = (30, 54)
after =  (30, 61)
before = (55, 55)
after =  (55, 62)
After loop shape:  (63, 55)

1 Answer

I think the problem is on the line with the comment "# here I merge them": `merge` returns a new DataFrame, and assigning it to `df` rebinds `df` to that new object, which is where the reference to `t1df` is lost. Try merging in place, or print the identity of `df` at that line with `id(df)` and see whether it changes. (The built-in `hash` will not work here, because DataFrames are unhashable.)
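To make the rebinding visible, here is a minimal sketch with toy data. The names `original`, `extra` and `frames` are made up for illustration and are not from the question. It checks whether `df` still refers to the original object before and after the `merge` assignment.

```python
import pandas as pd

# Toy stand-ins for the asker's DataFrames (hypothetical data).
original = pd.DataFrame({"a": [1, 2, 3]})
extra = pd.DataFrame({"b": [10, 20, 30]})
frames = [original]

for df in frames:
    same_before = df is original  # df is just another name for `original`
    # merge builds a new DataFrame; the assignment rebinds the local name df
    # and leaves the object that `original` refers to untouched.
    df = df.merge(extra, left_index=True, right_index=True)
    same_after = df is original

print(same_before, same_after)   # identity before and after the merge
print(original.shape, df.shape)  # the original keeps its column count
```

If the merged result is needed under the original name, it has to be assigned back to that name explicitly, since neither the loop variable nor the list entry is tied to the name `t1df`.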

  • Thanks for this @Animesh. Would you explain this answer a bit more? I tried merging in place using `df_list[i] = df` but this would not work either. How can you get the address of the dataframe? – senna_ananth Jul 18 '20 at 21:58
  • You can print it via the print statement. – Animesh Mukherkjee Jul 18 '20 at 22:28
  • There is no in place `merge` in pandas or python! Any ideas to get this done?? Surely this has been done before! – senna_ananth Jul 19 '20 at 05:34
  • Print all the DataFrames using print statements, e.g. `print(t1df)` outside the for loop and `print(df)` inside it. See if you get the same address. If not, do `df_list[i] = df.copy()` towards the end of the loop. – Animesh Mukherkjee Jul 19 '20 at 05:43
  • Thanks! I did that but the results are no different. Here is the printout of the addresses as you suggested. Interesting that the addresses change even for the original DataFrames! Coming from a C++ background, I find this difficult to understand - pointers were hard, but at least you could do what you wanted! `Before loop shape: (63, 55) address = 0x17fe9430 before = (63, 55) address = 0x970e970 after = (63, 62) address = 0x150e1610 ... After loop shape: (63, 55) address = 0x1312e10` – senna_ananth Jul 19 '20 at 06:22
  • https://stackoverflow.com/questions/49986865/modifying-dataframes-inside-a-list-is-not-working... I would say a better solution would be to put the logic inside the loop into a function and then use a list comprehension and unpacking to reassign the respective DataFrames correctly. – Animesh Mukherkjee Jul 19 '20 at 06:27
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/218127/discussion-between-animesh-mukherkjee-and-senna-ananth). – Animesh Mukherkjee Jul 19 '20 at 06:34
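
The function-plus-unpacking idea from the comments could be sketched roughly as follows. Everything here is an assumption for illustration: `calc_z_scores_stub` is a hypothetical stand-in for the asker's `calc_Z_scores(df_z, merge=False)`, whose real behaviour is not shown, `add_z_scores` is a made-up helper name, and the data is a toy. The sketch also uses `DataFrame.join` (which aligns on the index) instead of the original `merge(..., on=df.index)` followed by dropping `key_0`.

```python
import pandas as pd

def calc_z_scores_stub(df_z):
    # Hypothetical stand-in for calc_Z_scores(df_z, merge=False):
    # returns a DataFrame of z-scores with the same index as the input.
    z = (df_z - df_z.mean()) / df_z.std()
    return z.add_suffix("_z")

def add_z_scores(df, columns):
    # Returns a NEW DataFrame with the z-score columns appended;
    # the caller is responsible for rebinding its own name to the result.
    df_z = calc_z_scores_stub(df[columns])
    return df.join(df_z)

# Toy data using one of the column names from the question.
t1df = pd.DataFrame({"LVESV_i": [1.0, 2.0, 3.0], "other": [7, 8, 9]})
t1vsd_df = pd.DataFrame({"LVESV_i": [4.0, 6.0], "other": [1, 2]})
z_score_list = ["LVESV_i"]

# Unpacking rebinds the original names to the updated DataFrames.
t1df, t1vsd_df = [add_z_scores(df, z_score_list) for df in (t1df, t1vsd_df)]

print(t1df.shape, t1vsd_df.shape)
```

The key point is the unpacking on the left-hand side: it is the only step that makes `t1df` and `t1vsd_df` refer to the new objects. Keeping the frames in a dict keyed by name would be an alternative that avoids repeating the variable names.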