I have several dataframes with indices in common (a list of countries) which I am iterating over to perform some manipulations. I have read this answer and know that iterating over dataframes isn't ideal - I have vectorised as much as I can, but the manipulations are somewhat complex, involving comparing rows of different dataframes and transforming them separately with custom algorithms, so some iteration seems unavoidable.

The basic workflow is:

for i in index:
    row1 = df1.loc[i]
    row2 = df2.loc[i]
    row3 = df3.loc[i]

    (row1, row2, row3).do.some.comparisons
    row1 = row1.apply.some.transformations
    row2 = row2.apply.some.algorithms
    row3 = row3.some.other.algorithms

I would like to get the dataframes back at the end with the new values correctly assigned to each row.

If I end the for block with:

    df1.loc[i] = row1
    df2.loc[i] = row2
    df3.loc[i] = row3

then I get a SettingWithCopyWarning. Looking into this, it seems my code has exactly the structure that the Pandas documentation warns against here (Yikes!).

What's the best way to get around this problem? How do I reliably get my dataframes back with the new values in them?

TY Lim

1 Answer

Generally I would advise against looping. But if you have to, do not reassign row1, row2 and row3 inside the loop. Write each transformed result straight back to its dataframe with .loc instead:

for i in index:
    row1 = df1.loc[i]
    row2 = df2.loc[i]
    row3 = df3.loc[i]

    (row1, row2, row3).do.some.comparisons
    df1.loc[i] = row1.apply.some.transformations
    df2.loc[i] = row2.apply.some.algorithms
    df3.loc[i] = row3.some.other.algorithms
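
For a concrete version of the loop above, here is a minimal sketch with two small frames. The data, the `rescale` helper and the comparison rule are all made up for illustration; the only point is that each transformation returns a new Series, which is then written back with `.loc` on the original dataframe.

```python
import pandas as pd

# Toy data (assumed): two frames sharing a country index.
countries = ["France", "Japan", "Peru"]
df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, index=countries)
df2 = pd.DataFrame({"a": [10.0, 20.0, 30.0], "b": [40.0, 50.0, 60.0]}, index=countries)


def rescale(row, factor):
    # Hypothetical transformation: builds a new Series, never mutates `row`.
    return row * factor


for i in countries:
    # Read both rows first so the comparison sees the original values.
    row1 = df1.loc[i]
    row2 = df2.loc[i]

    # Hypothetical cross-frame comparison that decides how to transform row1.
    factor = 2.0 if row2["a"] > 15 else 1.0

    # Assign the results directly into the original frames.
    df1.loc[i] = rescale(row1, factor)
    df2.loc[i] = rescale(row2, 0.5)

print(df1)
print(df2)
```

Because `row1` and `row2` are only read and never modified in place, nothing is ever written to an object that might be a copy of a slice.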
Quang Hoang