Suppose there are several similar dataframes on which the same operation is to be performed, e.g. dropping or renaming columns. One may want to do it in a loop:

import pandas as pd

this = pd.DataFrame({'text': ['Hello World']})
that = pd.DataFrame({'text': ['Hello Gurl']})

for df in [this, that]:
    df = df.rename(columns={'text': 'content'})

No exception is raised, but the dataframes remain unchanged. Why is that, and how can I iterate over dataframes without having to type the same line of code dozens of times?

On the other hand, operations like creating new columns do work:

for df in [this, that]:
    df['content'] = df.text
Zwiebak
  • In the first case, the loop rebinds the name `df` to a new object; in the second case, it modifies the object that `df` refers to. – JonSG Apr 12 '23 at 20:04

4 Answers

Call .rename() with inplace=True to have it modify the DataFrame itself.

this = pd.DataFrame({'text': ['Hello World']})
that = pd.DataFrame({'text': ['Hello Gurl']})

for df in [this, that]:
    df.rename(columns={'text': 'content'}, inplace=True)

As to "why it's not modified", it's similar to, say,

this = ("foo",)
that = ("bar",)

for x in (this, that):
    x = x + ("blarp",)

not assigning ("foo", "blarp") and ("bar", "blarp") back to this and that.
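
The same point can be checked directly on a DataFrame. This is only a minimal sketch: the name renamed is introduced here for illustration and is not part of the question's code.

```python
import pandas as pd

this = pd.DataFrame({'text': ['Hello World']})

df = this  # df and this are two names for the same object
renamed = df.rename(columns={'text': 'content'})

print(renamed is this)      # False: rename built a new DataFrame
print(list(this.columns))   # ['text']: the original is untouched

df['content'] = df['text']  # item assignment mutates the shared object
print(list(this.columns))   # ['text', 'content']
```

Assigning the result of rename to df inside a loop only points the name df at the new object; this never sees it.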

AKX

Because df.rename returns a new DataFrame, and the loop only assigns it to the name df. Many pandas methods behave the same way. Add inplace=True:

for df in [this, that]:
    df.rename(columns={'text': 'content'}, inplace=True)
Code Different

If you want to rename your columns in place, you can call rename with inplace=True, but you can also assign to df.columns directly, because that assignment modifies the DataFrame itself instead of returning a copy:

d = {'text': 'content'}

for df in [this, that]:
    df.columns = [d.get(col, col) for col in df.columns]

Output:

>>> this
       content
0  Hello World

>>> that
      content
0  Hello Gurl
Corralien

As other answers have mentioned, rename returns a copy, and the original DataFrame is not changed. And since you are creating a temporary list on the fly, there is no way to get the updated results back once the loop is done.

inplace=True is harmful in my opinion.

So don't use it. Some answers have suggested using a list or dict; only a small change to your code is needed:

dfs = [this, that]
for i in range(len(dfs)):
    dfs[i] = dfs[i].rename(...) # do something with dfs[i] and assign it back
# unpack the result
this, that = dfs

This works because the result of the rename operation is assigned back into a list that you still hold a reference to.
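
The dict variant mentioned above could look roughly like this. It is a sketch only: the keys 'this' and 'that' and the column mapping are taken from the question for illustration.

```python
import pandas as pd

dfs = {
    'this': pd.DataFrame({'text': ['Hello World']}),
    'that': pd.DataFrame({'text': ['Hello Gurl']}),
}

# Rebuild the dict, storing each renamed copy under its original key.
dfs = {name: df.rename(columns={'text': 'content'}) for name, df in dfs.items()}

print(list(dfs['this'].columns))
print(list(dfs['that'].columns))
```

Keeping the frames in a dict avoids the unpacking step, since each result stays reachable by key.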

cs95