
Much has been written about the downsides of using inplace=True when working on data frames, but am I mistaken to assume that renaming columns with inplace=True is benign? Do any data blocks get copied and discarded when I do

df.rename(columns={'old': 'new'}, inplace=True)

Simple timing tests show that renaming columns in place is faster than assigning a copy:

df = df.rename(columns={'old': 'new'})

It is much faster for large data frames, in fact: the time taken to rename in place does not depend on the size of the data frame.

import numpy as np
import pandas as pd
import time

# np.random.seed(0)
# One million rows, five float columns labelled 'a'..'e'
df = pd.DataFrame(np.random.rand(10**6,5), columns=list('abcde'))
# Mappings to switch the labels to upper case and back again
d1 = dict(zip(list('abcde'),list('ABCDE')))
d2 = dict(zip(list('ABCDE'),list('abcde')))
# Time 10 round trips of renaming in place
t0 = time.perf_counter()
for _ in range(10):
    df.rename(columns=d1, inplace=True)
    df.rename(columns=d2, inplace=True)
t1 = time.perf_counter()
# Time 10 round trips of renaming by reassignment
for _ in range(10):
    df = df.rename(columns=d1)
    df = df.rename(columns=d2)
t2 = time.perf_counter()
print('inplace :  ', t1-t0)
print('df = df :  ', t2-t1)

I am using Python 3.9.6 and pandas 1.3.1 under Windows 10 and get:

inplace :   0.003490000000000215
df = df :   0.1703701999999998

Can I conclude that no copies are made behind the scenes?

Martin R
  • https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not – lummers Sep 09 '22 at 22:56
  • @lummers: As I said, much has been written. The most relevant post to my question is 8 years old and possibly obsolete: https://stackoverflow.com/questions/22532302/pandas-peculiar-performance-drop-for-inplace-rename-after-dropna?noredirect=1&lq=1 And it does not answer the question. I would think column labels are metadata and editing should not require memory allocations for the data blocks. – Martin R Sep 10 '22 at 01:49

1 Answer


"The time renaming in place takes does not depend on the size of the dataframe. Can I conclude that no copies are made behind the scenes?"

Yes, you can conclude that no data blocks are copied. The only thing that may be copied is the set of column labels (the columns Index), and the cost of that is negligible because the number of columns is usually small.
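
One way to check this beyond timing is to test whether a column's data still sits in the same memory after the rename. The following is a minimal sketch, not a definitive test: it assumes a single-dtype frame like the one in the question, where to_numpy() returns a view on the underlying data rather than a copy. The variable names (before, after, shares_inplace, shares_assigned) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10**5, 5), columns=list('abcde'))

# Keep a reference to the array backing column 'a' before renaming.
# Assumption: for a single-dtype float frame this is a view, not a copy.
before = df['a'].to_numpy()

df.rename(columns={'a': 'A'}, inplace=True)

# If no data was copied, the renamed column is backed by the same memory.
after = df['A'].to_numpy()
shares_inplace = np.shares_memory(before, after)
print('inplace rename shares memory: ', shares_inplace)

# For comparison, the assigning form. Whether this one copies the data
# depends on the pandas version and its Copy-on-Write behaviour.
df2 = df.rename(columns={'A': 'a'})
shares_assigned = np.shares_memory(after, df2['a'].to_numpy())
print('assigned rename shares memory:', shares_assigned)
```

If the first line prints True, the in-place rename left the data where it was. The second result is version dependent: in pandas 1.3.1, rename takes a copy parameter that defaults to True, so the assigning form copies the data, which would account for the timing gap in the question. Newer versions with Copy-on-Write may defer that copy.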

John Zwinck