Much has been written about the downsides of using `inplace=True` when working on data frames, but am I mistaken to assume that renaming columns with `inplace=True` is benign?
Do any data blocks get copied and then discarded when I run the following?

```python
df.rename(columns={'old': 'new'}, inplace=True)
```
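One way to probe this directly is sketched below. It assumes a frame with a single dtype (here all float64), takes the NumPy array behind the frame with `to_numpy()` before and after the in-place rename, and asks NumPy whether the two overlap with `np.shares_memory`. For a homogeneous frame `to_numpy()` normally returns a view rather than a copy; with mixed dtypes it has to build a new array, so this check is only meaningful in the single-dtype case.

```python
import numpy as np
import pandas as pd

# Homogeneous float64 frame, so it is backed by a single 2-D block.
df = pd.DataFrame(np.random.rand(1000, 5), columns=list("abcde"))

before = df.to_numpy()  # view onto the block's data for a single-dtype frame
ret = df.rename(columns={"a": "A"}, inplace=True)
after = df.to_numpy()

print(ret)                              # None: inplace=True returns nothing
print(list(df.columns))                 # ['A', 'b', 'c', 'd', 'e']
print(np.shares_memory(before, after))  # True means the values were not copied
```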
Simple timing tests show that renaming columns in place is faster than reassigning the renamed copy:

```python
df = df.rename(columns={'old': 'new'})
```
Much faster, in fact, for large data frames: the time the in-place rename takes does not depend on the size of the data frame.
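To check the size-independence claim, the same round-trip rename can be timed for several frame sizes. This is a sketch using `timeit` instead of the manual `perf_counter` loop; `time_renames` is a hypothetical helper name, and absolute numbers will vary with the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

COLS = list("abcde")
FORWARD = dict(zip(COLS, "ABCDE"))
BACKWARD = dict(zip("ABCDE", COLS))


def time_renames(n_rows, repeats=10):
    """Time `repeats` forward/backward column renames, in place vs reassigned."""
    df = pd.DataFrame(np.random.rand(n_rows, len(COLS)), columns=COLS)

    def rename_inplace():
        df.rename(columns=FORWARD, inplace=True)
        df.rename(columns=BACKWARD, inplace=True)

    def rename_reassign():
        nonlocal df  # rebind the enclosing frame, like `df = df.rename(...)`
        df = df.rename(columns=FORWARD)
        df = df.rename(columns=BACKWARD)

    t_inplace = timeit.timeit(rename_inplace, number=repeats)
    t_reassign = timeit.timeit(rename_reassign, number=repeats)
    return t_inplace, t_reassign


for n in (10**3, 10**5, 10**6):
    t_inplace, t_reassign = time_renames(n)
    print(f"{n:>8} rows  inplace: {t_inplace:.5f} s  df = df: {t_reassign:.5f} s")
```

If the in-place column stays roughly flat while the reassignment column grows with the number of rows, that is consistent with the observation above.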
```python
import numpy as np
import pandas as pd
import time

# np.random.seed(0)
df = pd.DataFrame(np.random.rand(10**6, 5), columns=list('abcde'))
d1 = dict(zip(list('abcde'), list('ABCDE')))
d2 = dict(zip(list('ABCDE'), list('abcde')))

# 10 round trips, renaming in place
t0 = time.perf_counter()
for i in range(10):
    df.rename(columns=d1, inplace=True)
    df.rename(columns=d2, inplace=True)
t1 = time.perf_counter()

# 10 round trips, reassigning the returned frame
for i in range(10):
    df = df.rename(columns=d1)
    df = df.rename(columns=d2)
t2 = time.perf_counter()

print('inplace : ', t1-t0)
print('df = df : ', t2-t1)
```
I am using Python 3.9.6 and pandas 1.3.1 under Windows 10 and get:
```
inplace : 0.003490000000000215
df = df : 0.1703701999999998
```
Can I conclude that no copies are made behind the scenes?
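The same memory check can be sketched for the non-inplace form. It assumes pandas 1.x or 2.x, where `DataFrame.rename` has a `copy` parameter with a documented default of `True`: without copy-on-write the default call duplicates the data, while `copy=False` only relabels a shallow copy that shares the original block. Under copy-on-write (opt-in in pandas 2.x, the default from pandas 3.0) the default call is already a lazy copy and the `copy` keyword is deprecated, so the first printed value depends on the installed version and settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1000, 5), columns=list("abcde"))
data = df.to_numpy()  # view onto the single float64 block

# Default call: a deep copy in pandas 1.x/2.x unless copy-on-write is enabled.
renamed_default = df.rename(columns={"a": "A"})
# copy=False: new DataFrame object and labels, same underlying block.
renamed_nocopy = df.rename(columns={"a": "A"}, copy=False)

print("default   :", np.shares_memory(data, renamed_default.to_numpy()))
print("copy=False:", np.shares_memory(data, renamed_nocopy.to_numpy()))
print(list(df.columns))  # the original frame keeps its labels
```

If the default line prints `False` and the `copy=False` line prints `True`, the gap in the timings above would come from copying the values, not from relabelling the columns.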