Pandas DataFrame: in-place column substitution vs. creating a new dataframe with the transformed column

Question

Whenever I want to transform an existing column of a dataframe, I tend to use apply/transform, which gives me a new series altogether and does not modify the existing column in the dataframe.

Suppose the following code performs an operation on a column and returns a series.

new_col1 = df.col1.apply(...)

After this, I have two ways of substituting the new series into the dataframe.

Modifying the existing col1:

df.col1 = new_col1

Or creating a new dataframe with the transformed column:

df.drop(columns=['col1']).join(new_col1)
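For concreteness, here is a minimal sketch of the two options side by side. The sample data, the column names, and the doubling transform are assumptions made up for illustration; the elided `apply(...)` argument in the question is not known.

```python
import pandas as pd

# Hypothetical data; column names and the transform are assumptions.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
new_col1 = df["col1"].apply(lambda x: x * 2)

# Approach 1: overwrite the column on an existing frame.
# The copy is only here so the original df stays available for comparison.
df_inplace = df.copy()
df_inplace["col1"] = new_col1

# Approach 2: build a new frame from drop + join.
# The joined column is appended at the end, so the column order changes.
df_new = df.drop(columns=["col1"]).join(new_col1)

print(list(df_inplace.columns))  # ['col1', 'col2']
print(list(df_new.columns))      # ['col2', 'col1']
```

One practical difference this shows is that the drop/join route moves col1 to the last position, while assignment keeps the original column order.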

I ask this because whenever I use mutable data structures in Python, such as lists, I always try to create new lists using list comprehensions rather than in-place substitution.

Is there any benefit to following this style with pandas dataframes? Which is more Pythonic, and which of the two approaches above do you recommend?

score 0 · Answer 1 · answered Nov 13 '20 at 17:21


Since you are modifying an existing column, the first approach would be faster. Remember that both drop and join return a new copy of the data, so the second approach can be expensive if you have a big data frame with many columns.
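A small sketch of that point, using made-up data (the frame and the `+ 10` transform are assumptions): each of drop and join hands back a separate DataFrame object, and the original frame is left as it was.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
new_col1 = df["col1"] + 10

dropped = df.drop(columns=["col1"])  # a new DataFrame without col1
joined = dropped.join(new_col1)      # yet another new DataFrame

print(dropped is df, joined is dropped)  # False False
print(list(df.columns))                  # ['col1', 'col2'], original untouched
```

So the second approach creates two intermediate frames just to replace one column, which is where the extra cost on a wide frame would come from.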


Quang Hoang


score 0 · Answer 2 · answered Nov 13 '20 at 18:37

Whenever you want to make changes to the original data frame itself, consider passing inplace=True to methods that support it, such as drop, which by default return a new copy. Note that join has no inplace parameter and always returns a new data frame.

NOTE: Please keep in mind the cons of inplace:

inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
inplace does not work with method chaining
inplace is a common pitfall for beginners, so removing this option will simplify the API

SOURCE: In pandas, is inplace = True considered harmful, or not?
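The method-chaining point can be sketched as follows, assuming a small made-up frame: a call with inplace=True mutates the frame and returns None, so there is nothing left to chain on, whereas the default form returns a new DataFrame that the next method can work with.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# With inplace=True, drop mutates df_a and returns None.
df_a = df.copy()
result = df_a.drop(columns=["col2"], inplace=True)
print(result)  # None

# Without inplace, each call returns a new DataFrame, so calls can be chained.
df_b = df.drop(columns=["col2"]).rename(columns={"col1": "value"})
print(list(df_b.columns))  # ['value']
```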
