Whenever I want to transform an existing column of a DataFrame, I tend to use apply/transform, which returns an entirely new Series and does not modify the existing column in the DataFrame.

Suppose the following code performs an operation on a column and returns a Series:

new_col1 = df.col1.apply(...)

After this, I have two ways of putting the new Series into the DataFrame:

  1. modifying the existing col1:

    df.col1 = new_col1

  2. Or creating a new dataframe with the transformed column:

    df.drop(columns=["col1"]).join(new_col1)
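
To make the two options concrete, here is a minimal sketch on a toy frame. The data and the doubling lambda are made up for illustration and stand in for the elided apply(...) call.

```python
import pandas as pd

# Hypothetical toy data; the lambda stands in for the elided transformation.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
new_col1 = df["col1"].apply(lambda x: x * 2)

# Option 2: build a new frame; df itself is left untouched at this point.
new_df = df.drop(columns=["col1"]).join(new_col1)

# Option 1: overwrite the column on the existing frame.
df["col1"] = new_col1

print(list(df.columns))      # ['col1', 'col2']
print(list(new_df.columns))  # ['col2', 'col1']
```

Both frames end up holding the same values, but drop followed by join moves col1 to the end of the column order, which is worth keeping in mind if later code depends on column positions.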

I ask because whenever I work with mutable data structures in Python, such as lists, I try to create new lists with a list comprehension rather than modify them in place.

Is there any benefit to following this style with pandas DataFrames? Which of the two approaches above is more Pythonic, and which do you recommend?

Siddhant Tandon

2 Answers

Since you are modifying an existing column, the first approach would be faster. Remember that both drop and join return a new DataFrame rather than modifying the original, so the second approach can be expensive if you have a big DataFrame with many columns.
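
A small sketch of that point, on made-up data with a hypothetical transformation: each of drop and join hands back a separate object, while plain column assignment works on the frame you already have.

```python
import pandas as pd

# Hypothetical toy data; x + 1 stands in for the real transformation.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})
new_col1 = df["col1"].apply(lambda x: x + 1)

dropped = df.drop(columns=["col1"])  # a new DataFrame
joined = dropped.join(new_col1)      # another new DataFrame

print(dropped is df)         # False
print(joined is dropped)     # False
print(df["col1"].tolist())   # [1, 2, 3], the original is unchanged so far

# Assignment replaces the one column on the same DataFrame object.
df["col1"] = new_col1
print(df["col1"].tolist())   # [2, 3, 4]
```

How much the intermediate frames actually cost depends on the size of the data, so timing both variants on your own DataFrame (for example with timeit) is the reliable way to compare them.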

Quang Hoang

Whenever you want to change the original DataFrame itself, consider passing inplace=True to methods that accept it, such as drop, which by default return a new object. Note that join has no inplace parameter, so its result always has to be assigned back.

NOTE: Keep in mind the downsides of inplace:

  • inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits

  • inplace does not work with method chaining

  • inplace is a common pitfall for beginners, so removing this option will simplify the API

    SOURCE: In pandas, is inplace = True considered harmful, or not?
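
The method-chaining point can be illustrated with a minimal sketch. The frame and the column name "value" are made up for the example; drop and rename are used only because both exist on DataFrame.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# With inplace=True, drop mutates the frame it is called on and returns None,
# so nothing can be chained after it.
scratch = df.copy()
result = scratch.drop(columns=["col2"], inplace=True)
print(result)                 # None
print(list(scratch.columns))  # ['col1']

# Without inplace, each call returns a new frame, so calls compose.
chained = df.drop(columns=["col2"]).rename(columns={"col1": "value"})
print(list(chained.columns))  # ['value']
print(list(df.columns))       # ['col1', 'col2']
```

Because the inplace call returns None, writing something like df = df.drop(..., inplace=True) silently replaces the frame with None, which is one way the beginner pitfall mentioned above shows up.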

ssp4all