Whenever I want to transform an existing column of a dataframe, I tend to use apply/transform,
which returns a new series and does not modify the existing column in the dataframe.
Suppose the following code performs an operation on a column and returns a series:
new_col1 = df.col1.apply(...)
After this, I have two ways of substituting the new series into the dataframe.
Modifying the existing col1:
df.col1 = new_col1
Or creating a new dataframe with the transformed column:
df.drop(columns=['col1']).join(new_col1)
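To make the comparison concrete, here is a minimal sketch of the two options side by side. The toy data, the second column `col2`, the `lambda` standing in for the elided transformation, and the names `df_new` and `df_inplace` are all assumptions added for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column name col1 comes from the question.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

# Stand-in for the elided transformation in df.col1.apply(...).
new_col1 = df["col1"].apply(lambda x: x * 10)

# Option 2: build a new dataframe and leave df untouched.
# join aligns on the index and appends col1 after the remaining columns.
df_new = df.drop(columns=["col1"]).join(new_col1)

# Option 1: overwrite the column in an existing dataframe.
# The copy is here only so that both results can be inspected together.
df_inplace = df.copy()
df_inplace["col1"] = new_col1

print(df_new.columns.tolist())      # column order changes with drop + join
print(df_inplace.columns.tolist())  # column order is preserved
```

One difference visible in this sketch is that the drop-and-join version moves col1 to the end of the column list, while direct assignment keeps it in place.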
I ask because whenever I use mutable data structures in Python, such as lists, I always try to create new lists with a list comprehension rather than substituting in place.
Is there any benefit to following this style with pandas dataframes? Which is more Pythonic, and which of the two approaches above do you recommend?