This has been discussed before, but with conflicting answers:
What I'm wondering is:
- Why is `inplace = False` the default behavior?
- When is it good to change it? (Well, I'm allowed to change it, so I guess there's a reason.)
- Is this a safety issue? That is, can an operation fail or misbehave due to `inplace = True`?
- Can I know in advance whether a certain `inplace = True` operation will "really" be carried out in-place?
My take so far:
- Many Pandas operations have an `inplace` parameter, always defaulting to `False`, meaning the original DataFrame is untouched and the operation returns a new DF.
- When setting `inplace = True`, the operation might work on the original DF, but it might still work on a copy behind the scenes, and just reassign the reference when done.
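To make the two behaviors concrete, here is a minimal sketch of what I mean. The sample data and the choice of `dropna` are just my own illustration, and it only shows the visible behavior, not what Pandas does internally:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", "z"]})

# Default (inplace=False): the original is untouched and a new DataFrame is returned.
cleaned = df.dropna()
print(len(df), len(cleaned))

# inplace=True: the call returns None and the object bound to `df` is modified.
same_object = df
result = df.dropna(inplace=True)
print(result is None, same_object is df, len(df))
```

Note that `same_object is df` only shows the Python object is the same; it says nothing about whether the underlying data was copied behind the scenes.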
Pros of `inplace = True`:
- Can be both faster and less memory hogging (the first link shows `reset_index()` runs twice as fast and uses half the peak memory!).
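A rough way to check the speed claim on your own machine might look like the sketch below. The helper names (`make_df`, `copy_version`, `inplace_version`) and the frame size are my own assumptions, the timings include building the frame each time, and the numbers will vary with the Pandas version and the data:

```python
import timeit

import numpy as np
import pandas as pd


def make_df(n=100_000):
    # Hypothetical test frame with a non-default integer index.
    return pd.DataFrame({"x": np.arange(n)}, index=np.arange(n) * 2)


def copy_version():
    df = make_df()
    return df.reset_index()  # returns a new DataFrame


def inplace_version():
    df = make_df()
    df.reset_index(inplace=True)  # modifies df and returns None
    return df


# Each timing includes the cost of make_df(), which is the same for both versions.
t_copy = timeit.timeit(copy_version, number=20)
t_inplace = timeit.timeit(inplace_version, number=20)
print(f"inplace=False: {t_copy:.4f}s, inplace=True: {t_inplace:.4f}s")
```

Peak memory could be compared in a similar way with a memory profiler, but that is outside this sketch.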
Pros of `inplace = False`:
- Allows chained/functional syntax: `df.dropna().rename().sum()...` which is nice, and offers a chance for lazy evaluation or a more efficient re-ordering (though I don't think Pandas is doing this).
- When using `inplace = True` on an object which is potentially a slice/view of an underlying DF, Pandas has to do a `SettingWithCopy` check, which is expensive. `inplace = False` avoids this.
- Consistent & predictable behavior behind the scenes.
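For the chaining point, this is the kind of difference I have in mind. It is a small sketch with made-up data and a made-up column rename:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10.0, 20.0, 30.0]})

# Chained style: each step returns a new object, so the calls compose.
total = df.dropna().rename(columns={"a": "alpha"}).sum()

# The same steps with inplace=True need separate statements,
# because each inplace call returns None.
work = df.copy()
work.dropna(inplace=True)
work.rename(columns={"a": "alpha"}, inplace=True)
total_inplace = work.sum()

print(total.equals(total_inplace))
```

Both versions give the same sums; the chained one just never mutates `df` and fits in a single expression.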
So, putting the copy-vs-view issue aside, it seems more performant to always use `inplace = True`, unless specifically writing a chained statement. But that's not the default Pandas opts for, so what am I missing?