I'm wondering if there's a significant reduction in memory usage when we choose to manipulate a dataframe in-place (compared to not in-place).

I've done a bit of searching on Stack Overflow and came across this post where the answer states that if an operation is not done in-place, a copy of the dataframe is returned (I guess that's a bit obvious when there's an optional parameter called 'inplace' :P).

If I don't need to keep the original dataframe around, it would be beneficial (and logical) to just modify the dataframe in place, right?

Context:

I'm trying to get the top element when sorted by a particular 'column' in the dataframe. I was wondering which of these two is more efficient:

in-place:

df.sort('some_column', ascending=False, inplace=True)
top = df.iloc[0]

vs

copy:

top = df.sort('some_column', ascending=False).iloc[0]

For the 'copy' case, memory is still allocated for the sorted copy even though I never assign that copy to a variable, right? If so, how long does it take for that copy to be deallocated?

Thanks for any insights in advance!

elllot
  • do it in jupyter and compare execution time via %%time. or choose other tools to measure performance – Rockbar Nov 12 '17 at 06:35
  • I'm more concerned about the memory usage so I'll try out the "python memory profiler." I forgot about that module... I was just wondering if someone could give me a quick conceptual answer. – elllot Nov 12 '17 at 06:53
  • It wasn't fully answered... – elllot Dec 30 '17 at 22:41

1 Answer

In general, there is no difference in memory usage between inplace=True and returning an explicit copy: in both cases, a sorted copy is created. With inplace=True, the original df object is then updated to hold the sorted data, so reassignment is not necessary.

Furthermore, note that df.sort is deprecated and has been removed from recent pandas versions; use sort_values instead.
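As a minimal sketch of the two styles from the question, rewritten with sort_values: the data and the `label` column are made up for illustration, and `some_column` is the name used in the question. The last line shows one possible alternative that skips sorting when only the top row is needed.

```python
import pandas as pd

# Hypothetical data; 'some_column' matches the name used in the question.
df = pd.DataFrame({"some_column": [3, 10, 7], "label": ["a", "b", "c"]})

# Copy style: sort_values returns a new, sorted DataFrame and leaves df untouched.
top_copy = df.sort_values("some_column", ascending=False).iloc[0]

# In-place style: the call returns None and the frame itself now holds the sorted rows.
df_inplace = df.copy()
result = df_inplace.sort_values("some_column", ascending=False, inplace=True)
top_inplace = df_inplace.iloc[0]

# Alternative that avoids sorting: look up the row holding the maximum value.
top_idxmax = df.loc[df["some_column"].idxmax()]

print(result)  # None
print(top_copy["label"], top_inplace["label"], top_idxmax["label"])  # b b b
```

Because inplace=True returns None, chaining `.iloc[0]` onto an in-place call fails, which is one practical reason to prefer the copy style in a one-liner.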

cs95
  • Oh, so both methods make a copy in memory but flagging inplace as True just writes it back to the original df? So would it be accurate to say that setting inplace as False then removing reference for the original variable to explicitly deallocate memory would be essentially the same thing as having inplace set to True? – elllot Nov 14 '17 at 22:47
  • @Ellest Yes to the first question. I didn't understand your second question, because there's no allocation/deallocation going on here. – cs95 Nov 14 '17 at 23:44
  • @COLDSPEED How is a copy created without allocating memory for the copy? In the case of `inplace=False` wouldn't the original copy not be deallocated from memory since a variable will still be referencing that dataframe object? i.e. if we don't sort in place: `df = DataFrame(...); df_sorted = df.sort(...,inplace=False)` we're left with df and df_sorted both taking up space. However, if we sort in place: `df = DataFrame(...); df.sort(...,inplace=True)` we're left with a single dataframe. – elllot Nov 15 '17 at 02:13
  • @Ellest in the latter case the extra df is garbage collected. – cs95 Mar 14 '19 at 14:54
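To check the memory question empirically rather than conceptually, one option is the standard-library tracemalloc module. The sketch below is only an assumption about how such a comparison could be set up: `peak_bytes` is a hypothetical helper, the frame is random data, and tracemalloc only sees allocations that are reported to it, so the numbers should be treated as indicative rather than exact.

```python
import tracemalloc

import numpy as np
import pandas as pd


def peak_bytes(sort_in_place, n_rows=200_000):
    """Return (peak traced bytes during the sort, top value) for a fresh frame."""
    rng = np.random.default_rng(0)  # fixed seed so both runs sort identical data
    df = pd.DataFrame({"some_column": rng.random(n_rows)})

    tracemalloc.start()  # start tracing only after the frame has been built
    if sort_in_place:
        df.sort_values("some_column", ascending=False, inplace=True)
        top = df.iloc[0]
    else:
        top = df.sort_values("some_column", ascending=False).iloc[0]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, top["some_column"]


peak_inplace, top_inplace_value = peak_bytes(sort_in_place=True)
peak_copy, top_copy_value = peak_bytes(sort_in_place=False)
print(f"in-place peak: {peak_inplace} bytes, copy peak: {peak_copy} bytes")
```

Comparing the two peaks on your own data and pandas version is more reliable than reasoning about it in the abstract, since the internals have changed between releases.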