169

In the pandas library there is often an option to change the object in place, such as with the following statement:

df.dropna(axis='index', how='all', inplace=True)

I am curious what is being returned as well as how the object is handled when inplace=True is passed vs. when inplace=False.

Are all operations modifying self when inplace=True? And when inplace=False is a new object created immediately such as new_df = self and then new_df is returned?


If you are trying to close a question where someone should use inplace=True and hasn't, consider replace() method not working on Pandas DataFrame instead.

Karl Knechtel
Aran Freel
  • Yes, `inplace=True` returns `None`; `inplace=False` returns a copy of the object with the operation performed. The docs are pretty clear on this, is there something that is confusing with a specific part? Specifically `If True, do operation inplace and return None.` – EdChum May 10 '17 at 13:09
  • I am subclassing the DataFrame object and with an operation such as merge it doesn't seem possible to do it inplace... `self = self.merge(new_df, how='left', on='column2')` I am not sure that it is possible to reassign self – Aran Freel May 10 '17 at 13:12
  • You're correct that [DataFrame.merge](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) has no `inplace` argument. It returns a DataFrame, so no issue reassigning. – JAV May 11 '17 at 05:01
  • Can someone also highlight the advantages of using it in terms of resource consumption? – markroxor Apr 03 '19 at 12:03
  • I've definitely seen on SO or another site someone writing a pretentious-sounding post beginning with "**inplace=True does not mean what you think it means**" (emphasis theirs). I came looking for that post but I don't see any major warnings from the community. So I take it that it's pretty much safe to use `inplace=True` when we would otherwise assign it back to the same variable? – Excel Help Sep 06 '19 at 11:03
  • @markroxor There really aren't many. In a few instances, an `inplace` action can be a little faster since you don't actually have to return a copy of the result. But that's about it. There are way more reasons not to use it. – cs95 Dec 09 '19 at 03:54
  • But then there is this: https://www.dataschool.io/future-of-pandas/#inplace – PatrickT Jan 08 '22 at 07:00

11 Answers

132

When inplace=True is passed, the data is modified in place (the call returns nothing), so you'd use:

df.an_operation(inplace=True)

When inplace=False is passed (this is the default value, so it isn't necessary to pass it), the method performs the operation and returns a copy of the object, so you'd use:

df = df.an_operation(inplace=False) 
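
To make the difference in return values concrete, here is a minimal sketch using `dropna` from the question. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: the middle row is entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, np.nan, 6.0]})
rows_before = len(df)

# Default (inplace=False): a new DataFrame comes back and df is untouched.
cleaned = df.dropna(axis="index", how="all")
rows_after_default_call = len(df)

# inplace=True: df itself is modified and the call returns None.
returned = df.dropna(axis="index", how="all", inplace=True)

print(rows_before, rows_after_default_call, len(cleaned), len(df), returned)
```

A common slip is writing `df = df.dropna(..., inplace=True)`, which rebinds `df` to `None`.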
cs95
Ed Harrod
  • Would I be right in thinking that `inplace` is only an option for methods which alter existing data, but not for methods which 'reshape' the data. For instance, I can .set_index(inplace=True) as this applies values to the existing index, but can't .reindex(inplace=True) because this could create extra rows on the DataFrame that didn't exist in the previous array? – ac24 Mar 13 '18 at 22:49
  • The method `.dropna()` accepts `inplace=True` and can most definitely reshape the dataframe, so no. – gosuto Aug 26 '18 at 13:46
  • You have to be careful here. @ac24 is actually more or less right. While `dropna` returns a dataframe of different shape, it doesn't actually reshape the underlying data; it merely returns a mask over it (when `inplace=False`), which can lead to the dreaded `SettingWithCopyWarning`. Only when there are no more references to the old array of values will pandas reshape according to the mask. A better rule of thumb is: `inplace` is available when the operation doesn't require allocating a new backing ndarray of values. – BallpointBen Feb 27 '19 at 05:08
  • After the `df=df.an_operation` operation, the old dataframe does not take up space in RAM, does it? – Bushmaster Mar 30 '22 at 07:21
121

In pandas, is inplace = True considered harmful, or not?

TLDR; Yes, yes it is.

  • inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
  • inplace does not work with method chaining
  • inplace can lead to SettingWithCopyWarning if used on a DataFrame column, and may prevent the operation from going through, leading to hard-to-debug errors in code

The pain points above are common pitfalls for beginners, so removing this option will simplify the API.


I don't advise setting this parameter as it serves little purpose. See this GitHub issue which proposes the inplace argument be deprecated api-wide.

It is a common misconception that using inplace=True will lead to more efficient or optimized code. In reality, there are absolutely no performance benefits to using inplace=True. Both the in-place and out-of-place versions create a copy of the data anyway, with the in-place version automatically assigning the copy back.

inplace=True is a common pitfall for beginners. For example, it can trigger the SettingWithCopyWarning:

import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# df2 is a filtered subset of df
df2 = df[df['a'] > 1]
# in-place replace on a column of that subset
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame

Calling a function on a DataFrame column with inplace=True may or may not work. This is especially true when chained indexing is involved.
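
One way to sidestep the ambiguity is to take an explicit copy of the subset and assign the result back to the column. This is only a sketch of that alternative, reusing the same toy data; the explicit `.copy()` is my addition and not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 1], "b": ["x", "y", "z"]})

# Take an explicit copy so df2 is unambiguously its own object.
df2 = df[df["a"] > 1].copy()

# Assign the result back instead of passing inplace=True on a column.
df2["b"] = df2["b"].replace({"x": "abc"})

print(df2)
print(df)  # the original frame keeps its values
```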

As if the problems described above aren't enough, inplace=True also hinders method chaining. Contrast the working of

result = df.some_function1().reset_index().some_function2()

As opposed to

temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()

The former lends itself to better code organization and readability.
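
As a rough, runnable version of the two snippets above, the sketch below swaps the placeholder `some_function1` and `some_function2` for `sort_values` and `rename`. Those choices, and the toy columns, are assumptions made only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"volume": [30, 10, 20], "price": [1.5, 2.5, 3.5]})

# Chained version: each call returns a new DataFrame.
chained = (
    df.sort_values("volume")
    .reset_index(drop=True)
    .rename(columns={"price": "unit_price"})
)

# Step-by-step version with an in-place call in the middle.
temp = df.sort_values("volume")
temp.reset_index(drop=True, inplace=True)
stepwise = temp.rename(columns={"price": "unit_price"})

# Both routes end with the same data; df itself is unchanged.
print(chained.equals(stepwise))
```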


Another supporting claim is that the API for set_axis was recently changed such that inplace default value was switched from True to False. See GH27600. Great job devs!

cs95
  • Sure, `inplace=True` doesn't work with chaining etc., but that's obvious if you understand what it's doing conceptually. Personally I find it a little cleaner to avoid assignment. Would you also be in favour of removing `list.sort` etc. from the standard library? – Chris_Rands Dec 10 '19 at 12:43
  • I don't think that's a fair comparison. There are some obvious benefits of using list.sort versus sorted. Same goes with the other in place functions. There is no real benefit here, method chaining is a lot more common in pandas and there are plans for this argument's deprecation anyway. – cs95 Dec 10 '19 at 15:02
  • I also find it a little cleaner to avoid assignment: also, for example, python's `list.append()` is also in-place, while pandas df.append isn't (and it does not even support inplace), which irritates me to no end. Which is why I'd like to know, just to understand what the real benefits are - what are the obvious benefits of using list.sort versus sorted, other than avoiding assignment? Otherwise, I think there is real benefit here - me being able to avoid assignment, where I personally find it more readable. – sdbbs Jun 26 '20 at 15:54
  • @sdbbs `list.append()` appends to an existing list. `df.append` makes a copy of your data (doesn't matter if you have 5 rows or 5 million), then adds a new row to your copy, then returns it. What do you think makes more sense? As for df.append, [AVOID AS MUCH AS POSSIBLE](https://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-then-filling-it/56746204#56746204). I don't think it's a good example to argue for inplace=True, I don't even think that function has a place in the API. – cs95 Jun 26 '20 at 18:58
  • Good answer! Can you please clarify one point: first you said "and (almost) never offers any performance benefits", which suggests there are rare cases where it does offer benefits. But later you said "absolutely no performance benefits". So are there sometimes situations when `inplace` increases efficiency? – Mikhail_Sam Jun 09 '21 at 07:13
  • One more question about memory consumption. This answer from the thread https://stackoverflow.com/a/59335777/4960953 says that `inplace` is more efficient in terms of memory. Is it true (I have doubts about this)? – Mikhail_Sam Jun 09 '21 at 07:17
  • The reason why people use inplace=True is because the pandas api is unintuitive in the sense that "inplace" operation is the expected/desired behavior. For example, why should the simple case of renaming columns or resetting the index return a new object? No one wants to reassign these operations back to the same variable over and over again, it looks awful. – Jared Marks Nov 08 '21 at 10:55
  • I would like to echo the above 2 comments. If you are working with a large dataframe, creating soft copies for renaming columns and resetting the index is not memory efficient, and why would renaming columns and resetting the index of a dataframe return a new dataframe and keep the "original" dataframe? – camel_case Nov 28 '22 at 17:20
51

The way I use it is

# Have to assign back to dataframe (because it is a new copy)
df = df.some_operation(inplace=False) 

Or

# No need to assign back to dataframe (because it is on the same copy)
df.some_operation(inplace=True)

CONCLUSION:

 if inplace is False
      Assign to a new variable;
 else
      No need to assign
Nabin
7

The inplace parameter:

df.dropna(axis='index', how='all', inplace=True)

in Pandas and in general means:

1. Pandas creates a copy of the original data

2. ... does some computation on it

3. ... assigns the results to the original data.

4. ... deletes the copy.

As you can read further below in the rest of my answer, we can still have good reason to use this parameter, i.e. the in-place operations, but we should avoid it if we can, as it generates more issues, such as:

1. Your code will be harder to debug (in fact, SettingWithCopyWarning exists to warn you about this possible problem)

2. Conflict with method chaining


So is there any case where we should still use it?

Definitely yes. If we use pandas or any tool for handling huge datasets, we can easily face a situation where some big data consumes our entire memory. To avoid this unwanted effect we can use techniques like method chaining:

(
    wine.rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .query("alcohol > 14 and color_filter == 1")
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
)

which makes our code more compact (though harder to interpret and debug too) and consumes less memory, as the chained methods work with the other method's returned values, thus resulting in only one copy of the input data. We can see clearly that we will have 2 x original data memory consumption after these operations.

Or we can use the inplace parameter (though it is harder to interpret and debug too): our memory consumption will be 2 x original data during the operation, but afterwards it remains 1 x original data, which, as anybody who has worked with huge datasets knows, can be a big benefit.
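
The chained expression shown earlier in this answer can be run end to end with a small stand-in table. The `wine` values below are invented purely for illustration (the real dataset is not part of this post), and the sketch only demonstrates the chaining pattern, not any memory measurement.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `wine` dataset used in the answer.
wine = pd.DataFrame({
    "alcohol": [14.5, 13.2, 14.8, 14.1],
    "color_intensity": [7.5, 5.0, 8.2, 6.0],
    "hue": [1.1, 0.9, 1.2, 1.3],
})

result = (
    wine.rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .query("alcohol > 14 and color_filter == 1")
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
)

print(result)
# The input frame still has its original column names.
print(list(wine.columns))
```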


Final conclusion:

Avoid using the inplace parameter unless you work with huge data, and be aware of its possible issues if you do still use it.

Community
Geeocode
  • Can you please clarify why we "will have 2 x original data memory consumption after this operations" when using method chaining? I get why we need x2 on calculation, but can't figure out why we still use x2 after that – Mikhail_Sam Jun 09 '21 at 07:20
  • "and consumes less memory as the chained methods works with the other method's returned values, thus resulting in only one copy of the input data" assuming GC will clear all preliminary copies instantly? I am not sure. The result of every chained method call is a plain Python object of type DataFrame, there should be a reason to clean it. – QtRoS Oct 09 '22 at 17:50
2

Save it to the same variable

data["column01"].where(data["column01"]< 5, inplace=True)

Save it to a separate variable

data["column02"] = data["column01"].where(data["column01"] < 5)

But, you can always overwrite the variable

data["column01"] = data["column01"].where(data["column01"] < 5)

FYI: By default, inplace=False.

hyukkyulee
2

When trying to make changes to a Pandas dataframe using a function, we use 'inplace=True' if we want to commit the changes to the dataframe. Therefore, the first line in the following code changes the name of the first column in 'df' to 'Grades'. We need to call the dataframe if we want to see the resulting dataframe.

df.rename(columns={0: 'Grades'}, inplace=True)
df

We use 'inplace=False' (this is also the default value) when we don't want to commit the changes but just print the resulting dataframe. So, in effect, a copy of the original dataframe with the changes applied is printed without altering the original dataframe.

Just to be more clear, the following two snippets do the same thing:

# Code 1
df.rename(columns={0: 'Grades'}, inplace=True)
# Code 2
df = df.rename(columns={0: 'Grades'}, inplace=False)
Harsha
1

Yes, in Pandas many functions have the parameter inplace, but by default it is set to False.

So, when you do df.dropna(axis='index', how='all', inplace=False), pandas assumes that you do not want to change the original DataFrame, and therefore it instead creates a new copy for you with the required changes.

But when you change the inplace parameter to True,

it is equivalent to explicitly saying that you do not want a new copy of the DataFrame and instead want the changes made on the given DataFrame.

This tells pandas to modify the given DataFrame rather than return a new one.

But you can also avoid using the inplace parameter by reassigning the result to the original DataFrame

df = df.dropna(axis='index', how='all')

0

Whether to use inplace=True depends on whether you want to make changes to the original df or not.

df.drop_duplicates()

will only return a new DataFrame with the duplicates dropped; it will not make any changes to df

df.drop_duplicates(inplace=True)

will drop the duplicate rows from df itself.

Hope this helps.:)

Shahir Ansari
0

inplace=True makes the function impure. It changes the original dataframe and returns None, which breaks the DSL chain. Because most dataframe functions return a new dataframe, you can use the DSL conveniently, like

df.sort_values().rename().to_csv()

A function call with inplace=True returns None and the DSL chain is broken. For example

df.sort_values(inplace=True).rename().to_csv()

will raise AttributeError: 'NoneType' object has no attribute 'rename'

This is similar to Python's built-in sort and sorted: lst.sort() returns None and sorted(lst) returns a new list.
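
A minimal sketch of the broken chain and of the list analogy follows. The `volume` column and the arguments passed to `sort_values` and `rename` are made up here, since the snippets above omit them.

```python
import pandas as pd

df = pd.DataFrame({"volume": [3, 1, 2]})

# Without inplace, each call returns a DataFrame, so the chain keeps going.
chained = df.sort_values("volume").rename(columns={"volume": "vol"})

# With inplace=True the first call returns None, so the next call fails.
try:
    df.sort_values("volume", inplace=True).rename(columns={"volume": "vol"})
except AttributeError as exc:
    error_message = str(exc)

# The built-in list API has the same split.
lst = [3, 1, 2]
sort_result = lst.sort()          # sorts lst itself and returns None
sorted_copy = sorted([3, 1, 2])   # returns a new sorted list

print(error_message)
print(sort_result, lst, sorted_copy)
```

Note that the in-place sort has already been applied to `df` by the time the `AttributeError` is raised.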

Generally, do not use inplace=True unless you have a specific reason for doing so. When you have to write reassignment code like df = df.sort_values(), try attaching the function call to the DSL chain, e.g.

df = pd.read_csv().sort_values()...
Louis
  • Providing exact working code with proper formatting will really help users to understand your answer faster. Requesting you to do the same. I am not a pandas expert, so I cannot reformat your answer, but it's highly recommended. – Anand Vaidya Dec 10 '19 at 16:25
0

I would like to answer based on my experience with pandas.

The 'inplace=True' argument means the changes are made permanently to the data frame itself, e.g.

    df.dropna(axis='index', how='all', inplace=True)

changes the same dataframe (here pandas finds the rows whose entries are all NaN and drops them). If we try

    df.dropna(axis='index', how='all')

pandas returns the dataframe with the changes applied but will not modify the original dataframe 'df'.

Chetan
0

If you don't use inplace=True or you use inplace=False you basically get back a copy.

So for instance:

testdf.sort_values(inplace=True, by='volume', ascending=False)

will alter the structure with the data sorted in descending order.

then:

testdf2 = testdf.sort_values( by='volume', ascending=True)

will make testdf2 a copy. The values will all be the same but the sort will be reversed and you will have an independent object.

then given another column, say LongMA and you do:

testdf2.LongMA = testdf2.LongMA -1

the LongMA column in testdf will have the original values and testdf2 will have the decremented values.

It is important to keep track of the difference as the chain of calculations grows and the copies of dataframes have their own lifecycle.
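
Put together, the steps described in this answer look roughly like the sketch below. The `volume` and `LongMA` values are invented for illustration.

```python
import pandas as pd

# Toy data; the numbers are made up.
testdf = pd.DataFrame({"volume": [10, 30, 20], "LongMA": [5.0, 6.0, 7.0]})

# Sorts testdf itself, in descending order of volume.
testdf.sort_values(inplace=True, by="volume", ascending=False)

# Returns an independent, ascending-sorted DataFrame.
testdf2 = testdf.sort_values(by="volume", ascending=True)

# Only the new object is changed by this assignment.
testdf2["LongMA"] = testdf2["LongMA"] - 1

print(testdf)
print(testdf2)
```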

Ryan Hunt