
To decide whether an operation returns a copy or a view, I consulted the answers to the question "What rules does Pandas use to generate a view vs a copy?"

One of them reads as follows:

Here's the rules, subsequent override:

All operations generate a copy

If inplace=True is provided, it will modify in-place; only some operations support this

An indexer that sets, e.g. .loc/.iloc/.iat/.at will set inplace.

An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example from above is for .query; this will always return a copy as it is evaluated by numexpr.)

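An indexer that gets on a multiple-dtyped object is always a copy.

To test these rules, I ran:

import pandas as pd
import numpy as np

# Mixed-dtype frame: A and B hold strings, C and D hold integers
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
# Select the row labelled 0; the result is a Series
df2 = df.loc[0,]
# Intended to overwrite the first element of df2
df2[0] = 500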

The last line does not change df. My reasoning: the call df.loc[0,] is an indexer that gets on a multi-dtyped object, so df2 is a copy. I thought I had understood the rules. However, when I run the following:

# First two rows, all columns
df3 = df.iloc[0:2,]
# Overwrite every value in df3
df3[:] = 55

I get (a) a warning saying "A value is trying to be set on a copy of a slice from a DataFrame." and (b) a partially changed original df: the first two columns are unchanged, whereas the last two are changed to 55.

I don't understand this behavior with respect to the rules outlined above. For instance, df.iloc[0:2,] is an indexer that gets on a multi-dtyped object and should therefore return a copy. So why do I get the warning that a value is set on a copy of a slice?
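For comparison, here is a minimal sketch of two ways to write the second example so that the result does not depend on whether the intermediate object is a view or a copy. It is not taken from the quoted rules. The name `subset` and the restriction to the integer columns C and D are my own assumptions; I restrict to C and D so that no column changes dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Option 1: an independent subset. The explicit .copy() states the intent,
# so changes to `subset` (hypothetical name) are not expected to reach df.
subset = df.iloc[0:2].copy()
subset['C'] = 55
df_untouched = df['C'].tolist() == list(range(8))

# Option 2: change df itself with a single indexing call on df.
# Rows 0-1, columns at positions 2 and 3 (C and D).
df.iloc[0:2, 2:4] = 55

print(df_untouched)
print(df.head(3))
```

In both options there is only one indexing step between df and the assignment, so there is no intermediate object whose view-or-copy status matters.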

P.Jo
  • This is a frequent request. You should never rely on views of DataFrames; this is quite risky because pandas might have tricky rules for whether or not a copy is generated. Why not save the indices/mask instead? – mozway Aug 31 '23 at 09:21
  • What do you mean by saving the indices/mask? – P.Jo Aug 31 '23 at 09:24
  • for instance `idx = slice(0, 2)`, `df[idx] = 55` – mozway Aug 31 '23 at 09:32
  • Well, this calls the normal __setitem__ on a df. That's OK, but if this is best practice, why introduce assignment operations on loc and iloc at all? – P.Jo Aug 31 '23 at 09:39
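
The idea of saving the indexer, as suggested in the comments, might look like the sketch below. The comment only shows `idx = slice(0, 2)` with `df[idx] = 55`. The boolean mask, the names `d_pos`, `before` and `mask`, and the restriction to the integer columns C and D are my own assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Positional slice saved once, then used for reading and for writing on df.
idx = slice(0, 2)
d_pos = df.columns.get_loc('D')        # integer position of column D
before = df.iloc[idx, d_pos].tolist()  # values of D in rows 0-1 before the write
df.iloc[idx, d_pos] = 55

# Boolean mask saved once, then used with .loc on df itself.
mask = df['A'] == 'foo'
df.loc[mask, 'C'] = 55

print(before)
print(df)
```

Here what gets stored is the selection (slice or mask) rather than a selected sub-frame, and every write goes through df directly.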
