I have a (possibly large) DataFrame and I want to set values based on multiple criteria.

I would like to do this by first applying the "big cut" once, and then changing values on parts of that slice. But I want these changes to apply to the original DataFrame, so I don't want a copy when I do the first selection.

Here is a simple example.

import pandas

df = pandas.DataFrame( [(1,2,3),
                        (4,5,6),
                        (7,8,9)],
                       columns=["x","y","z"]
                       )

# the "big cut" -- does this return a copy? why?
vw = df.loc[ df["x"]<5, : ]

# the "sub" selections set values only in vw, not df
vw.loc[ vw["z"]>5, "z" ] = 666
vw.loc[ vw["y"]<5, "y" ] = 222

print("vw:")
print(vw)
print("df:")
print(df)

which prints the result:

vw:
   x    y    z
0  1  222    3
1  4    5  666

df:
   x  y  z
0  1  2  3
1  4  5  6
2  7  8  9

My desired result:

df_ideal:
   x    y    z
0  1  222    3
1  4    5  666
2  7    8    9

How can I do my sub-selections on a view rather than a copy? I have read this question, but it seems like I should be getting a view by those rules. (I am using Pandas 0.19.2 and Python 2.7.10)

Corey
  • It returns a copy (execute `df.loc[df["x"]<5, :]._is_view` or `bool(df.loc[df["x"]<5, :].is_copy)` to check). As far as I know, there is no way to operate on a view with boolean indexing. You'll have to do `cond1 = df["x"]<5` and then `df.loc[cond1 & (df['z'] > 5), 'z'] = 666`. – ayhan Jul 20 '17 at 18:42
  • So... could I use the `Series` of booleans to get a `Series` of index values (of `df`) and then use index slicing with that to keep only views? But then I'm also confused: if boolean indexing returns a copy, why do the values in `vw` get set correctly? – Corey Jul 20 '17 at 18:52
  • It surely modifies the DataFrame it was called on if you use it for setting (i.e. you can be sure that `df.loc[something] = 5` will modify the original DataFrame). However, `another_df = df.loc[something]` may or may not return a view (a simple column selection returns a view, for example). The changes you make in `another_df`, then, may or may not be reflected in the original df. – ayhan Jul 20 '17 at 19:02
  • See [@unutbu's explanation](https://stackoverflow.com/questions/27367442/pandas-dataframe-view-vs-copy-how-do-i-tell#comment43188590_27367693) on why the first one doesn't work but the second one works. – ayhan Jul 20 '17 at 19:04
  • How about indexing on the subset, but still setting on the original: `df.ix[vw[vw["z"]>5].index, "z"] = 666`? I think the `ix` operation is fast. – Huang Jul 20 '17 at 19:09
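
The two suggestions in the comments can be sketched roughly as follows. This is only an illustration, not a definitive answer: the names `big_cut`, `df2`, and `sub` are made up for the example, the second variant uses `.loc` in place of `.ix` (which newer pandas releases no longer provide), and it assumes the index labels of the DataFrame are unique.

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
    columns=["x", "y", "z"],
)

# Variant 1: compute the "big cut" once as a boolean mask rather than
# as a sliced DataFrame, then combine it with each sub-condition.
big_cut = df["x"] < 5

# Assigning through df.loc writes into df itself, never into a copy.
df.loc[big_cut & (df["z"] > 5), "z"] = 666
df.loc[big_cut & (df["y"] < 5), "y"] = 222

# Variant 2 (assumes unique index labels): treat the subset as read-only,
# evaluate the sub-conditions on it, and write back to the original by label.
df2 = pd.DataFrame(
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
    columns=["x", "y", "z"],
)
sub = df2.loc[df2["x"] < 5]
df2.loc[sub[sub["z"] > 5].index, "z"] = 666
df2.loc[sub[sub["y"] < 5].index, "y"] = 222

print(df)
print(df2)
```

Both variants should leave row 2 untouched and change only `y` in row 0 and `z` in row 1, which is the result asked for in the question. The first keeps the expensive `df["x"] < 5` comparison to a single evaluation; the second evaluates the sub-conditions on the smaller subset, which may matter if the big cut removes most of the rows.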
