In the following piece of code, I group the rows of a DataFrame into bins by their x value. Now I want to assign a group ID to the second column (gid), but pandas keeps throwing a SettingWithCopyWarning. What am I doing wrong?

import numpy as np
import pandas as pd
d = np.random.random((10, 2))
d[:, 1] = 0
m = pd.DataFrame(d, columns=("x", "gid"))
dx = 0.2
grp = m.groupby(lambda i: int(m["x"][i] / dx))
gid = 1
for name, group in grp:
    group["gid"][:] = gid  # This line triggers the warning
    gid += 1
print(m)

Here is the warning thrown:

/usr/lib/python3.4/site-packages/pandas/core/series.py:677: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._set_with(key, value)
sys:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  • Have a look at http://stackoverflow.com/a/16949498/1571826 for a more elegant and flexible approach to binning. – Def_Os May 19 '15 at 17:57
  • What version pandas are you using? I'm also running python 3.4 and I didn't get any warning when I ran your code exactly as is. – KCzar May 19 '15 at 17:58
  • Works fine in python 3.4.3, pandas 0.16.1 and numpy 1.9.2 – EdChum May 19 '15 at 18:05

1 Answer

There are two issues here. First, you are getting a SettingWithCopyWarning because

group["gid"][:] = gid

uses "chained indexing". The problem is that group[...] may sometimes return a copy rather than a view of group, so indexing the result again and assigning to it, e.g. group[...][...] = gid, may have no effect: it modifies only the copy, not group.

SettingWithCopyWarning warns that chained indexing has been detected in an assignment. It does not necessarily mean anything has gone wrong. In your case group["gid"] returns a view of group, so your chained indexing happens to succeed in modifying group itself.

Nevertheless, the recommended practice is to always avoid chained indexing when performing assignments, since it is not always easy to predict whether the chained indexing will return a view or a copy.

Usually you can avoid chained indexing by using .loc or .iloc:

group.loc[:, "gid"] = gid 
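As a minimal sketch of the difference, assuming a small hand-made DataFrame (the name df and its values are illustrative, not taken from the question), a single .loc call selects rows and column in one operation, so the assignment goes to df itself:

```python
import pandas as pd

# Illustrative data, not from the question.
df = pd.DataFrame({"x": [0.1, 0.5, 0.9], "gid": [0, 0, 0]})

# Chained indexing would be two separate operations:
#     df["gid"][df["x"] > 0.4] = 1
# Whether that reaches df depends on view-vs-copy details, so it is left commented out.

# One .loc call: row selector and column label together.
df.loc[df["x"] > 0.4, "gid"] = 1

print(df)
```

Only the rows where x exceeds 0.4 receive the new gid; the first row keeps its original value.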

The second issue is that even if we avoid chained indexing, modifying group does not modify m.

When you use a for-loop:

for name, group in grp:

Python creates local variables name and group and binds these variables to items in grp. But these items are themselves copies, not views, of portions of m. So modifying these copies does not affect m.
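If you want to keep the loop, one option is to assign through m itself, using each group's index to pick the rows. This is only a sketch under assumptions: the x values are made up for reproducibility, and the grouping key is computed as a Series rather than with the lambda from the question.

```python
import pandas as pd

# Illustrative x values; gid starts at 0 as in the question.
m = pd.DataFrame({"x": [0.05, 0.45, 0.15, 0.95, 0.41], "gid": 0})
dx = 0.2

# Same binning rule as the question: integer part of x / dx.
bins = (m["x"] / dx).astype(int)

gid = 1
for name, group in m.groupby(bins):
    # group is a copy, so write to m directly via the group's row labels.
    m.loc[group.index, "gid"] = gid
    gid += 1

print(m)
```

Groups are visited in sorted key order, so the bins 0, 2 and 4 that occur here get the IDs 1, 2 and 3.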


Instead of using groupby, you could use pd.Categorical:

import numpy as np
import pandas as pd
np.random.seed(2015)
d = np.random.random((10, 2))
d[:, 1] = 0
m = pd.DataFrame(d, columns=("x", "gid"))
dx = 0.2
m['gid'] = pd.Categorical((m['x'] / dx).astype(int)).codes + 1

print(m)

yields

          x  gid
0  0.737595    3
1  0.884189    4
2  0.944676    4
3  0.063603    1
4  0.332454    2
5  0.003218    1
6  0.071058    1
7  0.289020    2
8  0.268896    2
9  0.258775    2
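To see where these numbers come from, here is a small sketch using the bin numbers of the first five rows above (3, 4, 4, 0, 1) as a plain list. pd.Categorical sorts the distinct values and gives each one an integer code starting at 0, so adding 1 produces consecutive IDs even when a bin (here bin 2) is empty.

```python
import pandas as pd

# Bin numbers of the first five rows in the output above.
bin_numbers = [3, 4, 4, 0, 1]

cat = pd.Categorical(bin_numbers)

# The distinct bins in sorted order, and each row's position in that list.
categories = list(cat.categories)
codes = cat.codes.tolist()

# Shift by one so that group IDs start at 1.
gids = (cat.codes + 1).tolist()

print(categories, codes, gids)
```

The resulting IDs 3, 4, 4, 1, 2 match the gid column of the first five rows.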
unutbu