
I've been reading the pandas docs and trying different lines of code from related questions posted here, and I can't seem to get away from the SettingWithCopyWarning. I'd prefer to learn to code it the "right" way rather than just ignore the warnings.

The following lines of code are inside a for loop and I don't want to generate this warning a lot of times because it could slow things down.

I'm trying to create a new column named 'E'+vs, where vs is a string from the list that the for loop iterates over.

But for each one of them, I still get the following warning, even with the last 3 lines:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Here are the troublesome lines I've tried so far:

#based on research, the first two seem to be the "wrong" way

df_out['E'+vs] = df_out[kvs].rolling(v).mean().copy()
df_out['E'+vs] = df_out[kvs].rolling(v).mean()

df_out.loc[:,'E'+vs] = df_out[kvs].rolling(v).mean().copy()
df_out.loc[:,'E'+vs] = df_out[kvs].rolling(v).mean()
df_out.loc[:,'E'+vs] = df_out.loc[:,kvs].rolling(v).mean()

The other one that gives a SettingWithCopyWarning is this:

df_out.dropna(inplace=True,axis=0)

This one also gave a warning (but I figured this one would)

df_out = df_out.dropna(inplace=True,axis=0)

How do I do both of these operations correctly?

EDIT: Here is the code that produced the original df_out

df_out= pd.concat([vol.Date[1:-1], ret.Return_Time[:-2], vol.Freq_Time[:-2],
               vol.Freq_Time[:-1].shift(-1), vol.Freq_Time[:].shift(-2)],
               axis=1).dropna().set_index('Date')
Brad Solomon
Monty

2 Answers


This is a confusing topic. It's not the code you've posted that is the problem. It's the code you haven't posted: the code that generated df_out.

Consider this example and note the last line that generates the warning.

df_other = pd.DataFrame(dict(A=[1], B=[2]))
df_out = df_other[:]

df_out['E'] = 5
//anaconda/envs/3.5/lib/python3.5/site-packages/ipykernel/__main__.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Now we'll try an equivalent thing that won't produce the warning

df_other = pd.DataFrame(dict(A=[1], B=[2]))
df_out = df_other.loc[:]

df_out['E'] = 5

Then

print(df_out)

   A  B  E
0  1  2  5

It boils down to pandas deciding to attach an is_copy attribute to a dataframe when it's constructed based on lots of criteria.

Notice that

df_other[:].is_copy

returns

<weakref at 0x103323458; to 'DataFrame' at 0x116a684e0>

whereas

df_other.loc[:].is_copy

returns None


So what types of construction trigger the copy? I still don't know all of them, and not everything I do know makes sense to me.

Like why does this not trigger it?

df_other[['A', 'B', 'E']].is_copy
piRSquared
  • Interesting answer. It's definitely not clear-cut what pandas is really doing. My use case is as follows: I have an original df that gets passed into a function containing those two operations. Say the initial len(df_out) = 1000; after the operations, len(df_out) = 998. Then I do other stuff. But on the next iteration I still want len(df_out) = 1000, i.e. the original dataframe. To solve this, before going into the function I did df_out = df[:] to make a copy of the original dataframe, which I now pass into the function. Everything works, but this stupid warning is annoying. – Monty Apr 02 '17 at 07:37
  • I checked that id(df) does not equal id(df_out). It's kind of annoying that a value passed by reference gets changed. In iteration 1 the len goes from 1000 to 998 because of the dropped rows, then in iteration 2 it goes from 998 to 996, and so forth, which is wrong: it should start from 1000 again. – Monty Apr 02 '17 at 07:42
  • Still haven't figured it out. I'll sleep on it. – Monty Apr 02 '17 at 11:33
  • it was as simple as doing df.copy() ....facepalm lol – Monty Apr 05 '17 at 06:42
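For later readers, here is a minimal sketch of the df.copy() fix from the last comment, applied to both operations in the question (adding the rolling-mean columns and dropping NaN rows). The function name add_rolling_means, the windows argument, and the sample data are made up for illustration; only the Freq_Time column name comes from the question. It assumes each window size also serves as the column suffix.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original DataFrame (values are invented).
df = pd.DataFrame({"Freq_Time": np.arange(10, dtype=float)})


def add_rolling_means(frame, windows):
    """Return a new DataFrame with rolling-mean columns; `frame` is left untouched."""
    out = frame.copy()  # explicit, independent copy of the input
    for v in windows:
        # Plain column assignment is fine here because `out` owns its data.
        out["E" + str(v)] = out["Freq_Time"].rolling(v).mean()
    # Use the return value of dropna() instead of inplace=True.
    return out.dropna(axis=0)


result = add_rolling_means(df, [2, 3])
print(len(df), len(result))
```

Because the function works on its own copy and returns a new frame, every iteration can start again from the full-length original.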

First off, I am not sure this is either efficient or the best approach. However, I had the same issue when adding a new column to an existing DataFrame, and I decided to use the reset_index method.

Here I first drop the rows with NaN in the EMPLOYEES column and assign the result to a new DataFrame, df1; then I add a COMPANY_SIZE column to df1 as follows:

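# Drop rows with a missing EMPLOYEES value; reset_index() returns a new DataFrame
df1 = all_merged_years.dropna(subset=['EMPLOYEES']).reset_index()

column = df1['EMPLOYEES']
Size = []

# Classify each employee count into a size bucket
for number in column:
    if number <= 999:
        Size.append('Small')
    elif 999 < number <= 9999:
        Size.append('Medium')
    elif 9999 < number:
        Size.append('Large')
    else:
        Size.append('UNKNOWN')

# Assign the full list at once as the new column
df1['COMPANY_SIZE'] = Size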

This way I did NOT get a warning as such. Hope that helps.
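As a possible alternative to the loop above, the same buckets can be sketched with pd.cut on an explicit copy. Note that this swaps in a different technique from the one this answer uses, and the sample values in all_merged_years are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for all_merged_years; the values are made up.
all_merged_years = pd.DataFrame(
    {"EMPLOYEES": [500, 999, 1000, np.nan, 9999, 10000]}
)

# dropna() returns a new frame; .copy() makes the independence explicit.
df1 = all_merged_years.dropna(subset=["EMPLOYEES"]).copy()

# Bins are right-inclusive by default: (-inf, 999], (999, 9999], (9999, inf]
df1["COMPANY_SIZE"] = pd.cut(
    df1["EMPLOYEES"],
    bins=[-np.inf, 999, 9999, np.inf],
    labels=["Small", "Medium", "Large"],
)

print(df1)
```

The resulting column is categorical rather than plain strings; call .astype(str) on it if string values are needed.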

Ozkan Serttas