
So I'm trying to create a new column that indicates whether or not the specified condition is True. I want the column to simply contain "1" or "0".

Here's my code:

data_sub = data_orig.loc[~pd.isnull(data_orig['Last_Audit_Date']), :]
data_sub.reset_index(inplace=True)
data_sub['PackageLengthFlag'] = (abs(data_sub.loc[:, 'AUDIT_Primary_Length'] - data_sub.loc[:, 'PKG_SUB_Length']) > threshold)

I'm assuming that True = 1 and False = 0 by default if I convert the result to integers, right? (I think I read that somewhere...)
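For reference, that assumption holds: casting a boolean Series with `astype(int)` maps True to 1 and False to 0. A minimal check, using a small made-up Series in place of the real comparison result:

```python
import pandas as pd

# Made-up boolean Series, standing in for the result of "> threshold"
flags = pd.Series([True, False, True])

# astype(int) maps True -> 1 and False -> 0
as_ints = flags.astype(int)
print(as_ints.tolist())  # [1, 0, 1]
```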

And here's the warning that I keep getting:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I've read through:

How to deal with SettingWithCopyWarning in Pandas?

Correct way to set value on a slice in pandas

Pandas SettingWithCopyWarning

But I don't think they cover what I'm looking for. Does anyone have any advice? I know this question may sound painfully stupid, but I'd still appreciate any help!

Edit: I've added the two lines of code where I create data_sub. Hope that helps!

cs95
alwaysaskingquestions
  • The source of this warning isn't here but probably in the code preceding it. You probably extracted a sub-slice of your dataframe without calling `.copy()`, which triggers this warning. – cs95 Oct 25 '17 at 00:44
  • You are getting that warning most likely because `data_sub` is a copy of a slice – juanpa.arrivillaga Oct 25 '17 at 00:46
  • It's hard to guess without a sample df, but this should work: data_sub['PackageLengthFlag'] = (np.abs(data_sub['AUDIT_Primary_Length'] - data_sub['PKG_SUB_Length']) > threshold).astype(int) – Vaishali Oct 25 '17 at 00:46
  • I've added the 2 lines of code where I created the data_sub. Hope that helps! – alwaysaskingquestions Oct 25 '17 at 00:51

1 Answer


The warning comes from the code above this: you extract a sub-slice of a dataframe without making an explicit copy, so pandas flags `data_sub` as a slice of another, larger dataframe and can't tell whether your assignment is meant to reach that original.
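Applied where `data_sub` is created (the two lines from the question's edit), the copy can be taken directly on the filtered frame. A minimal sketch with a hypothetical `data_orig`; `.notna()` is equivalent to `~pd.isnull(...)`, and `reset_index(drop=True)` is an assumption (the question's `reset_index(inplace=True)` keeps the old index as an extra `index` column):

```python
import pandas as pd

# Hypothetical stand-in for data_orig; only the column names come from the question
data_orig = pd.DataFrame({
    "Last_Audit_Date": ["2017-01-05", None, "2017-03-12"],
    "AUDIT_Primary_Length": [10.0, 12.0, 8.0],
    "PKG_SUB_Length": [9.0, 12.5, 11.0],
})

# Filter with a boolean mask and take an explicit copy, so data_sub is an
# independent DataFrame instead of a slice pandas keeps tracking
data_sub = data_orig.loc[data_orig["Last_Audit_Date"].notna()].copy()
data_sub = data_sub.reset_index(drop=True)  # drop=True is an assumption

# Adding a column now only touches data_sub, not data_orig
data_sub["PackageLengthFlag"] = 0

print(len(data_sub))                             # 2
print("PackageLengthFlag" in data_orig.columns)  # False
```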

Without much context on what you're trying to do, just make a copy beforehand:

data_sub = data_sub.copy()
data_sub['PackageLengthFlag'] = (
    data_sub['AUDIT_Primary_Length']
        .sub(data_sub['PKG_SUB_Length'])
        .abs()
        .gt(threshold)
        .astype(int)
)

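Call `.abs()` on the difference to take the absolute value of the whole result as part of one method chain. (The built-in `abs()` also works on a `pd.Series`, because pandas implements `__abs__`, so this is mainly a style choice.)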
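A quick sketch comparing the three absolute-value forms that appear on this page (`Series.abs()` from the answer, `np.abs` from Vaishali's comment, and the built-in `abs()` from the question) on a small made-up Series:

```python
import numpy as np
import pandas as pd

# Made-up differences between the two length columns
diff = pd.Series([1.0, -0.5, -3.0])

builtin = abs(diff)       # built-in abs() dispatches to Series.__abs__
method = diff.abs()       # method form used in the answer
numpy_ver = np.abs(diff)  # NumPy ufunc form; also returns a Series

print(builtin.tolist())  # [1.0, 0.5, 3.0]
print(builtin.equals(method) and builtin.equals(numpy_ver))  # True
```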

The final `astype(int)` call converts the boolean result to integers: True becomes 1 and False becomes 0.
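To see the whole chain end to end, here is a minimal run on a small frame; the data and the `threshold` value are made up for illustration:

```python
import pandas as pd

# Hypothetical data and threshold, for illustration only
data_sub = pd.DataFrame({
    "AUDIT_Primary_Length": [10.0, 12.0, 8.0],
    "PKG_SUB_Length": [9.0, 12.5, 11.0],
})
threshold = 1.0  # hypothetical tolerance

data_sub["PackageLengthFlag"] = (
    data_sub["AUDIT_Primary_Length"]
        .sub(data_sub["PKG_SUB_Length"])  # difference of the two lengths
        .abs()                            # absolute difference
        .gt(threshold)                    # True where it exceeds threshold
        .astype(int)                      # True -> 1, False -> 0
)

print(data_sub["PackageLengthFlag"].tolist())  # [0, 0, 1]
```

Note that `.gt` is a strict comparison, so a difference exactly equal to `threshold` (the first row here) is flagged 0; use `.ge` if that case should count.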


Here's an example of what you're doing:

df

  A_Key B_ID C_Key  D_NA
0   123   22   343    23
1   121   23  45.4    52

x = df.iloc[[0], :]
x

  A_Key B_ID C_Key  D_NA
0   123   22   343    23

x.iloc[:, 0] += 2
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/indexing.py:517: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

You see the warning. In most cases, though, the assignment still modifies `x` without affecting the original `df`. Now, copy first:

x = x.copy()
x.iloc[:, 0] += 2  # no warning

And the warning is gone. Interestingly, the same behaviour is not seen when performing similar operations on vertical subslices. I believe pandas smartly handles this by returning a full independent copy.

cs95