1

why does:

dfTest_4 = dfMasterExChk[
    (dfMasterExChk.quoll_status.isin(['In Service','In Service - Not Accepted'])) &
    (dfMasterExChk.emn_active.isin(['Yes'])) &        
    (dfMasterExChk.atoll_Tx_status.isin(['']))
    ]
dfTest_4['errMsg'] = 'Not in Atoll'

work, but give me a warning message that makes no sense? I'm not copying slices of anything! Also, if I used .loc I'd have to use a for loop or an apply or one of those horrible pesky magical lambdas.

Sorry, this appears to be nonsense in this context

This is the error I get:

"SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy"

Mick Hawkes
  • 1
    What is the code above `df[new_col] = "error msg"`? If you are filtering, you need `.copy()`, like `df = df[df['col'] == 0].copy()` – jezrael Jul 31 '18 at 06:15
  • sorry jezrael, i don't understand your response – Mick Hawkes Jul 31 '18 at 06:17
  • 1
    This error is very confusing; the problem is in the line above, so I am asking for the code above `df[new_col] = "error msg"` (2 or 3 rows would be nice) – jezrael Jul 31 '18 at 06:18
  • I want to add a standard error message to an existing dataframe. I later combine these data frames to make an error file. simple really – Mick Hawkes Jul 31 '18 at 06:18
  • is that better? – Mick Hawkes Jul 31 '18 at 06:19
  • @MickHawkes: please post the full code in your question, not as a comment. As stands this question will mislead other people because it doesn't show the line above. – smci Jul 31 '18 at 06:21
  • You have sliced another DataFrame and assigned the result to `df` (before doing `df[new_col] = something`). And this warning is telling you that your original DataFrame will not change. – ayhan Jul 31 '18 at 06:23
  • ok is that better? so is that slicing? – Mick Hawkes Jul 31 '18 at 06:27
  • 1
    If you modify values in `dfTest_4` later by adding a new column, you will find that the modifications do not propagate back to the original data (`dfMasterExChk`), and that is what pandas is warning about. – jezrael Jul 31 '18 at 06:28
  • 1
    Yes. Just add `.copy()` at the end of that line (`dfTest_4 = dfMasterExChk[conditions].copy()`) to indicate you know the resulting DataFrame is a copy and you will no longer see the warning. – ayhan Jul 31 '18 at 06:28
  • ok I just tried adding .copy() as you suggested but i still get the warning – Mick Hawkes Jul 31 '18 at 06:34
  • Oh, and whoever put the 'this has been answered before' message up there should go read it. It is specifically about ix and also has deprecated functions in it such as 'is_copy' – Mick Hawkes Jul 31 '18 at 06:40
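
For readers following the comments, here is a minimal sketch of the `.copy()` suggestion. The small `dfMasterExChk` built here is a hypothetical stand-in with invented values (the real data is not shown in the question), and `mask` is just a name introduced for readability.

```python
import pandas as pd

# Hypothetical stand-in for dfMasterExChk; these rows are invented for illustration.
dfMasterExChk = pd.DataFrame({
    "quoll_status": ["In Service", "In Service - Not Accepted", "Decommissioned"],
    "emn_active": ["Yes", "Yes", "No"],
    "atoll_Tx_status": ["", "Active", ""],
})

# Same three conditions as in the question, combined into one boolean mask.
mask = (
    dfMasterExChk.quoll_status.isin(["In Service", "In Service - Not Accepted"])
    & dfMasterExChk.emn_active.isin(["Yes"])
    & dfMasterExChk.atoll_Tx_status.isin([""])
)

# .copy() makes the filtered result an independent DataFrame, so adding a
# column to it is unambiguous and the original frame is left untouched.
dfTest_4 = dfMasterExChk[mask].copy()
dfTest_4["errMsg"] = "Not in Atoll"
```

With this layout the new column exists only on `dfTest_4`; `dfMasterExChk` keeps its original three columns.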

1 Answer

3

You can use `df.loc[:, 'new_col'] = "error_msg"`. This is because, with the former approach, pandas cannot know whether you are assigning the values to a copy or to a reference, so the warning is raised to prompt you to check the output. You can read more about it in this blog post: https://www.dataquest.io/blog/settingwithcopywarning/

Using `.loc`, we assign values to the DataFrame itself rather than to a copy, so the warning should not occur.

`.loc` is faster because it does not try to create a copy of the data. `.loc` is meant to modify your existing DataFrame in place, which is more memory efficient. `.loc` is predictable: it has one behavior.
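
As a rough illustration of the `.loc[row_indexer, col_indexer] = value` form that the warning text mentions, the sketch below sets the message on the original frame using a boolean mask, with no loop, `apply`, or lambda. The frame `df` and its column names are hypothetical, and note that this variant modifies the original frame rather than a filtered subset, which may or may not be what is wanted.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "status": ["In Service", "Retired"],
    "atoll": ["", ""],
})

# Boolean mask selecting the rows that should receive the message.
mask = df["status"].isin(["In Service"]) & df["atoll"].isin([""])

# Row indexer = mask, column indexer = new column name.
# Rows outside the mask get a missing value in the new column.
df.loc[mask, "errMsg"] = "Not in Atoll"

# The flagged rows can still be pulled out afterwards if a separate frame is needed.
errors = df[mask]
```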

Sreekiran A R
  • thanks, but why is that preferable/better? – Mick Hawkes Jul 31 '18 at 06:28
  • 1
    This is because by giving the former way the system cannot know if you are assigning the values to a copy or a reference. Thus the warning comes to check the output. You can read more about it in this blog. https://www.dataquest.io/blog/settingwithcopywarning/ – Sreekiran A R Jul 31 '18 at 06:37
  • @Sreekiran explaining why this is better in your actual answer will make it more valuable :) – Shadow Jul 31 '18 at 06:49
  • I just tried this method and got the same warning message too? – Mick Hawkes Jul 31 '18 at 06:53
  • .loc is faster, because it does not try to create a copy of the data. .loc is meant to modify your existing dataframe inplace, which is more memory efficient. .loc is predictable, it has one behavior @Shadow – Sreekiran A R Jul 31 '18 at 06:57
  • you can disable the warning by setting ` pd.options.mode.chained_assignment = None # default='warn' ` – Sreekiran A R Jul 31 '18 at 06:59
  • 1
    @Sreekiran the fact that other people have already asked for clarification as to why it's an improvement should be proof enough that this answer requires elaboration. Comments disappear from this site - and some of the ones you've added seem to contain useful information. Please consider adding the useful bits to your answer :) – Shadow Jul 31 '18 at 07:00
  • OK I just restarted Spyder and now it works. Go figure? Thanks for the help everyone. – Mick Hawkes Jul 31 '18 at 07:02
  • @Shadow I removed only some of my comments as they were out of context with answers that appeared while I was writing it. They only confused things. I would never remove comments by others – Mick Hawkes Jul 31 '18 at 07:04
  • @Shadow Thanks for your support. – Sreekiran A R Jul 31 '18 at 07:05
  • @MickHawkes Glad we could help! :D happy coding – Sreekiran A R Jul 31 '18 at 07:06
  • thanks for the answer @Sreekiran I understand now. And I agree with shadow, you should put these elaboration comments in the Answer – Mick Hawkes Jul 31 '18 at 07:07
  • I have added. See the edited answer – Sreekiran A R Jul 31 '18 at 07:10