This code produces the SettingWithCopyWarning as follows:

raw_corpus.loc[:,'constructed_recipe']=raw_corpus.loc[:,'trigger_channel_cat'] + " " + raw_corpus.loc[:,'trigger_channel_clean'] + " " + raw_corpus.loc[:,'trigger_name_clean'] + " " + raw_corpus.loc[:,'action_name_clean'] +" " + raw_corpus.loc[:,'action_channel_clean'] +" " + raw_corpus.loc[:,'action_channel_cat']

/Users/dlhoffman/anaconda3/envs/gensim-py35/lib/python3.5/site-packages/pandas/core/indexing.py:537: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

This code produces the same warning, reported from a different location:

raw_corpus['constructed_recipe']=raw_corpus['trigger_channel_cat'] + " " + raw_corpus['trigger_channel_clean'] + " " + raw_corpus['trigger_name_clean'] + " " + raw_corpus['action_name_clean'] +" " + raw_corpus['action_channel_clean'] +" " + raw_corpus['action_channel_cat']

/Users/dlhoffman/anaconda3/envs/gensim-py35/lib/python3.5/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""

Both pieces of code do what I want, but the warning is annoying, and my understanding is that it is not one to ignore. I've read the documentation and people's suggestions here, but I can't figure out what I am doing wrong.

  • FYI https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – BENY Mar 16 '18 at 20:45

1 Answer

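This is a well-documented warning. Take a look at "How to deal with SettingWithCopyWarning in Pandas?" (https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas).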

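For the fix, start by making raw_corpus an explicit copy, so that pandas no longer treats it as a possible slice of another DataFrame: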

raw_corpus = raw_corpus.copy(deep=True)
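
To see why the copy helps, here is a minimal sketch. The question does not show how raw_corpus was created, so the `full` DataFrame, its values, and the `keep` filter column below are hypothetical stand-ins; the assumption is that raw_corpus came from filtering a larger frame.

```python
import pandas as pd

# Hypothetical stand-in for the frame raw_corpus was derived from.
full = pd.DataFrame({
    "trigger_channel_cat": ["social", "email", "social"],
    "trigger_channel_clean": ["twitter", "gmail", "facebook"],
    "keep": [True, False, True],
})

# Filtering produces an object that pandas may still associate with `full`;
# assigning a new column to it is what can trigger SettingWithCopyWarning.
raw_corpus = full[full["keep"]]

# An explicit copy makes raw_corpus an independent DataFrame.
raw_corpus = raw_corpus.copy(deep=True)

# Adding a column now affects raw_corpus only, never `full`.
raw_corpus["constructed_recipe"] = (
    raw_corpus["trigger_channel_cat"] + " " + raw_corpus["trigger_channel_clean"]
)
```

If the copy is made at the point where raw_corpus is first created (for example by appending .copy() to the filtering expression), no later assignment needs special handling.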

Next, get a list of all the columns you want to aggregate:

cols = ['trigger_channel_cat', 'trigger_channel_clean', ...]

And call df.agg:

raw_corpus['constructed_recipe'] = raw_corpus[cols].agg(' '.join, axis=1)
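
As a rough end-to-end sketch of the agg approach: the column names and their order are taken from the question, while the sample values are invented for illustration. It assumes every cell is a string; ' '.join raises a TypeError on missing values such as NaN.

```python
import pandas as pd

# Invented sample rows; only the column names come from the question.
raw_corpus = pd.DataFrame({
    "trigger_channel_cat": ["social", "email"],
    "trigger_channel_clean": ["twitter", "gmail"],
    "trigger_name_clean": ["new tweet", "new mail"],
    "action_name_clean": ["add row", "send sms"],
    "action_channel_clean": ["sheets", "sms"],
    "action_channel_cat": ["productivity", "messaging"],
})

# Columns in the order they should appear in the combined string.
cols = [
    "trigger_channel_cat",
    "trigger_channel_clean",
    "trigger_name_clean",
    "action_name_clean",
    "action_channel_clean",
    "action_channel_cat",
]

# axis=1 hands each row to ' '.join, which concatenates that row's values.
raw_corpus["constructed_recipe"] = raw_corpus[cols].agg(" ".join, axis=1)
```

Changing the order of cols changes the order of the words in constructed_recipe, so the list is the one place to edit if the recipe format changes.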
  • Nice man :-) I did not expect agg could be used here – BENY Mar 16 '18 at 20:48
  • Excellent, and I learned something new (.copy, .agg) – profhoff Mar 16 '18 at 20:58
  • @Wen Yeah, if not for that, I would've closed as a dupe. But I believed the question deserved an answer. ;) – cs95 Mar 16 '18 at 21:02
  • Shallow vs deep copies! But what is the difference between df.copy.deepcopy and df.copy(deep=True)? – profhoff Mar 16 '18 at 21:04
  • @profhoff I think the difference lies in how columns of mutable objects are treated, such as a column of lists/dicts (df.copy returns references in that case regardless). – cs95 Mar 16 '18 at 21:05
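
The point about mutable object columns can be checked with a small sketch. The `tags` and `n` columns are hypothetical; the behaviour shown is what the pandas documentation for DataFrame.copy describes, namely that deep=True copies the data but does not recursively copy Python objects stored in it.

```python
import pandas as pd

# Hypothetical frame with one column of lists and one numeric column.
df = pd.DataFrame({"tags": [["a", "b"], ["c"]], "n": [1, 2]})
copied = df.copy(deep=True)

# Numeric data is really copied: changing the copy leaves df alone.
copied.loc[0, "n"] = 99
original_n = df.loc[0, "n"]

# The list objects are not duplicated: both frames hold the same list.
shares_list = copied["tags"].iloc[0] is df["tags"].iloc[0]
```

So for ordinary string or numeric columns, as in the question, df.copy(deep=True) is enough; the distinction only matters when cells hold mutable Python objects.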