I'm working with some data where I'm trying to convert an entire column to a different dtype (i.e. from object to datetime or from object to numeric) using methods rather than resetting individual values. Each line of code below raises the 'SettingWithCopyWarning':

#converting euro values column 'value' to numeric values:

df['value'] = pd.to_numeric(df.value, errors='coerce')

#converting object to datetime in order to extract year:

df['date'] = pd.to_datetime(df['date'])

df['date'] = df['date'].dt.year

If I leave any of the above lines in, it causes an error. If I take all of them out, the code doesn't raise any warnings.

After some research, I learned that the 'SettingWithCopyWarning' crops up when chained assignment is used and the object being assigned to may be a copy of the dataframe rather than the dataframe itself (ref: https://www.dataquest.io/blog/settingwithcopywarning/).

I also learned that the general form to avoid chained assignment is df.loc[<mask or index label values>, <optional column>] = <new scalar value or array-like> (ref: python pandas: how to avoid chained assignment).

I tried to wrangle something together like this just to test out the form:

df.loc[df['value']] = pd.to_numeric(df.value, errors='coerce')

but it returns an error like:

KeyError: "['$3.40m' '$3.90m' '$12.60m' '$13.80m' '$123.80m' '$171.20m'\n '$205.2m' '$214.40m' '$221.03m'] not in index"

which is making me think the general form I tried to stuff it in is confusing it for a dictionary and raising a KeyError.

After looking around, I'm not sure how to apply this to entire columns (like my code) that are using methods (dot functions) without using chained assignments.

Is there a way around this?

Edit:

Lines above the given code:

parent_df = pd.DataFrame.from_records(data, columns=['date', 'value'])

df = parent_df[parent_df.date.str.contains('.*201[4-9]')]
exlo
  • 3
    The root of the warning isn't those lines, but some line above them where you created a slice and it didn't return a new object. Did you do something like `df = df.drop_duplicates()` or `df = df[some_boolean_mask]`? The TL;DR is to just do `df = df.copy()` above those lines (or after the slice that doesn't return a new object): https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – ALollz Aug 28 '19 at 21:23
  • Just added this edit: "If I leave any of the above lines in, it causes an error. If I take all of them out, the code doesn't raise any warnings." I also don't think I have anything above it that should cause a problem, the first line is creating a df from the output and the second is creating another df as a subset from the first. Would these raise the warning? I added these lines in an edit at the end. – exlo Aug 28 '19 at 21:38
  • 2
    Yes, the line that is creating the subset is returning a view of the original DataFrame, not a new object. In later lines, when you then try to define new columns, you're adding new columns to this view, so you get the warning there. Tag on a `.copy()` to the end of the operation that creates the subset. This forces pandas to create a new object, and the warnings will disappear. – ALollz Aug 28 '19 at 21:43
  • I see, thank you for pointing that out. For future understanding, the line that was originally meant to make a copy doesn't have any `chaining`, and I've seen this structure in tutorials for pandas before. What is it about the syntax that is returning a view, not a copy? – exlo Aug 28 '19 at 22:52
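
For reference, the `.copy()` suggestion from the comments could be applied roughly as in the sketch below. The contents of `data` are not shown in the question, so the sample rows are made up for illustration. Stripping the `$` and `m` characters before `pd.to_numeric` is also an assumption, based on the values visible in the KeyError message. Whether the warning appears at all without `.copy()` depends on the pandas version in use.

```python
import pandas as pd

# Hypothetical sample rows; the real `data` is not shown in the question.
data = [
    ("2013-03-01", "$3.40m"),
    ("2014-05-01", "$12.60m"),
    ("2016-11-01", "$171.20m"),
    ("2019-01-01", "n/a"),
]
parent_df = pd.DataFrame.from_records(data, columns=["date", "value"])

# .copy() makes the subset an independent DataFrame, so later column
# assignments are unambiguous and do not touch parent_df.
df = parent_df[parent_df["date"].str.contains("201[4-9]")].copy()

# Assumption: remove "$" and "m" so the strings can be parsed as numbers;
# anything still unparseable becomes NaN because of errors="coerce".
cleaned = df["value"].str.replace(r"[$m]", "", regex=True)
df["value"] = pd.to_numeric(cleaned, errors="coerce")

# Convert to datetime and keep only the year.
df["date"] = pd.to_datetime(df["date"]).dt.year

print(df)
```

With the copy in place, whole-column assignments of the form `df['col'] = ...` are fine as written; the `df.loc[mask, 'col'] = ...` form is only needed when a subset of rows is being assigned.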

0 Answers