
I'm not even going to list the number of duplicate questions on SO about SettingWithCopyWarning. I'm simply attempting to convert a string field to a datetime field for an entire dataframe. Following all the advice I've found using df.loc[] I'm still getting that error.

Here's the code.

import pandas as pd
df = pd.read_csv("data/cat-1.csv")

# kill empty rows
trimmed_df = df.dropna(how="all")

# proper DateTime
trimmed_df.loc[:,'Transaction Date'] = pd.to_datetime(trimmed_df['Transaction Date'])

    

Followed by the ubiquitous error:

/usr/local/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py:1745: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(ilocs[0], value)

The crazy part is that I'm doing exactly what the error is telling me to do, and yet I'm STILL getting the error.

The other crazy part is that if you search SO for "pandas convert column to datetime" you get a bunch of answers like in this question, which all completely ignore .loc and just assign to the column, and presumably don't throw errors (there's never a mention of this error in the comments).

There are also plenty of solutions using lambda functions but before I go down that rathole I want to understand why my code throws the warning. It looks like I can avoid the warning by using .copy() when I create trimmed_df, or skip assigning the dropna() to a new variable, but I kind of like the self-documenting "functional" feel of creating these variable aliases to show how my dataframe is being transformed. I'm not sure if this is idiomatic Python or not.
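For reference, one way to keep that "functional" feel without mutating a derived frame is to chain `assign()` onto `dropna()`. This is only a sketch: the small DataFrame below is a made-up stand-in for `data/cat-1.csv`, and swapping the `.loc` assignment for `assign()` is an alternative technique, not what the original code does.

```python
import pandas as pd

# Hypothetical stand-in for data/cat-1.csv: one fully empty row, string dates.
df = pd.DataFrame(
    {
        "Transaction Date": ["2021-01-05", None, "2021-02-17"],
        "Amount": [12.5, None, 40.0],
    }
)

# Each step returns a new DataFrame, so nothing is ever assigned
# into a frame that pandas considers derived from df.
trimmed_df = df.dropna(how="all").assign(
    # The column name contains a space, so it is passed via a dict.
    **{"Transaction Date": lambda d: pd.to_datetime(d["Transaction Date"])}
)

print(trimmed_df.dtypes)
```

The lambda receives the already-trimmed frame, so the conversion only sees the rows that survived `dropna()`.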

As I rubberduck this, I suspect it's going to be a case of "suck it up" in the comments (go ahead, I won't mind the trolling) but I've got to believe that there's a nice way of disambiguating what my intention is here so that Python doesn't feel the need to save me from myself with this warning, without resorting to unnecessary transformation functions etc.

Am I missing something?

Tom Auger
  • Chaining `.copy()` to `.dropna()` is the way to go. It guarantees you have `trimmed_df` in its own memory so you can modify it at will. That said, your code should work fine, as `dropna()` is supposed to return a new copy, not a slice of `df`. – Quang Hoang Feb 18 '21 at 01:45
  • Leaving this as a comment, but I answered a very similar question on [pandas github](https://github.com/pandas-dev/pandas/issues/17476) a few years ago. The snippet you've posted here should not result in a `SettingWithCopy` warning because `dropna()` should copy the data internally and return you the copy (which is the default pandas behavior for 99% of their methods). – Cameron Riddell Feb 18 '21 at 01:46
  • Thanks to both commenters above, and yet as you can see, the `dropna()` is not, in fact, creating a copy since I most definitely AM getting the error (otherwise I wouldn't have wasted the Internet's time by creating the above question!) – Tom Auger Feb 20 '21 at 17:50

2 Answers


Not the most elegant workaround, but considering where your head's at right now, I'd suggest using a copy of the dataframe instead. Append .copy() to the end of the expression that creates the dataframe you get the warning on, and the warning should go away, because you're then working with the copy and not the actual dataframe. If you need to, you can set the original to the copy later. It worked for me, and I'm curious whether it will work for you.
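A minimal sketch of that workaround, using a small made-up frame in place of the CSV from the question (the data and column values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question.
df = pd.DataFrame(
    {
        "Transaction Date": ["2021-01-05", None, "2021-02-17"],
        "Amount": [12.5, None, 40.0],
    }
)

# .copy() gives trimmed_df its own data, independent of df.
trimmed_df = df.dropna(how="all").copy()

# Assigning into the copy leaves df untouched.
trimmed_df["Transaction Date"] = pd.to_datetime(trimmed_df["Transaction Date"])

print(trimmed_df.dtypes)
```

Plain column assignment is used here rather than `.loc[:, ...]`; as far as I know, recent pandas versions make `.loc` try to write into the existing column in place, which may leave the old dtype unchanged.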

Bibambop64
  • Thanks! And I appreciate the "where your head's at right now" comment because I recognize that I'm just going through "growing pains" coming from other languages. If you have time to augment your answer to include a "when your head is in a different place, consider the following approach:" I'd appreciate the extra insight, which will help me evolve my thinking. – Tom Auger Feb 20 '21 at 17:48
1

The warning is generated earlier, with:

trimmed_df = df.dropna(how="all")

This is where you create a second variable, trimmed_df, that pandas appears to still track as derived from df, even though dropna() normally returns a new object rather than another name for df. Whether two names point to the same object in Python's memory can be checked using id(), and I suggest you always perform this evaluation when in doubt:

id(trimmed_df) == id(df)
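A runnable sketch of that check on a small made-up frame (the data is an assumption for illustration). As far as I know, `dropna()` without `inplace=True` returns a new DataFrame object, so expect this to print `False`; an `id()` comparison only tells you whether two names refer to the same Python object, not whether pandas has flagged the result as derived from `df`.

```python
import pandas as pd

# Hypothetical data with one fully empty row.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", None, "z"]})
trimmed_df = df.dropna(how="all")

# Equivalent to `trimmed_df is df`.
same_object = id(trimmed_df) == id(df)
print(same_object)
```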

Of course, since this line is only an assignment, it raises no warning by itself; the warnings show up later, when you modify the derived frame (trimmed_df).

One solution that comes to mind would be to create the new df as a copy, which I believe you have already successfully tried:

trimmed_df = df.dropna(how="all").copy(deep=False)

Although not the same, this is related to: Python: When do two variables point at the same object in memory?

Celius Stingher