A SettingWithCopyWarning is raised when you assign to an object that pandas suspects is a copy derived from a slice of another DataFrame, so the assignment may never reach the underlying object you meant to modify.
To fix it, you need to understand the difference between a copy and a view. A copy is an entirely new object with its own data. A view shares its data with the original, so a change to one shows up in the other. When you index into a DataFrame, like:
data['query'].str.split().apply(len)
or
data['tokens']
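you get a new object (a Series in both of these cases), and pandas does not promise whether it is a view or a copy. The first expression always computes new values, so modifying its result won't change the original data object. For a plain column selection like the second one, you can check with the private _is_view attribute, which returns a boolean. It is an internal attribute, so treat it as a debugging aid rather than stable API: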
data['tokens']._is_view
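To make the copy behavior concrete, here is a minimal sketch. The sample queries are made up for illustration; only the column name query and the chained expression come from your code. It shows that the result of the chained expression is independent of data:

```python
import pandas as pd

# Hypothetical sample data; only the "query" column name comes from the question.
data = pd.DataFrame({"query": ["how to sort a list", "pandas copy warning"]})

# The chained expression computes new values, so the result cannot share data with `data`.
lengths = data["query"].str.split().apply(len)

# Changing the derived Series leaves the original DataFrame untouched.
lengths.iloc[0] = 0

print(lengths.tolist())
print(data["query"].tolist())
```

The edit lands in lengths only; data still holds the original strings.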
On the other hand, when you assign through the .at, .loc, or .iloc indexers in a single step, pandas writes into the original DataFrame. That means you select rows and columns by some criteria and set the values on the original object itself, rather than on an intermediate result that may or may not be a copy.
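As a rough illustration of a one-step assignment, the sketch below uses .loc with a boolean mask and a column label. The data and the threshold are invented for the example:

```python
import pandas as pd

# Hypothetical sample data for illustration.
data = pd.DataFrame({"query": ["a b c", "d e", "f"], "tokens": [3, 2, 1]})

# Rows are selected by the mask and the column by its label, all in one indexing call,
# so the assignment is applied to `data` itself.
data.loc[data["tokens"] > 1, "tokens"] = 0

print(data["tokens"].tolist())
```

Because the row and column selection happen in the same call, there is no intermediate object for pandas to be unsure about.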
Pandas raises the SettingWithCopyWarning when you are modifying something that may be a copy while you probably mean to modify the original. To avoid this, you can explicitly call .copy() on the data you are copying, or you can use .loc to specify the rows and columns you want to modify in data (or both).
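I can't see how data was built, so the following is only a guess at a common cause: data being a filtered slice of some larger frame. The name raw and the filter condition are hypothetical. Under that assumption, taking an explicit copy at the point where you filter might look like:

```python
import pandas as pd

# Hypothetical larger frame that `data` was derived from (an assumption, not from the question).
raw = pd.DataFrame({"query": ["how to sort a list", "pandas copy warning", ""]})

# Explicit copy: `data` is now an independent DataFrame, not a slice of `raw`.
data = raw[raw["query"].str.len() > 0].copy()

# Adding a column to the independent copy is unambiguous.
data["tokens"] = data["query"].str.split().apply(len)

print(data["tokens"].tolist())
print("tokens" in raw.columns)
```

If you actually want the new column on the larger frame instead, assign to that frame directly rather than to the filtered result.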
Since it depends a lot on what transformations you've done to your DataFrame already and how it is set up, it's hard to say exactly where and how you can fix it without seeing more of your code. There's unfortunately no one-size-fits-all answer. If you can post more of your code, I'm happy to help you debug it.
One thing you might try is creating an intermediate lengths object explicitly, in case that is the problem. So your code would look like:
lengths = data['query'].str.split().apply(len).copy()
data['tokens'] = lengths