I am using the county-by-county COVID-19 data maintained by The New York Times to play with while I learn Python/Pandas. The dataset stores a running total of cases, and I'm trying to create an additional column for "new cases". Here is how I load and filter the data:

import pandas as pd
import requests

df = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
df1 = df.loc[(df['state'] == 'New Hampshire') & (df['county'] == 'Rockingham')]

This is an example output from df1:

date        county      state          fips     cases  deaths
2021-10-16  Rockingham  New Hampshire  33015.0  30304  297.0

Now what I am trying to do is create the 'new cases' column for df1. Here is what I have tried:

df1['new cases'] = df1['cases'].diff()

This produces the following warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc (can't post the rest here because stackoverflow thinks it's code and won't submit otherwise)...

Ideally I'd like to keep a running total of all counties separate, but I like to try to break down large problems into smaller ones while I'm still learning, and I can't seem to figure out why something like this isn't working.

Acuity
  • `df1` is a slice of `df`. Do `df1 = df1.copy()` before you modify it. That said, you might be looking for `df['new_cases'] = df.groupby(['state','county'])['cases'].diff()`. – Quang Hoang Oct 21 '21 at 15:07
  • Does this answer your question? [How to deal with SettingWithCopyWarning in Pandas](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – 0x5453 Oct 21 '21 at 15:08
  • Thank you. This works perfectly. I was under the impression that my original command would've made df1 an independent dataframe, rather than a slice of the dataframe. It sounds like what you are saying is that the correct way to do this is to use the copy command before working with queried data. Thank you for the explanation. – Acuity Oct 21 '21 at 17:14
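
For anyone landing here later, the two suggestions in the comments can be sketched roughly as follows. This is only an illustration: the small DataFrame is made-up stand-in data (hypothetical counts, only the column names are taken from the question), so it runs without downloading the real file. It also assumes the rows are in date order within each county, which is why the frame is sorted before `diff()` is applied.

```python
import pandas as pd

# Hypothetical stand-in for the NYT data: invented cumulative counts,
# using the same column names as in the question.
df = pd.DataFrame({
    "date": ["2021-10-14", "2021-10-15", "2021-10-16"] * 2,
    "county": ["Rockingham"] * 3 + ["Strafford"] * 3,
    "state": ["New Hampshire"] * 6,
    "cases": [100, 110, 125, 40, 40, 47],
})

# Option 1: take an explicit copy of the filtered rows, then add the column.
# Because df1 is an independent DataFrame, the assignment is unambiguous.
mask = (df["state"] == "New Hampshire") & (df["county"] == "Rockingham")
df1 = df.loc[mask].copy()
df1["new cases"] = df1["cases"].diff()

# Option 2: compute the day-over-day difference within each (state, county)
# group on the full frame, so every county gets its own "new cases" series.
df = df.sort_values(["state", "county", "date"])
df["new_cases"] = df.groupby(["state", "county"])["cases"].diff()

print(df1)
print(df)
```

In both cases the first row of each county is `NaN`, since there is no earlier total to subtract from. If you would rather have a number there, something like `.fillna(0)` is one option, though whether 0 is the right value depends on what you want the first day to mean.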

0 Answers