I am using the New York Times' COVID-19 county-by-county dataset to practice while I learn Python/Pandas. The dataset stores cases as a running total, and I'm trying to create an additional column for "new cases". Here is what I have so far:
import pandas as pd
import requests
df = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
df1 = df.loc[(df['state'] == 'New Hampshire') & (df['county'] == 'Rockingham')]
This is an example output from df1:
date | county | state | fips | cases | deaths |
---|---|---|---|---|---|
2021-10-16 | Rockingham | New Hampshire | 33015.0 | 30304 | 297.0 |
Now I want to create the 'new cases' column for df1 by taking the difference between consecutive rows of the running total:
df1['new cases'] = df1['cases'].diff()
This produces the following warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc (I can't post the rest here because Stack Overflow thinks it's code and won't let me submit otherwise)...
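For reference, here is a minimal sketch of one common way to avoid this warning. It assumes the warning is raised because df1 is a filtered slice of df, so pandas cannot tell whether the assignment is meant to reach the original frame. The small in-memory frame and its case counts are made up for illustration and stand in for the real CSV; `mask` is a name introduced only for this example.

```python
import pandas as pd

# Hypothetical stand-in for the NYT data; the numbers are invented.
df = pd.DataFrame({
    "date": ["2021-10-14", "2021-10-15", "2021-10-16", "2021-10-16"],
    "county": ["Rockingham", "Rockingham", "Rockingham", "Strafford"],
    "state": ["New Hampshire"] * 4,
    "cases": [100, 110, 125, 50],
})

# Build the filter, then take an explicit copy so df1 is an
# independent DataFrame rather than a slice of df.
mask = (df["state"] == "New Hampshire") & (df["county"] == "Rockingham")
df1 = df.loc[mask].copy()

# diff() subtracts the previous row's running total from the current one;
# the first row has no previous row, so it becomes NaN.
df1["new cases"] = df1["cases"].diff()
```

With the explicit `.copy()`, the new column is added to df1 only and df is left unchanged.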
Ideally I'd like to keep the running totals for all counties separate, but I like to break large problems down into smaller ones while I'm still learning, and I can't figure out why something this simple isn't working.
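For the all-counties case, one possible sketch is to group by state and county and take the difference within each group. This assumes rows are sorted by date inside each county and that the state/county pair identifies a county uniquely. The sample frame below is invented for illustration and is not the real data.

```python
import pandas as pd

# Hypothetical two-county sample; the numbers are invented.
df = pd.DataFrame({
    "date": ["2021-10-15", "2021-10-15", "2021-10-16", "2021-10-16"],
    "county": ["Rockingham", "Strafford", "Rockingham", "Strafford"],
    "state": ["New Hampshire"] * 4,
    "cases": [110, 50, 125, 57],
})

# Sort so that each county's rows are in chronological order.
df = df.sort_values(["state", "county", "date"])

# diff() is applied within each (state, county) group, so one county's
# running total is never subtracted from another's.
df["new cases"] = df.groupby(["state", "county"])["cases"].diff()
```

The first date for each county has no earlier row to compare against, so its "new cases" value is NaN.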