I am experimenting with the NYT COVID-19 dataset, which gives the cumulative (total) number of COVID cases for each county per day.
I would like to compute the difference in cases between consecutive days, so that I get the number of new cases per day instead of the running total. Taking a rolling mean, or resampling every 2 days with a mean/sum/etc., works just fine; it's only the subtraction that is giving me a headache.
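Because each row is a running total, the new cases for a day are that day's total minus the previous day's total, which is what pandas' `Series.diff()` (and `DataFrame.diff()`) computes. A minimal sketch on a toy Series with made-up values, assuming the rows are already in date order:

```python
import pandas as pd

# Toy cumulative totals indexed by date (illustrative values only)
totals = pd.Series(
    [1, 3, 6, 10],
    index=pd.date_range("2020-03-13", periods=4, freq="D"),
)

# diff() subtracts the previous row from each row; the first row has no predecessor, so it is NaN
new_cases = totals.diff()

# Summing the daily increments back up (seeded with the first total) recovers the original series
rebuilt = new_cases.fillna(totals.iloc[0]).cumsum()
```

`cumsum()` reverses the operation, which makes a quick sanity check that no days were dropped along the way.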
Methods I tried, each followed by the error it raised:

df.resample('2d').diff()
Error: 'DatetimeIndexResampler' object has no attribute 'diff'

df.resample('1d').agg(np.subtract)
Error: ufunc() missing 1 of 2required positional argument(s)

df.rolling(2).diff()
Error: 'Rolling' object has no attribute 'diff'

df.rolling(2).agg(np.subtract)
Error: ufunc() missing 1 of 2required positional argument(s)
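The `.diff()` attempts fail because `diff` is a method of Series/DataFrame, not of the intermediate Resampler or Rolling objects; the `.agg(np.subtract)` attempts fail because `agg` hands each bin or window to the function as a single argument, while `np.subtract` needs two operands. A sketch of two workarounds on the Alabama values from the sample data, assuming a DatetimeIndex: aggregate first and then difference, or use `rolling(...).apply` with a "last minus first" function:

```python
import pandas as pd

# Alabama values from the sample data, indexed by date
s = pd.Series(
    [1.2, 2.0, 2.9, 3.6, 3.9],
    index=pd.date_range("2020-03-13", periods=5, freq="D"),
)

# Resampler has no .diff(): aggregate each 2-day bin first, then diff the aggregated Series.
# For cumulative totals, the last value in a bin is the running total at the end of that bin.
two_day_new = s.resample("2D").last().diff()

# Rolling has no .diff() either, but a 2-row window can compute "last minus first" via apply.
# raw=True passes each window as a NumPy array, so positional indexing w[-1] and w[0] works.
rolling_new = s.rolling(2).apply(lambda w: w[-1] - w[0], raw=True)
```

With a window of 2 the rolling version reproduces `s.diff()`. Taking `.last()` per bin is an assumption that fits cumulative counts; for already-daily counts you would aggregate with `.sum()` instead and skip the `diff()`.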
Sample data:
import datetime as dt
import pandas as pd

df = pd.DataFrame(data={'state': ['Alabama', 'Alabama', 'Alabama', 'Alabama', 'Alabama'],
                        'date': [dt.date(2020, 3, 13), dt.date(2020, 3, 14), dt.date(2020, 3, 15), dt.date(2020, 3, 16), dt.date(2020, 3, 17)],
                        'covid_cases': [1.2, 2.0, 2.9, 3.6, 3.9]})
Desired sample output:
import numpy as np

expected = pd.DataFrame(data={'state': ['Alabama', 'Alabama', 'Alabama', 'Alabama', 'Alabama'],
                              'date': [dt.date(2020, 3, 13), dt.date(2020, 3, 14), dt.date(2020, 3, 15), dt.date(2020, 3, 16), dt.date(2020, 3, 17)],
                              'new_covid_cases': [np.nan, 0.8, 0.9, 0.7, 0.3]})
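One way to get the desired output is `groupby('state')['covid_cases'].diff()`, which differences consecutive rows separately within each state, so the first day of each state is NaN rather than borrowing another state's total. A sketch on the sample data above, assuming one row per state per day; it sorts by state and date first so that "previous row" means "previous day":

```python
import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    'state': ['Alabama'] * 5,
    'date': [dt.date(2020, 3, d) for d in range(13, 18)],
    'covid_cases': [1.2, 2.0, 2.9, 3.6, 3.9],
})

# Sort so that consecutive rows within a state are consecutive days
df = df.sort_values(['state', 'date'])

# Subtract the previous row within each state only; each state's first row becomes NaN
df['new_covid_cases'] = df.groupby('state')['covid_cases'].diff()

result = df[['state', 'date', 'new_covid_cases']]
```

The values come out as 0.8, 0.9, 0.7 and 0.3 up to floating-point rounding, so compare them with a tolerance (e.g. `np.allclose`) rather than `==`.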
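To recreate the sample data from the original NYT dataset:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv', parse_dates=['date'])
# Average the county-level cumulative cases within each state for each date
df = df.groupby(['state', 'date'])[['cases']].mean().reset_index()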
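The same idea carries over to the raw county-level file: group by state and county (county names repeat across states) and diff the `cases` column. The frame below is a small hypothetical stand-in with made-up counts, assuming the CSV's `date`, `county`, `state` and `cases` columns, so it runs without downloading anything:

```python
import pandas as pd

# Hypothetical rows shaped like us-counties.csv (made-up counts); in practice this frame
# would come from pd.read_csv(..., parse_dates=['date'])
counties = pd.DataFrame({
    'date': pd.to_datetime(['2020-03-13', '2020-03-13', '2020-03-14', '2020-03-14', '2020-03-15']),
    'county': ['Jefferson', 'Madison', 'Jefferson', 'Madison', 'Jefferson'],
    'state': ['Alabama'] * 5,
    'cases': [1, 2, 3, 2, 6],
})

# Sort so that consecutive rows within each county are consecutive dates
counties = counties.sort_values(['state', 'county', 'date'])

# Difference within each (state, county) pair; each county's first row is NaN
counties['new_cases'] = counties.groupby(['state', 'county'])['cases'].diff()
```

The state-level frame built with `groupby(['state','date'])` is already ordered by state and then date (groupby sorts its keys by default), so calling `.groupby('state')['cases'].diff()` on that result works without re-sorting.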
Any help would be greatly appreciated! I would like to learn how to do this myself with pandas rather than find a ready-made "new cases" dataset, since I will be working with time series a lot in the near future.