
I'm looking at the Johns Hopkins dataset.

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

I have transformed it into the following long format (dummy data):

Country/Region | Province/State | Date       | Type      | Cases
US             | Arizona        | 2020/03/14 | Confirmed | 100
US             | Arizona        | 2020/03/15 | Confirmed | 120
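For reference, one way to get from the wide CSV to a long table like this is `DataFrame.melt`. The sketch below assumes the file's wide layout is `Province/State, Country/Region, Lat, Long` followed by one column per date in `m/d/yy` form. It uses a tiny inline stand-in (made-up values, placeholder coordinates) instead of downloading the real file, and the `Type` label is added by hand.

```python
import pandas as pd

# Tiny stand-in for time_series_covid19_confirmed_global.csv, ASSUMING its wide
# layout: one row per location, one column per date ("3/14/20", ...).
# For the real file, replace this with pd.read_csv(<path or raw URL>).
wide = pd.DataFrame({
    "Province/State": ["Arizona"],
    "Country/Region": ["US"],
    "Lat": [0.0],   # placeholder coordinates
    "Long": [0.0],
    "3/14/20": [100],
    "3/15/20": [120],
})

# Melt every date column into rows, keeping the location columns as identifiers.
long_df = wide.melt(
    id_vars=["Province/State", "Country/Region", "Lat", "Long"],
    var_name="Date",
    value_name="Cases",
)

# Parse the "m/d/yy" strings into real dates so they sort chronologically.
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")

# Label the series so confirmed/deaths/recovered frames can be concatenated later.
long_df["Type"] = "Confirmed"

long_df = long_df[["Country/Region", "Province/State", "Date", "Type", "Cases"]]
print(long_df)
```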

What I want is to calculate the difference in cases between date n and date n-1 for each combination of country/region, province/state and case type.

Something like:

df['Difference'] = df.groupby(['Country/Region','Province/State','Type']).apply(...)

But I am not sure how to write the apply function.
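A custom `apply` function may not be needed: pandas groupby objects have a built-in `diff()` that differences each row against the previous row within its group and returns a result aligned to the original index. A minimal sketch on made-up dummy data follows; the `transform` line shows the equivalent "custom function" form in case that style is preferred.

```python
import pandas as pd

# Dummy long-form data in the shape shown in the question (values are made up).
df = pd.DataFrame({
    "Country/Region": ["US", "US", "US", "US"],
    "Province/State": ["Arizona", "Arizona", "Arizona", "Arizona"],
    "Date": pd.to_datetime(["2020-03-14", "2020-03-15", "2020-03-14", "2020-03-15"]),
    "Type": ["Confirmed", "Confirmed", "Deaths", "Deaths"],
    "Cases": [100, 120, 1, 3],
})

keys = ["Country/Region", "Province/State", "Type"]

# diff() compares each row with the row before it, so sort by date within
# every group first.
df = df.sort_values(keys + ["Date"])

# Per-group difference, aligned to df's index so it can be assigned directly.
# The first date of each group has no predecessor and gets NaN.
df["Difference"] = df.groupby(keys)["Cases"].diff()

# Same result via a custom function: transform returns a result aligned to
# the original index, so it can also be assigned straight back.
df["Difference_alt"] = df.groupby(keys)["Cases"].transform(lambda s: s.diff())

print(df)
```

One caveat: in the real data `Province/State` is empty (NaN) for many countries, and `groupby` drops NaN keys by default, so those rows would get NaN. Passing `dropna=False` to `groupby` (available in pandas 1.1 and later) keeps them as their own groups.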

I want the output table to look like this.

Country/Region | Province/State | Date       | Type      | Cases | Difference
US             | Arizona        | 2020/03/14 | Confirmed | 100   | ...
US             | Arizona        | 2020/03/15 | Confirmed | 120   | 20
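The first row of each group (shown as `...` above) has no previous date, so a per-group `diff()` leaves it as NaN, which also turns the column into floats. If instead you want that first value to count everything up to the first date (an assumption: treating the unseen day before as 0 cases), one option is to fill the NaN with `Cases` itself. A sketch on the same dummy data:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["US", "US"],
    "Province/State": ["Arizona", "Arizona"],
    "Date": pd.to_datetime(["2020-03-14", "2020-03-15"]),
    "Type": ["Confirmed", "Confirmed"],
    "Cases": [100, 120],
})

keys = ["Country/Region", "Province/State", "Type"]
df = df.sort_values(keys + ["Date"])
diff = df.groupby(keys)["Cases"].diff()

# Option A: keep NaN on each group's first date (nothing to compare against).
df["Difference"] = diff

# Option B (assumption: count before the first date is 0): fill the NaN with
# that row's Cases, aligned by index, then restore an integer dtype.
df["Difference_filled"] = diff.fillna(df["Cases"]).astype("int64")

print(df)
```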

How can I achieve this?

kspr
