1

My dataframe looks like this:

df.head(6)
   City        Date      Total_Cases   Country      State
0  Autauga  2020-01-29     0             US        Alabama
1  Autauga  2020-01-30     3             US        Alabama
2  Autauga  2020-01-31     5             US        Alabama
3  Weston   2020-01-01     0             US        Wyoming
4  Weston   2020-01-02     2             US        Wyoming
5  Weston   2020-01-03     2             US        Wyoming

I would like to insert a new column called New_Cases which is today's Total_Cases minus Total_Cases from yesterday, by City/State.

   City        Date      Total_Cases  New_Cases  Country      State
0  Autauga  2020-01-29     0             0        US        Alabama
1  Autauga  2020-01-30     3             3        US        Alabama
2  Autauga  2020-01-31     5             2        US        Alabama
3  Weston   2020-01-01     0             0        US        Wyoming
4  Weston   2020-01-02     2             2        US        Wyoming
5  Weston   2020-01-03     2             0        US        Wyoming
user3813620
  • 352
  • 2
  • 8

1 Answers1

2

You could try using shift here:

df['New_Cases'] = df.Total_Cases - df.groupby(['City', 'State'])['Total_Cases'].shift(1).fillna(0)
user3813620
  • 352
  • 2
  • 8
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • cool........... – ksha Dec 13 '20 at 04:58
  • 1
    I think it should be more like ``df.Total_Cases - df.groupby(["City", "State"])["Total_Cases"].shift(1).fillna(0)`` assuming the date is sorted as well – sammywemmy Dec 13 '20 at 05:00
  • Tim Biegeleisen your code was missing the part before the minus sign which @sammywemmy added. Thank you guys so much for your help. Also another community member has referenced a similar solution in another post using diff() function – user3813620 Dec 13 '20 at 07:28