How can I add a calculated column to a Pandas dataframe?

Question

My dataframe looks like this:

df.head(6)
   City        Date      Total_Cases   Country      State
0  Autauga  2020-01-29     0             US        Alabama
1  Autauga  2020-01-30     3             US        Alabama
2  Autauga  2020-01-31     5             US        Alabama
3  Weston   2020-01-01     0             US        Wyoming
4  Weston   2020-01-02     2             US        Wyoming
5  Weston   2020-01-03     2             US        Wyoming

I would like to insert a new column called New_Cases which is today's Total_Cases minus Total_Cases from yesterday, by City/State.

   City        Date      Total_Cases  New_Cases  Country      State
0  Autauga  2020-01-29     0             0        US        Alabama
1  Autauga  2020-01-30     3             3        US        Alabama
2  Autauga  2020-01-31     5             2        US        Alabama
3  Weston   2020-01-01     0             0        US        Wyoming
4  Weston   2020-01-02     2             2        US        Wyoming
5  Weston   2020-01-03     2             0        US        Wyoming

score 2 · Accepted Answer · edited Dec 13 '20 at 07:37

2

You could try using shift here:

df['New_Cases'] = df.Total_Cases - df.groupby(['City', 'State'])['Total_Cases'].shift(1).fillna(0)

edited Dec 13 '20 at 07:37

user3813620

352
2
8

answered Dec 13 '20 at 04:56

Tim Biegeleisen

502,043
27
286
360

cool........... – ksha Dec 13 '20 at 04:58
1

I think it should be more like ``df.Total_Cases - df.groupby(["City", "State"])["Total_Cases"].shift(1).fillna(0)`` assuming the date is sorted as well – sammywemmy Dec 13 '20 at 05:00
Tim Biegeleisen your code was missing the part before the minus sign which @sammywemmy added. Thank you guys so much for your help. Also another community member has referenced a similar solution in another post using diff() function – user3813620 Dec 13 '20 at 07:28

How can I add a calculated column to a Pandas dataframe?

1 Answers1