2

My goal is to convert below - Current Data Output

enter image description here

By adding a new column called Cases which is non-cumulative for further analysis.

I have tried this solution but I am not sure what is my mistake -

Data["Cases"] = Data.groupby(level=0).diff().fillna(Data).reset_index()

Please advise

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
CHATSHD
  • 25
  • 3
  • @EdChum answered a similar question but i couldn't adapt - https://stackoverflow.com/questions/36452024/python-pandas-unroll-remove-cumulative-sum – CHATSHD Apr 06 '20 at 19:19
  • Also now trying Data['Cases'] = Data.groupby(level=0).diff().fillna('Cumulative_Cases').reset_index() – CHATSHD Apr 06 '20 at 19:20

1 Answers1

2

A simple approach will be to take the lag of values in cumulative column and then calculating the difference between cumulative and lag.

import pandas as pd

date = ['2020-03-01','2020-03-02','2020-03-03','2020-03-04','2020-03-05'] 
cases = [1, 2, 4, 7, 10]
data = list(zip(date, cases))

df = pd.DataFrame(data, columns=['Date','Cases'])
df['Lag'] = df.Cases.shift(1).fillna(0)
df['Daily Cases'] = df.Cases - df.Lag

print(df)
         Date  Cases  Lag  Daily Cases
0  2020-03-01      1  0.0          1.0
1  2020-03-02      2  1.0          1.0
2  2020-03-03      4  2.0          2.0
3  2020-03-04      7  4.0          3.0
4  2020-03-05     10  7.0          3.0

Just remember that order is important here. So sort your cumulative column in ascending order.

Don't forget to mark this as accepted answer if it helps you