I have the following dataframe:
Disease HeartRate State MonthStart MonthEnd
Covid 89 Texas 2020-02-28 2020-03-31
Covid 91 Texas 2020-03-31 2020-04-30
Covid 87 Texas 2020-07-31 2020-08-30
Cancer 90 Texas 2020-02-28 2020-03-31
Cancer 88 Florida 2020-03-31 2020-04-30
Covid 89 Florida 2020-02-28 2020-03-31
Covid 87 Florida 2020-03-31 2020-04-30
Flu 90 Florida 2020-02-28 2020-03-31
I have to subtract ‘previous row’ from the ‘current row’ in the ‘Heart’ column and create a new one.
However, there are some conditions:
- Row values will be subtracted only when the 'Disease' and 'State' columns have the same values.
- Row values will be subtracted only when the rows are in consecutive month. If there is a break in timeline, values won't be subtracted.
- If there is no previous row values to subtract, then put the 'HeartRate' value only.
Desired output:
Disease HeartRate State MonthStart MonthEnd HeartRateDiff
Covid 89 Texas 2020-02-28 2020-03-31 89
Covid 91 Texas 2020-03-31 2020-04-30 2
Covid 87 Texas 2020-07-31 2020-08-30 87
Cancer 90 Texas 2020-02-28 2020-03-31 90
Cancer 88 Florida 2020-03-31 2020-04-30 88
Covid 89 Florida 2020-02-28 2020-03-31 89
Covid 87 Florida 2020-03-31 2020-04-30 -2
Flu 90 Florida 2020-02-28 2020-03-31 90
I know how to subtract previous row from the current row using the following code:
df[‘DiffHeartRate’] = df.groupby(['Disease', 'State'])['HeartRate'].transform(pd.Series.diff)
However, I’m facing two problems:
- Keeping the same value if there is no previous row to subtract.
- Checking the continuity of timeline (next month or not).
Is there a smarter way of doing it? Any help would be appreciated. Thanks!