0

I was just curious how to find previous rows values if we know next rows values that is just cummulative sum of rows. For example here new deaths here just added to total deaths which is just cummlative sum of new-deaths. How to find now missing value in data set? we can find by just subtracting but is ther any programmatic way ?

date        total_cases   new_cases    total_deaths   new_deaths    population  lockdown_date
2020-29-04  1012583.0   24132.0         58355.0           2110.0    54225.446       2020-03-13
2020-04-30  1039909.0   27326.0         60966.0           2611.0    54225.446       2020-03-13
2020-05-01  1069826.0   29917.0         NaN                  NaN    54225.446       2020-03-13
2020-05-02  1103781.0   33955.0         65068.0           2062.0    54225.446       2020-03-13
2020-05-03  1133069.0   29288.0         66385.0           1317.0    54225.446       2020-03-13
lucky
  • 25
  • 3
  • Does this help you https://stackoverflow.com/questions/22081878/get-previous-rows-value-and-calculate-new-column-pandas-python – Joe Ferndz Sep 02 '20 at 03:48
  • Or you can also try https://stackoverflow.com/questions/34855859/is-there-a-way-in-pandas-to-use-previous-row-value-in-dataframe-apply-when-previ – Joe Ferndz Sep 02 '20 at 03:56

1 Answers1

1

You can use shift to realign subtracted column in combination with fillna to fill missing cumulated values, then diff to retrieve the new cases:

from io import StringIO

import pandas as pd

txt = """
date        total_cases   new_cases    total_deaths   new_deaths    population  lockdown_date
2020-29-04  1012583.0   24132.0         58355.0           2110.0    54225.446       2020-03-13
2020-04-30  1039909.0   27326.0         60966.0           2611.0    54225.446       2020-03-13
2020-05-01  1069826.0   29917.0         NaN                  NaN    54225.446       2020-03-13
2020-05-02  1103781.0   33955.0         65068.0           2062.0    54225.446       2020-03-13
2020-05-03  1133069.0   29288.0         66385.0           1317.0    54225.446       2020-03-13
"""

df = pd.read_csv(StringIO(txt), sep="\s+")

df_filled = df.assign(
    total_deaths=lambda f: f["total_deaths"].fillna(
        f["total_deaths"].sub(f["new_deaths"]).shift(-1)
    ),
    new_deaths=lambda f: f["new_deaths"].fillna(f["total_deaths"].diff()),
)
print(df_filled)
         date  total_cases  new_cases  total_deaths  new_deaths  population  \
0  2020-29-04    1012583.0    24132.0       58355.0      2110.0   54225.446   
1  2020-04-30    1039909.0    27326.0       60966.0      2611.0   54225.446   
2  2020-05-01    1069826.0    29917.0       63006.0      2040.0   54225.446   
3  2020-05-02    1103781.0    33955.0       65068.0      2062.0   54225.446   
4  2020-05-03    1133069.0    29288.0       66385.0      1317.0   54225.446   

  lockdown_date  
0    2020-03-13  
1    2020-03-13  
2    2020-03-13  
3    2020-03-13  
4    2020-03-13
Horace
  • 1,024
  • 7
  • 12