Is there a way in pandas to add the current row's value to the sum of the subsequent values (relative to the row's date)?

For each row, I'd like to take its YTD and add the sum of all the remaining Budget values for 2019. Suppose we are in the 4th month of 2019. For example, for row 0 I'd like 101 + the sum of the subsequent values whose Type is "Budget" and whose year is 2019. The same logic applies to row 1 (199 + the sum of the subsequent values), and so on.

My current table looks like this:

    Value    Type   Date     YTD    YEP (year in projection)
0     100  Budget 2019-01-01  101                       NaN
1      50  Budget 2019-02-01  199                       NaN
2      20  Budget 2019-03-01  275                       NaN
3     123  Budget 2019-04-01  332                       NaN
4      56  Budget 2019-05-01  332                       NaN
5      76  Budget 2019-06-01  332                       NaN
6      98  Budget 2019-07-01  332                       NaN
7     126  Budget 2019-08-01  332                       NaN
8      90  Budget 2019-09-01  332                       NaN
9      80  Budget 2019-10-01  332                       NaN
10     67  Budget 2019-11-01  332                       NaN
11     87  Budget 2019-12-01  332                       NaN
12    101  Actual 2019-01-01  101                       NaN
13     98  Actual 2019-02-01  199                       NaN
14     76  Actual 2019-03-01  275                       NaN
15     57  Actual 2019-04-01  332                       NaN

Desired table:

     Value    Type   Date     YTD    YEP (year in projection)
0     100  Budget 2019-01-01  101                       974
1      50  Budget 2019-02-01  199                       1022
2      20  Budget 2019-03-01  275                       1078
3     123  Budget 2019-04-01  332                       1012
4      56  Budget 2019-05-01  NaN                       NaN
5      76  Budget 2019-06-01  NaN                       NaN
6      98  Budget 2019-07-01  NaN                       NaN
7     126  Budget 2019-08-01  NaN                       NaN
8      90  Budget 2019-09-01  NaN                       NaN
9      80  Budget 2019-10-01  NaN                       NaN
10     67  Budget 2019-11-01  NaN                       NaN
11     87  Budget 2019-12-01  NaN                       NaN
12    101  Actual 2019-01-01  101                       974
13     98  Actual 2019-02-01  199                       1022
14     76  Actual 2019-03-01  275                       1078
15     57  Actual 2019-04-01  332                       1012

Here are Excel screenshots that illustrate the calculation I mean:

screencap1 screencap2

The Excel screenshot shows what I want to do, although it is not exactly the same thing: in Excel I select the range to sum visually, whereas in pandas I want to define it with conditions. I know how to apply conditions in Python, but the difficulty here is different, which is why I am asking for help. Is there a function that says "take the sum of this batch of numbers, but always starting from the row you are positioned on"? That is what relative references and dollar signs (absolute references) let us do in Excel.

Thank you !

Alex

alexnesov

1 Answer

We can use GroupBy.cumsum on the DataFrame after reversing its row order with [::-1], so that the running total accumulates from the last month backwards.

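import pandas as pd

# df is the table from the question
df['Date'] = pd.to_datetime(df['Date'])
df['YEP'] = ( df[::-1].loc[df['Type'].eq('Budget')]   # Budget rows, last month first
                      .groupby(df['Date'].dt.year)    # restart the running total each year
                      .Value
                      .cumsum()                       # own value + all later Budget values
                      .sub(df['Value'])               # drop own value, keep later months only
                      .add(df['YTD'])                 # add the year-to-date figure
                      .groupby(df['Date'])
                      .transform('first') )           # copy the Budget result to the Actual row of the same date
print(df)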

    Value    Type       Date  YTD     YEP
0     100  Budget 2019-01-01  101   974.0
1      50  Budget 2019-02-01  199  1022.0
2      20  Budget 2019-03-01  275  1078.0
3     123  Budget 2019-04-01  332  1012.0
4      56  Budget 2019-05-01  332   956.0
5      76  Budget 2019-06-01  332   880.0
6      98  Budget 2019-07-01  332   782.0
7     126  Budget 2019-08-01  332   656.0
8      90  Budget 2019-09-01  332   566.0
9      80  Budget 2019-10-01  332   486.0
10     67  Budget 2019-11-01  332   419.0
11     87  Budget 2019-12-01  332   332.0
12    101  Actual 2019-01-01  101   974.0
13     98  Actual 2019-02-01  199  1022.0
14     76  Actual 2019-03-01  275  1078.0
15     57  Actual 2019-04-01  332  1012.0
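
To make the chained expression easier to follow, here is a minimal self-contained sketch that rebuilds the question's table and computes the same figures in separate steps. It is not the exact code above: the intermediate names (`budget_rows`, `rev`, `remaining`) are made up for illustration, and the last step uses `Series.map` on the date instead of `groupby(...).transform('first')`, assuming each date appears only once among the Budget rows.

```python
import pandas as pd

# Sample data copied from the question's table.
budget = [100, 50, 20, 123, 56, 76, 98, 126, 90, 80, 67, 87]
actual = [101, 98, 76, 57]
df = pd.DataFrame({
    "Value": budget + actual,
    "Type": ["Budget"] * 12 + ["Actual"] * 4,
    "Date": list(pd.date_range("2019-01-01", periods=12, freq="MS"))
            + list(pd.date_range("2019-01-01", periods=4, freq="MS")),
    "YTD": [101, 199, 275] + [332] * 9 + [101, 199, 275, 332],
})

# Step 1: keep the Budget rows and reverse them (December first).
budget_rows = df[df["Type"].eq("Budget")].copy()
rev = budget_rows[::-1]

# Step 2: a cumulative sum on the reversed rows gives "own value + all later
# months" within each year; subtracting the own value leaves the later months.
# The assignment aligns on the index, so the reversed order does not matter.
budget_rows["remaining"] = (
    rev.groupby(rev["Date"].dt.year)["Value"].cumsum() - rev["Value"]
)

# Step 3: projection = year-to-date + remaining budget.
budget_rows["YEP"] = budget_rows["YTD"] + budget_rows["remaining"]

# Step 4: look the projection up by date so Actual rows receive it too.
df["YEP"] = df["Date"].map(budget_rows.set_index("Date")["YEP"])

print(df)  # row 0: 101 + 873 = 974, row 3: 332 + 680 = 1012
```

Splitting the steps this way also makes it easy to inspect `remaining` on its own if the projection looks wrong.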

Then we can use DataFrame.mask to replace YTD and YEP with NaN on rows where the YTD value repeats within the same Type and year:

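# Rows that repeat an earlier (Type, YTD, year) combination are set to NaN
df[['YTD','YEP']] = df[['YTD','YEP']].mask(df.assign(year = df['Date'].dt.year)
                                             .duplicated(['Type','YTD','year']))

# Shorter alternative when the data covers a single year:
#df[['YTD','YEP']] = df[['YTD','YEP']].mask(df.duplicated(['Type','YTD']))
print(df)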

    Value    Type       Date    YTD     YEP
0     100  Budget 2019-01-01  101.0   974.0
1      50  Budget 2019-02-01  199.0  1022.0
2      20  Budget 2019-03-01  275.0  1078.0
3     123  Budget 2019-04-01  332.0  1012.0
4      56  Budget 2019-05-01    NaN     NaN
5      76  Budget 2019-06-01    NaN     NaN
6      98  Budget 2019-07-01    NaN     NaN
7     126  Budget 2019-08-01    NaN     NaN
8      90  Budget 2019-09-01    NaN     NaN
9      80  Budget 2019-10-01    NaN     NaN
10     67  Budget 2019-11-01    NaN     NaN
11     87  Budget 2019-12-01    NaN     NaN
12    101  Actual 2019-01-01  101.0   974.0
13     98  Actual 2019-02-01  199.0  1022.0
14     76  Actual 2019-03-01  275.0  1078.0
15     57  Actual 2019-04-01  332.0  1012.0
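
As a small illustration of how `duplicated` and `mask` interact, the sketch below uses five Budget rows taken from the output above (March to July). The variable names `key`, `repeated` and `masked` are chosen here for illustration only.

```python
import pandas as pd

# Five Budget rows (March to July 2019) from the intermediate result.
df = pd.DataFrame({
    "Type": ["Budget"] * 5,
    "Date": pd.date_range("2019-03-01", periods=5, freq="MS"),
    "YTD": [275, 332, 332, 332, 332],
    "YEP": [1078.0, 1012.0, 956.0, 880.0, 782.0],
})

# duplicated() is False for the first occurrence of a key and True for
# every later row that repeats it.
key = df.assign(year=df["Date"].dt.year)
repeated = key.duplicated(["Type", "YTD", "year"])
print(repeated.tolist())  # [False, False, True, True, True]

# mask() replaces the rows where the condition is True with NaN.
masked = df[["YTD", "YEP"]].mask(repeated)
print(masked)
```

One consequence of keying on the YTD value: if two consecutive months legitimately had the same YTD (an actual value of 0), the second month would be masked as well, so this step assumes YTD only stops changing once the actual figures run out.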

Please note that both steps are carried out per year (the groupby and the duplicate check both use the year), although this DataFrame only contains 2019.
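
The per-year behaviour can be checked with a short sketch. The 2019 values are the last three Budget months from the question; the 2020 rows and the `Remaining` column name are hypothetical and only there to show that the running total restarts with each year.

```python
import pandas as pd

# Three months per year to keep the output short; 2020 values are made up.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-10-01", "2019-11-01", "2019-12-01",
                            "2020-10-01", "2020-11-01", "2020-12-01"]),
    "Value": [80, 67, 87, 10, 20, 30],
})

rev = df[::-1]
# Reverse cumulative sum within each year, minus the row's own value,
# leaves the sum of the later months of the same year.
df["Remaining"] = rev.groupby(rev["Date"].dt.year)["Value"].cumsum() - rev["Value"]

print(df["Remaining"].tolist())  # [154, 87, 0, 50, 30, 0]
```

December 2019 gets 0 rather than picking up the 2020 values, which is the effect of grouping by `dt.year`.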

ansev