Is there a way to get the original column back from a column that is the cumsum() of the original column?

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Original': [1, 0, 0, 1, 0, 5, 0, np.nan, np.nan, 4, 0, 0],
                   'CumSum': [1, 1, 1, 2, 2, 7, 7, np.nan, np.nan, 11, 11, 11]})

In the example df above, is it possible to get the Original column using only the CumSum column?

In my real dataset I have a column like the CumSum column, and I want to recover the original values from it. I looked for a built-in function that does this but haven't found one.

    Does this answer your question? [Python - Pandas - Unroll / Remove Cumulative Sum](https://stackoverflow.com/questions/36452024/python-pandas-unroll-remove-cumulative-sum) [this answer may or may not help with NaNs] – sj95126 Nov 08 '22 at 16:24
  • https://stackoverflow.com/questions/38666924/what-is-the-inverse-of-the-numpy-cumsum-function – Ian Thompson Nov 08 '22 at 16:25
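The linked questions come down to one identity: if s[i] = x[0] + ... + x[i], then x[0] = s[0] and x[i] = s[i] - s[i-1], so the first difference undoes a cumulative sum. A minimal sketch on a NaN-free copy of the example values (the names `x`, `s`, `x_np` and `x_pd` are only illustrative):

```python
import numpy as np
import pandas as pd

# Values used to build a running total (no missing values in this sketch)
x = np.array([1, 0, 0, 1, 0, 5, 0, 4, 0, 0])
s = np.cumsum(x)  # [1, 1, 1, 2, 2, 7, 7, 11, 11, 11]

# NumPy: the first difference inverts cumsum; prepend=0 keeps s[0] as the first value
x_np = np.diff(s, prepend=0)
print(x_np)  # [1 0 0 1 0 5 0 4 0 0]

# pandas: Series.diff() leaves NaN in the first row, so fill it with the first total
cs = pd.Series(s)
x_pd = cs.diff().fillna(cs)
print(x_pd.tolist())
```

With NaNs in the running total this simple version is not enough, which is what the answer below deals with.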

1 Answer

You can use:

df['Original2'] = (df['CumSum'].ffill().diff()
                   .mask(df['CumSum'].isna())
                   .fillna(df['CumSum'])
                  )

Output:

    Original  CumSum  Original2
0        1.0     1.0        1.0
1        0.0     1.0        0.0
2        0.0     1.0        0.0
3        1.0     2.0        1.0
4        0.0     2.0        0.0
5        5.0     7.0        5.0
6        0.0     7.0        0.0
7        NaN     NaN        NaN
8        NaN     NaN        NaN
9        4.0    11.0        4.0
10       0.0    11.0        0.0
11       0.0    11.0        0.0
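To see why each step is needed, here is a sketch that rebuilds the example and keeps the intermediate results; `filled`, `steps` and `naive` are illustrative names, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Original': [1, 0, 0, 1, 0, 5, 0, np.nan, np.nan, 4, 0, 0],
                   'CumSum': [1, 1, 1, 2, 2, 7, 7, np.nan, np.nan, 11, 11, 11]})

filled = df['CumSum'].ffill()             # carry the last total (7) across the NaN gap
steps = filled.diff()                     # s[i] - s[i-1]; row 0 has no predecessor -> NaN
steps = steps.mask(df['CumSum'].isna())   # put NaN back where CumSum itself is missing
df['Original2'] = steps.fillna(df['CumSum'])  # row 0: the first total is the first value

# Same thing without ffill(), for comparison
naive = df['CumSum'].diff().fillna(df['CumSum'])

print(df)
print(naive.tolist())
```

Without the `ffill()`, row 9 is computed as 11 - NaN = NaN and then filled with the running total 11 instead of 4, which is what `naive` shows. The final `fillna` only changes row 0, because the rows where CumSum is NaN are filled with NaN again.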
  • Thanks, it works. For a Python newbie like me this was helpful, and I appreciate it. Just FYI: if the cumsum column looks like [1, 1, 1, 4, 5, 5, 5, NaN, 10, 0, 0, 1, 3, NaN, 3, 4, 4......], the code above works except at the point where the cumsum restarts counting, where the original2 column gets a negative number. I fixed it with the code below, but wanted to know if there is a better way. `for row in df.itertuples(): if df.at[getattr(row, 'Index'),'original2']<0: df.at[getattr(row, 'Index'),'original2'] = df.at[getattr(row, 'Index'),'cumsum']` – nmp Nov 09 '22 at 07:07
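The row-by-row loop in the comment can likely be replaced by a single vectorized `Series.mask` call. This is a sketch under two assumptions: the original values are never negative, so a negative difference can only mean the running total restarted, and after a restart the first total equals the first new value (counting restarts from zero). It uses the question's column names `CumSum` and `Original2` rather than the comment's lowercase ones, and it drops the comment's trailing "......".

```python
import numpy as np
import pandas as pd

# Running total from the comment; it restarts after the value 10
df = pd.DataFrame({'CumSum': [1, 1, 1, 4, 5, 5, 5, np.nan, 10,
                              0, 0, 1, 3, np.nan, 3, 4, 4]})

# Same steps as the answer
steps = (df['CumSum'].ffill().diff()
         .mask(df['CumSum'].isna())
         .fillna(df['CumSum']))

# Vectorized replacement for the itertuples() loop: where the difference is
# negative (assumed to mean the total restarted), use the total itself.
# NaN < 0 is False, so rows with a missing total stay NaN.
df['Original2'] = steps.mask(steps < 0, df['CumSum'])

print(df)
```

Here the -10 produced at the restart becomes 0. Like the loop, this heuristic only catches restarts where the new total is lower than the previous one; a restart that jumps straight to a value at or above the old total would go unnoticed.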