
I have the following data frame:

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     2.0     NaN        NaN  
1   2016-02-15 00:01:00     1.0     NaN        NaN
2   2016-02-15 00:02:00     4.0     2.0        NaN  
3   2016-02-15 00:03:00     2.0     2.0        NaN  
4   2016-02-15 00:04:00     7.0     4.1        1.0
5   2016-02-15 00:05:00     2.0     5.0        2.0
6   2016-02-15 00:06:00     2.4     2.0        7.5
7   2016-02-15 00:07:00     2.0     6.3        1.2
8   2016-02-15 00:08:00     2.5     7.0        NaN
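
(For reproducibility, here is a minimal snippet that rebuilds this frame; the values are copied from the table above.)

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.date_range('2016-02-15', periods=9, freq='min'),
    'col_A': [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    'col_B': [np.nan, np.nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    'col_C': [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 7.5, 1.2, np.nan],
})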

I want to find the cumulative count of non-NaN records at each timestamp, for each column. That is, the expected output data frame should be:

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     1       NaN        NaN  
1   2016-02-15 00:01:00     2       NaN        NaN
2   2016-02-15 00:02:00     3       1          NaN  
3   2016-02-15 00:03:00     4       2          NaN  
4   2016-02-15 00:04:00     5       3          1
5   2016-02-15 00:05:00     6       4          2
6   2016-02-15 00:06:00     7       5          3
7   2016-02-15 00:07:00     8       6          4
8   2016-02-15 00:08:00     9       7          NaN

I am looping over the data frame and computing the cumulative count record by record. However, I am wondering: is there a more elegant way to do this? Thanks!

Edamame
  • Possible duplicate of [How can I replace all the NaN values with Zero's in a column of a pandas dataframe](https://stackoverflow.com/questions/13295735/how-can-i-replace-all-the-nan-values-with-zeros-in-a-column-of-a-pandas-datafra) – Richard Inglis May 20 '18 at 00:49
  • Not a duplicate, but what have you tried so far? – Pedro Lobito May 20 '18 at 00:52

2 Answers


Using notnull + cumsum. Note that np.nan is a float, so any column that still contains NaN ends up with float counts.

mask = df.iloc[:, 1:].notnull()        # True wherever a value is present
df.iloc[:, 1:] = mask.cumsum()[mask]   # running count of True, re-masked so the NaNs stay NaN
df
Out[33]: 
             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN
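
If you'd rather keep the counts as integers, one option (my addition, not part of the original answer) is pandas' nullable integer dtype, which holds integers alongside missing values instead of upcasting to float:

mask = df.drop(columns='timestamp').notnull()
counts = mask.cumsum()[mask].astype('Int64')            # 'Int64' (capital I) is the nullable integer dtype
out = pd.concat([df[['timestamp']], counts], axis=1)    # NaN cells become <NA>, the counts stay integers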
BENY

Inline with where

df.assign(**(lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN

In place with update

df.update((lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))
df

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN

Details

d = df.drop(columns='timestamp').notna()
d.cumsum().where(d)

   col_A  col_B  col_C
0      1    NaN    NaN
1      2    NaN    NaN
2      3    1.0    NaN
3      4    2.0    NaN
4      5    3.0    1.0
5      6    4.0    2.0
6      7    5.0    3.0
7      8    6.0    4.0
8      9    7.0    NaN
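
d.cumsum() counts True as 1 and False as 0, which gives the running count per column; .where(d) then restores NaN wherever the original value was missing. For reference (derived from the sample data, not shown in the original answer), the mask d itself is:

   col_A  col_B  col_C
0   True  False  False
1   True  False  False
2   True   True  False
3   True   True  False
4   True   True   True
5   True   True   True
6   True   True   True
7   True   True   True
8   True   True  False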
piRSquared