
I have the following data frame:

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     2.0     NaN        NaN  
1   2016-02-15 00:01:00     1.0     NaN        NaN
2   2016-02-15 00:02:00     4.0     2.0        NaN  
3   2016-02-15 00:03:00     2.0     2.0        NaN  
4   2016-02-15 00:04:00     7.0     4.1        1.0
5   2016-02-15 00:05:00     2.0     5.0        2.0
6   2016-02-15 00:06:00     2.4     2.0        7.5
7   2016-02-15 00:07:00     2.0     6.3        1.2
8   2016-02-15 00:08:00     2.5     7.0        NaN
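
(For reproducibility, here is a minimal snippet that rebuilds this frame; the values are copied from the table above.)

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.date_range('2016-02-15', periods=9, freq='min'),
    'col_A': [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    'col_B': [np.nan, np.nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    'col_C': [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 7.5, 1.2, np.nan],
})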

I want to find the cumulative count of non-NaN records at each timestamp, for each column. That is, the expected output data frame should be:

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     1       NaN        NaN  
1   2016-02-15 00:01:00     2       NaN        NaN
2   2016-02-15 00:02:00     3       1          NaN  
3   2016-02-15 00:03:00     4       2          NaN  
4   2016-02-15 00:04:00     5       3          1
5   2016-02-15 00:05:00     6       4          2
6   2016-02-15 00:06:00     7       5          3
7   2016-02-15 00:07:00     8       6          4
8   2016-02-15 00:08:00     9       7          NaN

I am looping over the data frame and computing the cumulative count record by record. However, I am wondering: is there a more elegant way to do this? Thanks!

Edamame
  • Possible duplicate of [How can I replace all the NaN values with Zero's in a column of a pandas dataframe](https://stackoverflow.com/questions/13295735/how-can-i-replace-all-the-nan-values-with-zeros-in-a-column-of-a-pandas-datafra) – Richard Inglis May 20 '18 at 00:49
  • Not a duplicate, but what have you tried so far? – Pedro Lobito May 20 '18 at 00:52

2 Answers


Using notnull + cumsum. Note that np.nan is a float, so any column that still contains NaN ends up with float counts.

mask = df.iloc[:, 1:].notnull()        # True wherever a value is present
df.iloc[:, 1:] = mask.cumsum()[mask]   # running count of True, re-masked so the NaNs stay NaN
df
Out[33]: 
             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN
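
If you'd rather keep the counts as integers, one option (my addition, not part of the original answer) is pandas' nullable integer dtype, which holds integers alongside missing values instead of upcasting to float:

mask = df.drop(columns='timestamp').notnull()
counts = mask.cumsum()[mask].astype('Int64')            # 'Int64' (capital I) is the nullable integer dtype
out = pd.concat([df[['timestamp']], counts], axis=1)    # NaN cells become <NA>, the counts stay integers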
BENY

Inline with where

df.assign(**(lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN

In place with update

df.update((lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))
df

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN

Details

d = df.drop(columns='timestamp').notna()
d.cumsum().where(d)

   col_A  col_B  col_C
0      1    NaN    NaN
1      2    NaN    NaN
2      3    1.0    NaN
3      4    2.0    NaN
4      5    3.0    1.0
5      6    4.0    2.0
6      7    5.0    3.0
7      8    6.0    4.0
8      9    7.0    NaN
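
d.cumsum() counts True as 1 and False as 0, which gives the running count per column; .where(d) then restores NaN wherever the original value was missing. For reference (derived from the sample data, not shown in the original answer), the mask d itself is:

   col_A  col_B  col_C
0   True  False  False
1   True  False  False
2   True   True  False
3   True   True  False
4   True   True   True
5   True   True   True
6   True   True   True
7   True   True   True
8   True   True  False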
piRSquared