4

I have a dataframe that I have made by concatenating several dataframes containing various bits of information. I am then creating new columns in the dataframe with the result of operations upon the columns of data. one of the columns ['MSD'] contains a NaN at the junction between each of the dataframes that contributed to the concatenation.

I wish to create a new column MSD_cum where each value is the cumulative sum of MSD but I want the cumulative sum to begin anew from each NaN`.

There is a question about this already but although the code in the answers given to it execute without errors, none of them actually do what I want ie.

MSD 
1
2
3
4
NaN
1
2
3

would create output

MSD_cum
1
3
6
10
NaN
1
3
6

Instead, they yield either the same as the input or
my own initial attempts which also failed and yielded the same output as the input, attempted to use an if statement to make the cumsum conditional upon the difference between the data indices.

if  np.diff(data_230['time']) is 1:
data_230['MSD_cum']  = cumsum(data_230['MSD'])
else:
    data_230['MSD_cum'] = data_230['MSD']

I would be happy with any approach at this point.

Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53

1 Answers1

4

Use Series.isna with Series.cumsum to create a grouping series s, this will be needed to group the column MSD so that we can calculate the cumsum which resets at every occurence of NaN in MSD, next use Series.groupby to group the column MSD on s along with cumsum:

s = df['MSD'].isna().cumsum()
df['MSD_cum'] = df['MSD'].groupby(s).cumsum()

Details:

# print(s)   
0    0
1    0
2    0
3    0
4    1
5    1
6    1
7    1
Name: MSD, dtype: int64

# print(df)
   MSD  MSD_cum
0  1.0      1.0
1  2.0      3.0
2  3.0      6.0
3  4.0     10.0
4  NaN      NaN
5  1.0      1.0
6  2.0      3.0
7  3.0      6.0
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53