I have some code which produces a dataframe output, of columns date, and x (a given value). df=
index date colx
2018-08-09 NaN NaN
2018-08-10 2018-08-10 00:00:00 -0.200460
2018-08-13 NaN NaN
2018-08-14 NaN NaN
2018-08-15 NaN NaN
2018-08-16 NaN NaN
2018-08-17 NaN NaN
2018-08-20 NaN NaN
2018-08-21 NaN NaN
2018-08-22 2018-08-22 00:00:00 -2.317475
2018-08-23 2018-08-23 00:00:00 -1.652724
2018-08-24 2018-08-24 00:00:00 -3.669870
2018-08-27 2018-08-27 00:00:00 -3.807074
2018-08-28 2018-08-28 00:00:00 -0.257006
2018-08-29 NaN NaN
2018-08-30 2018-08-30 00:00:00 -0.374825
2018-08-31 2018-08-31 00:00:00 -5.655345
2018-09-03 2018-09-03 00:00:00 -4.631105
2018-09-04 2018-09-04 00:00:00 -4.722768
2018-09-05 2018-09-05 00:00:00 -3.012673
2018-09-06 NaN NaN
Date column is the same as the index, for selected values, and np.nan for other sections.
What I am looking to achieve and unsure as to how, is to extract the first date and last date of a block of data (without the 00:00:00)
With the help of the following link I am able to tackle the issue of cumsum
but not the extraction of the data into the required output below:
python pandas conditional cumulative sum
b = df.colx
c = b.cumsum()
df['cumsumcolx']=c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(float)
This code gives me:
index date colx cumsumcolx
2018-08-09 0 0 0
2018-08-10 2018-08-10 00:00:00 -0.200460 -0.200460
2018-08-13 0 0 0
2018-08-14 0 0 0
2018-08-15 0 0 0
2018-08-16 0 0 0
2018-08-17 0 0 0
2018-08-20 0 0 0
2018-08-21 0 0 0
2018-08-22 2018-08-22 00:00:00 -2.317475 -2.317475
2018-08-23 2018-08-23 00:00:00 -1.652724 -3.970198
2018-08-24 2018-08-24 00:00:00 -3.669870 -7.640069
2018-08-27 2018-08-27 00:00:00 -3.807074 -11.447143
2018-08-28 2018-08-28 00:00:00 -0.257006 -11.704148
2018-08-29 0 0 0
2018-08-30 2018-08-30 00:00:00 -0.374825 -0.374825
2018-08-31 2018-08-31 00:00:00 -5.655345 -6.030169
2018-09-03 2018-09-03 00:00:00 -4.631105 -10.661275
2018-09-04 2018-09-04 00:00:00 -4.722768 -15.384043
2018-09-05 2018-09-05 00:00:00 -3.012673 -18.396715
2018-09-06 0 0 0
Thus, im asking for help with extraction so that i achieve an expected output of a table/dataframe:
entrydate exitdate cumsumcolx
2018-08-10 2018-08-10 -0.200460
2018-08-22 2018-08-28 -11.704148
2018-08-30 2018-09-05 -18.396715
my df is very long, thus just taken a snippet of it for illustration purposes.
Thank you