How can I build a list of start/stop segments from a DataFrame with pandas?

Question

I have a dataframe with data like this:

datetime    my_value
2020-01-01   0
2020-01-02   0
2020-01-03   0
2020-01-04   1
2020-01-05   1
2020-01-06   2
2020-01-07   2
2020-01-08   2
2020-01-09   2
2020-01-10   3
2020-01-11   3

I would like to find the datetime at which each my_value group starts and stops.

For example, the '0' chunk starts at index 0 and ends at index 2, and the '1' chunk starts at index 3 and ends at index 4 (but I need the datetime values, not the positions).

The datetime column is an index and each row is unique. The values in my_value are sequential ints that represent different areas, of various lengths, in my data. How can this be done with Pandas?

We need a better example dataframe than this. Couple questions: Is datetime your index or a column? Do the values stay the same as in your example? Do the datetimes stay the same? — Erfan, Aug 09 '20 at 20:40
Looks like you need `groupby` and `idxmin`, `idxmax`, but I can't work with your example dataframe so can't answer your question. Read more [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Erfan, Aug 09 '20 at 20:48
I've added proper dates to the dataframe, is that usable like this? — Thomas, Aug 09 '20 at 21:06

jsmart · Accepted Answer · 2020-08-09T20:55:44.123


The groupby() function will do this. First, create a data frame:

import pandas as pd
data = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
df = pd.DataFrame(
    {'datetime': pd.date_range(start='2020-01-01', periods=len(data), freq='D'),
     'my_value': data,})

print(df.head())
    datetime  my_value
0 2020-01-01         0
1 2020-01-02         0
2 2020-01-03         0
3 2020-01-04         1
4 2020-01-05         1

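Then call reset_index() to expose the row positions as an 'index' column, group by my_value, and take the min and max position of each group: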

t = df.reset_index().groupby('my_value')['index'].agg(['min', 'max'])
print(t)

          min  max
my_value          
0           0    2
1           3    4
2           5    8
3           9   10
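Since the question asks for datetimes rather than positions, the same idea can be applied to the datetime values directly. This is a minimal sketch, assuming datetime is the index (as the question states), that the dates are sorted, and that each my_value occupies one contiguous run of rows. The names `segments` and `segment_list` are only illustrative:

```python
import pandas as pd

data = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
# Assumption: datetime is the (unique, sorted) index, as in the question.
df = pd.DataFrame(
    {'my_value': data},
    index=pd.date_range(start='2020-01-01', periods=len(data), freq='D', name='datetime'),
)

# Move the DatetimeIndex into a regular column, then take the first and
# last timestamp of each my_value group.
segments = (
    df.reset_index()
      .groupby('my_value')['datetime']
      .agg(start='min', stop='max')
)
print(segments)

# Plain list of (my_value, start, stop) tuples.
segment_list = list(segments.itertuples(name=None))
```

If a value could reappear later in the data, min and max would merge its separate runs into one segment, so this only fits the sequential case described in the question.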

edited Aug 09 '20 at 20:55

answered Aug 09 '20 at 20:48


OP asked for the index, not min max value – Erfan Aug 09 '20 at 20:48
updated to address @Erfan's comment – jsmart Aug 09 '20 at 20:56
Use `df.reset_index().groupby('my_value')['datetime'].agg(['idxmin', 'idxmax'])` – Erfan Aug 09 '20 at 21:11
this works, thanks to you two! – Thomas Aug 09 '20 at 21:23
