I have a dataframe with data like this:

datetime    my_value
2020-01-01   0
2020-01-02   0
2020-01-03   0
2020-01-04   1
2020-01-05   1
2020-01-06   2
2020-01-07   2
2020-01-08   2
2020-01-09   2
2020-01-10   3
2020-01-11   3

and I would like to find the datetime at which each my_value group starts and stops.

For example, the '0' chunk starts at index 0 and ends at index 2, and the '1' chunk starts at index 3 and ends at index 4 (but I need the datetime values, not the positions).

The datetime column is an index and each row is unique. The values in my_value are sequential ints that represent different areas, of various lengths, in my data. How can this be done with Pandas?

Thomas
  • We need a better example dataframe than this. Couple questions: Is datetime your index or a column? Do the values stay the same as in your example? Do the datetimes stay the same? – Erfan Aug 09 '20 at 20:40
  • @Erfan, I clarified the question. – Thomas Aug 09 '20 at 20:42
  • Looks like you need `groupby` and `idxmin`, `idxmax`, but I can't work with your example dataframe so can't answer your question. Read more [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Erfan Aug 09 '20 at 20:48
  • I've added proper dates to the dataframe, is that usable like this? – Thomas Aug 09 '20 at 21:06

1 Answer

The groupby() function will do this. First, create a data frame:

import pandas as pd
data = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
df = pd.DataFrame(
    {'datetime': pd.date_range(start='2020-01-01', periods=len(data), freq='D'),
     'my_value': data,})

print(df.head())
    datetime  my_value
0 2020-01-01         0
1 2020-01-02         0
2 2020-01-03         0
3 2020-01-04         1
4 2020-01-05         1

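Then call reset_index() so the integer row positions become an 'index' column, group by my_value, and take the min and max position of each group: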

t = df.reset_index().groupby('my_value')['index'].agg(['min', 'max'])
print(t)

          min  max
my_value          
0           0    2
1           3    4
2           5    8
3           9   10
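
The table above gives row positions, while the question asks for the datetimes. A minimal sketch of the same groupby idea applied to the dates, assuming (as the question states) that datetime is the index and that each my_value forms one contiguous run. The names df_indexed and spans are only illustrative:

```python
import pandas as pd

data = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
dates = pd.date_range(start='2020-01-01', periods=len(data), freq='D')

# Assumption: datetime is the index, as described in the question.
df_indexed = pd.DataFrame({'my_value': data}, index=dates)
df_indexed.index.name = 'datetime'

# Convert the index to a Series so it can be aggregated per group,
# then take the first and last datetime of each my_value group.
spans = (
    df_indexed.index.to_series()
    .groupby(df_indexed['my_value'])
    .agg(['min', 'max'])
)
print(spans)
```

If datetime is a regular column instead, as in the data frame built above, df.groupby('my_value')['datetime'].agg(['min', 'max']) should give the same start and end dates.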
jsmart