Count consecutives values in time series

Question

It's my first time asking a question here, so I hope I will do it right !

I have a Pandas Dataframe:

df2.data
Out[66]: 
date
2016-01-02    0.0
2016-01-03    1.0
2016-01-04    1.0
2016-01-05    1.0
2016-01-06    0.0
2016-01-07    0.0
2016-01-08    1.0
2016-01-09    2.0
2016-01-10    1.0
2016-01-11    0.0
Name: data, dtype: float64

I would like the following results :

            data  trend  trend_type
date                               
2016-01-02   0.0      0           0
2016-01-03   1.0      0           0
2016-01-04   1.0      1           1
2016-01-05   1.0      2           1
2016-01-06   0.0      0           0
2016-01-07   0.0      1           0
2016-01-08   1.0      0           0
2016-01-09   2.0      0           0
2016-01-10   1.0      0           0
2016-01-11   0.0      0           0

My probleme is a bit related with How to use pandas to find consecutive same data in time series.

So far, I managed to get the trend, but it is not efficient enough (about 8 sec for a 750 rows dataframe)

df['grp'] = (df.close.diff(1) == 0).astype('int')
df['trend'] = 0
start_time = time.time()
for i in range(2, len(df['grp'])):
    if df.grp.iloc[i] == 1:
        df['trend'].iloc[i] = df['trend'].iloc[i-1] + 1

What is `trend_type`? Can you explain how you get these results? — cs95, Dec 13 '17 at 13:30
To get the trend, you can very easily do a groupby - `df.date.groupby(df.date.ne(df.date.shift()).cumsum()).cumcount()` — cs95, Dec 13 '17 at 13:32
If you can explain what `trend_type` is, that would be great. — cs95, Dec 13 '17 at 13:32
Hello. Sorry it wasnt clear enough. Trend type is whether the trend (serie of conscutive data) is a 0, 1 or 2. By default it is a 0. I wrote those results. — Roger, Dec 13 '17 at 13:33
It's still not obvious how you calculate it. Your data and your explanation do not add up.. — cs95, Dec 13 '17 at 13:35
Going by your explanation, I should have `0, 0, 1, 1, 0, 1, 0, 0, 0, 0`. — cs95, Dec 13 '17 at 13:36
For a given date, I would like to know if there are consecutive identical data, if there is, I would like to know for how many days and what was the data. — Roger, Dec 13 '17 at 13:38
Okay, in your data, all dates are distinct, so shouldn't everything be 0 in `trend_type`? — cs95, Dec 13 '17 at 13:40
I understand, so since on days 2016-01-06 and 2016-01-07 the data is '0', the trend type is 0. If the data were 1, the trend type would be 1, and if the data were 2, the trend type 2 — Roger, Dec 13 '17 at 13:41

cs95 · Accepted Answer · 2017-12-13T14:40:29.573

Step 1
To get trend, perform a groupby + cumcount -

df['trend'] = df.data.groupby(df.data.ne(df.data.shift()).cumsum()).cumcount()
df

            data  trend
2016-01-02   0.0      0
2016-01-03   1.0      0
2016-01-04   1.0      1
2016-01-05   1.0      2
2016-01-06   0.0      0
2016-01-07   0.0      1
2016-01-08   1.0      0
2016-01-09   2.0      0
2016-01-10   1.0      0
2016-01-11   0.0      0

Step 2
(IIUC), to get trend_type, compare consecutive rows and assign.

df['trend_type'] = 0
m = df.data.eq(df.data.shift())
df.loc[m, 'trend_type'] = df.loc[m, 'data']

df

            data  trend  trend_type
2016-01-02   0.0      0         0.0
2016-01-03   1.0      0         0.0
2016-01-04   1.0      1         1.0
2016-01-05   1.0      2         1.0
2016-01-06   0.0      0         0.0
2016-01-07   0.0      1         0.0
2016-01-08   1.0      0         0.0
2016-01-09   2.0      0         0.0
2016-01-10   1.0      0         0.0
2016-01-11   0.0      0         0.0

That is exactly what I was looking for ! Thank you very much for your help ! And the execution is really fast, that's perfect ! I wish you a very nice day, you just made mine ! — Roger, Dec 13 '17 at 13:51

score 0 · Answer 2 · answered Dec 13 '17 at 13:51

0

Edit, add column "trep_type"

df.loc[0, "trend"] = 0
df.loc[0, "trend_type"] = 0

for nrow in range(df.shape[0]-1):

    if df.loc[nrow+1, 1] == df.loc[nrow, 1]:
        df.loc[nrow+1, "trend"] = df.loc[nrow, "trend"]+1
        df.loc[nrow + 1, "trend_type"] = 1
    else:
        df.loc[nrow + 1, "trend"] = 0
        df.loc[nrow + 1, "trend_type"] = 0

answered Dec 13 '17 at 13:51

Manuel

698
4
8

1

thanks for the intereset on my question Manuel ! I think your solution is a bit like my first attempt, the problem is the time of execution – Roger Dec 13 '17 at 14:03

Count consecutives values in time series

2 Answers2

Linked