2

It's my first time asking a question here, so I hope I will do it right !

I have a Pandas Dataframe:

df2.data
Out[66]: 
date
2016-01-02    0.0
2016-01-03    1.0
2016-01-04    1.0
2016-01-05    1.0
2016-01-06    0.0
2016-01-07    0.0
2016-01-08    1.0
2016-01-09    2.0
2016-01-10    1.0
2016-01-11    0.0
Name: data, dtype: float64

I would like the following results :

            data  trend  trend_type
date                               
2016-01-02   0.0      0           0
2016-01-03   1.0      0           0
2016-01-04   1.0      1           1
2016-01-05   1.0      2           1
2016-01-06   0.0      0           0
2016-01-07   0.0      1           0
2016-01-08   1.0      0           0
2016-01-09   2.0      0           0
2016-01-10   1.0      0           0
2016-01-11   0.0      0           0

My probleme is a bit related with How to use pandas to find consecutive same data in time series.

So far, I managed to get the trend, but it is not efficient enough (about 8 sec for a 750 rows dataframe)

df['grp'] = (df.close.diff(1) == 0).astype('int')
df['trend'] = 0
start_time = time.time()
for i in range(2, len(df['grp'])):
    if df.grp.iloc[i] == 1:
        df['trend'].iloc[i] = df['trend'].iloc[i-1] + 1 
Garf365
  • 3,619
  • 5
  • 29
  • 41
Roger
  • 407
  • 1
  • 4
  • 16
  • What is `trend_type`? Can you explain how you get these results? – cs95 Dec 13 '17 at 13:30
  • To get the trend, you can very easily do a groupby - `df.date.groupby(df.date.ne(df.date.shift()).cumsum()).cumcount()` – cs95 Dec 13 '17 at 13:32
  • If you can explain what `trend_type` is, that would be great. – cs95 Dec 13 '17 at 13:32
  • Hello. Sorry it wasnt clear enough. Trend type is whether the trend (serie of conscutive data) is a 0, 1 or 2. By default it is a 0. I wrote those results. – Roger Dec 13 '17 at 13:33
  • It's still not obvious how you calculate it. Your data and your explanation do not add up.. – cs95 Dec 13 '17 at 13:35
  • Going by your explanation, I should have `0, 0, 1, 1, 0, 1, 0, 0, 0, 0`. – cs95 Dec 13 '17 at 13:36
  • For a given date, I would like to know if there are consecutive identical data, if there is, I would like to know for how many days and what was the data. – Roger Dec 13 '17 at 13:38
  • Okay, in your data, all dates are distinct, so shouldn't everything be 0 in `trend_type`? – cs95 Dec 13 '17 at 13:40
  • I understand, so since on days 2016-01-06 and 2016-01-07 the data is '0', the trend type is 0. If the data were 1, the trend type would be 1, and if the data were 2, the trend type 2 – Roger Dec 13 '17 at 13:41
  • Alright, see if my answer is what you're looking for. – cs95 Dec 13 '17 at 13:45

2 Answers2

1

Step 1
To get trend, perform a groupby + cumcount -

df['trend'] = df.data.groupby(df.data.ne(df.data.shift()).cumsum()).cumcount()
df

            data  trend
2016-01-02   0.0      0
2016-01-03   1.0      0
2016-01-04   1.0      1
2016-01-05   1.0      2
2016-01-06   0.0      0
2016-01-07   0.0      1
2016-01-08   1.0      0
2016-01-09   2.0      0
2016-01-10   1.0      0
2016-01-11   0.0      0

Step 2
(IIUC), to get trend_type, compare consecutive rows and assign.

df['trend_type'] = 0
m = df.data.eq(df.data.shift())
df.loc[m, 'trend_type'] = df.loc[m, 'data']

df

            data  trend  trend_type
2016-01-02   0.0      0         0.0
2016-01-03   1.0      0         0.0
2016-01-04   1.0      1         1.0
2016-01-05   1.0      2         1.0
2016-01-06   0.0      0         0.0
2016-01-07   0.0      1         0.0
2016-01-08   1.0      0         0.0
2016-01-09   2.0      0         0.0
2016-01-10   1.0      0         0.0
2016-01-11   0.0      0         0.0
cs95
  • 379,657
  • 97
  • 704
  • 746
  • That is exactly what I was looking for ! Thank you very much for your help ! And the execution is really fast, that's perfect ! I wish you a very nice day, you just made mine ! – Roger Dec 13 '17 at 13:51
0

Edit, add column "trep_type"

df.loc[0, "trend"] = 0
df.loc[0, "trend_type"] = 0

for nrow in range(df.shape[0]-1):

    if df.loc[nrow+1, 1] == df.loc[nrow, 1]:
        df.loc[nrow+1, "trend"] = df.loc[nrow, "trend"]+1
        df.loc[nrow + 1, "trend_type"] = 1
    else:
        df.loc[nrow + 1, "trend"] = 0
        df.loc[nrow + 1, "trend_type"] = 0
Manuel
  • 698
  • 4
  • 8
  • 1
    thanks for the intereset on my question Manuel ! I think your solution is a bit like my first attempt, the problem is the time of execution – Roger Dec 13 '17 at 14:03