I am interested to find out if:
a sequence of strings or numbers is contained as it is in a bigger or larger sequence of strings or numbers.
Following is a pandas dataframe with two columns: Id and Time. This dataframe is sorted beforehand by the values of Time.
import pandas as pd
label1 = ['422','422','422','428','428','453','453','453','453','453','421','421','421','421','421','422','422','422','424','424','424']
label2 = ['13:08','13:08','13:09','13:12','13:12','13:16','13:16','13:17','13:17','13:18','13:20','13:20','13:20','13:20','13:22', '13:23','13:24','13:24', '13:25','13:25','13:26']
d = {'Id':label1,'Time':label2}
df=pd.DataFrame(d)
df
The output df
looks like the following:
In [4]: df
Out[4]:
Id Time
0 422 13:08
1 422 13:08
2 422 13:09
3 428 13:12
4 428 13:12
5 453 13:16
6 453 13:16
7 453 13:17
8 453 13:17
9 453 13:18
10 421 13:20
11 421 13:20
12 421 13:20
13 421 13:20
14 421 13:22
15 422 13:23
16 422 13:24
17 422 13:24
18 424 13:25
19 424 13:25
20 424 13:26
What I have done so far. I have tried to generate a smaller dataframe as follows:
df["Id"] = df['Id'].astype('int')
bb1= df[df['Id'].diff(-1).ne(0)]
bb1
which has produced the following output:
In [59]: bb1
Out[59]:
Id Time
2 422 13:09
4 428 13:12
9 453 13:18
14 421 13:22
17 422 13:24
20 424 13:26
bb
dataframe contains the ids in the order they have appeared. They appear in the following order: S1=[422, 428,453,421,422,424].
Besides, there is a given sub-sequence which is S2=[421,422,424], which happens to be contained in S1.
I need to find if bb
dataframe contain a sub-sequence of Ids as reflected in S2=[421, 422, 424]. The answer to which should be returned with the following output if the sub-sequence gets identified:
index Id Time
10 421 13:20
14 421 13:22
15 422 13:23
17 422 13:24
18 424 13:25
20 424 13:26
The desired output contains the first and last time stamp and its associated index.
I would really appreciate your help.