I am trying to find a continuous path (with strictly increasing values) through a list of lists. I have spent hours trying various recursive and reversed approaches, without success.
The problem stems from interval-based pattern mining, where each event has exactly one start time and one end time. The problem looks like this:
time = [[11, 38, 40], [12, 39, 49], [41], [4, 23, 43], [47], [17, 35, 60]]
Events = [[start times of event A], [end times of event A], [start times of event B], [start times of event C], [end times of event B], [end times of event C]]
The goal is to find all possible routes through the data for which time[i] < time[i+1] holds, i.e. the value picked from list i is smaller than the value picked from list i+1.
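As a minimal sketch of the ordering condition alone, assuming a route is flattened to a plain list of numbers (the helper name `is_strictly_increasing` is hypothetical), the check can be written as:

```python
def is_strictly_increasing(route):
    """Return True if every value is smaller than the one after it."""
    # zip pairs each element with its successor: (r[0], r[1]), (r[1], r[2]), ...
    return all(a < b for a, b in zip(route, route[1:]))


route_1 = [11, 12, 41, 43, 47, 60]
route_3 = [11, 39, 41, 43, 47, 60]
print(is_strictly_increasing(route_1))  # True
print(is_strictly_increasing(route_3))  # True: ordering alone does not reject route_3
```

This check also accepts route_3, so ordering alone is not enough; which event each value belongs to has to be taken into account as well.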
Concrete examples of valid routes:
route_1 = [[11], [12], [41], [43], [47], [60]]
route_2 = [[38], [39], [41], [43], [47], [60]]
What is not valid:
route_3 = [[11], [39], [41], [43], [47], [60]]
because [11] describes the start time of event A[0], but [39] represents the end time of event A[1].
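One way to make the event affiliation explicit is to record, for each list in `time`, which event it belongs to and whether it holds start or end times. A route is then valid only if the start and end of the same event come from the same occurrence index (A[0] start with A[0] end, and so on). A minimal sketch, assuming that the j-th start time and the j-th end time of an event belong together and that values within one list are unique; `COLUMNS` and `is_valid_route` are hypothetical names:

```python
time = [[11, 38, 40], [12, 39, 49], [41], [4, 23, 43], [47], [17, 35, 60]]

# Assumed layout: (event, role) for each list of `time`, in route order.
COLUMNS = [("A", "start"), ("A", "end"), ("B", "start"),
           ("C", "start"), ("B", "end"), ("C", "end")]


def is_valid_route(route, time, columns):
    """route is a flat list with one value per list of `time`."""
    if len(route) != len(columns):
        return False
    # 1) Values must be strictly increasing.
    if not all(a < b for a, b in zip(route, route[1:])):
        return False
    # 2) Start and end of the same event must share one occurrence index.
    occurrence = {}
    for value, column_values, (event, _role) in zip(route, time, columns):
        if value not in column_values:
            return False
        j = column_values.index(value)
        # The first time an event is seen, remember its index; afterwards it must match.
        if occurrence.setdefault(event, j) != j:
            return False
    return True


print(is_valid_route([11, 12, 41, 43, 47, 60], time, COLUMNS))  # True
print(is_valid_route([38, 39, 41, 43, 47, 60], time, COLUMNS))  # True
print(is_valid_route([11, 39, 41, 43, 47, 60], time, COLUMNS))  # False
```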
Can you name a suitable approach to solving this problem?
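One straightforward approach (a brute-force enumeration, not necessarily the most efficient one) is to choose one occurrence index per event instead of one value per list. Because the start and end of an event are then always picked together, the event affiliation is respected by construction, and only the ordering remains to be checked. A sketch, assuming the `events` list below describes which event each list of `time` belongs to and that all lists of one event have the same length; `all_routes` is a hypothetical name:

```python
from itertools import product

time = [[11, 38, 40], [12, 39, 49], [41], [4, 23, 43], [47], [17, 35, 60]]
# Assumed layout: the event each list of `time` belongs to, in route order.
events = ["A", "A", "B", "C", "B", "C"]


def all_routes(time, events):
    """Enumerate one occurrence per event and keep strictly increasing routes."""
    names = list(dict.fromkeys(events))  # unique event names in order: A, B, C
    # Number of occurrences per event = length of that event's first list.
    counts = {e: len(time[events.index(e)]) for e in names}
    routes = []
    for choice in product(*(range(counts[e]) for e in names)):
        occ = dict(zip(names, choice))  # e.g. {"A": 0, "B": 0, "C": 2}
        route = [time[col][occ[e]] for col, e in enumerate(events)]
        if all(a < b for a, b in zip(route, route[1:])):
            routes.append(route)
    return routes


for r in all_routes(time, events):
    print(r)
# [11, 12, 41, 43, 47, 60]
# [38, 39, 41, 43, 47, 60]
```

The number of candidates is the product of the occurrence counts (3 × 1 × 3 = 9 here), so this is fine for small inputs but grows quickly with more events.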
For my last (non-recursive) approach, I used a dictionary as the data representation. This approach produced the result "closest" to the expected one:
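```python
import numpy as np

time_reversed = {'end_C': [17, 35, 60], 'end_B': [47], 'start_C': [4, 23, 43], 'start_B': [41], 'end_A': [12, 39, 49], 'start_A': [11, 38, 40]}


def consecutive(time):
    # Shallow copy: lists are replaced, not mutated, so the input stays intact.
    time_pruned = time.copy()
    keys = list(time.keys())
    for i in range(len(keys) - 1):
        # Compare each list with the list that precedes it in the dict.
        lower = time_pruned[keys[i + 1]]
        max_lower = np.nanmax(lower)
        max_upper = np.nanmax(time_pruned[keys[i]])
        # Drop the largest value of `lower` until it no longer exceeds
        # the largest value of the preceding list.
        while max_lower > max_upper:
            lower = np.delete(lower, np.nanargmax(lower))
            max_lower = np.nanmax(lower)
        time_pruned[keys[i + 1]] = lower
    return time_pruned
```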
However, this approach is not valid, since it does not take into account which event each time belongs to, and at the moment I do not know how to account for everything efficiently.
The function above yields:
```
>>> consecutive(time_reversed)
{'end_C': [17, 35, 60],
 'end_B': [47],
 'start_C': [4, 23, 43],
 'start_B': [41],
 'end_A': array([12, 39]),
 'start_A': array([11, 38])}
```
Update 1: I've tried to describe the approach in more detail. I also tried to implement it, but failed when deleting elements from the list while iterating over it.
The starting position for an element-wise comparison is i = 0, j = 0, with
l = data_pruned[keys[i]][j] = 11
r = data_pruned[keys[i+1]][j] = 12
where check() is defined as l < r.
Iteration 1:
11 -> 12 -> 41 -> 4 (check() = False): remove 4 from start_C and 17 from end_C, since each event has exactly one start and one end time
data_pruned = { 'start_A': [11, 38, 40],
'end_A': [12, 39, 49],
'start_B': [41],
'start_C': [23, 43],
'end_B': [47],
'end_C': [35, 60]}
Iteration 2:
11 -> 12 -> 41 -> 23 (check() = False): remove 23 from start_C and 35 from end_C, since each event has exactly one start and one end time
data_pruned = { 'start_A': [11, 38, 40],
'end_A': [12, 39, 49],
'start_B': [41],
'start_C': [43],
'end_B': [47],
'end_C': [60]}
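Deleting from a list while iterating over it is what tends to break (see Update 1). A common alternative is to store each occurrence as a (start, end) pair, so that removing an occurrence drops both of its times at once, and to build a new filtered list instead of deleting in place. A sketch of Iterations 1 and 2 under that representation; the pairing and the use of B's only start time (41) as the threshold are assumptions taken from the data above:

```python
# Each occurrence of C as a (start, end) pair, taken from start_C and end_C.
occurrences_C = [(4, 17), (23, 35), (43, 60)]
start_B = 41  # B has a single occurrence, so this threshold applies to every route

# start_C must come after start_B: keep only occurrences with a larger start.
# The comprehension builds a new list, so nothing is deleted while iterating.
pruned_C = [(s, e) for s, e in occurrences_C if s > start_B]
print(pruned_C)  # [(43, 60)]

# Split back into separate start/end lists if needed.
start_C = [s for s, _ in pruned_C]
end_C = [e for _, e in pruned_C]
print(start_C, end_C)  # [43] [60]
```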
Iteration 3:
11 -> 12 -> 41 -> 43 -> 47 -> 60 --> possible route, since the values are strictly increasing
-> Are there other strictly increasing combinations/routes/paths in data_pruned?
Iteration 4:
38 -> 39 -> 41 -> 43 -> 47 -> 60 --> possible route, since the values are strictly increasing
-> Are there other strictly increasing combinations/routes/paths in data_pruned?
Iteration 5:
40 -> 49 -> 41 (check() = False): remove 40 from start_A and 49 from end_A, since each event has exactly one start and one end time
data_pruned = { 'start_A': [11, 38],
'end_A': [12, 39],
'start_B': [41],
'start_C': [43],
'end_B': [47],
'end_C': [60]}
Are there other strictly increasing combinations/routes/paths in data_pruned?
Done, since all possibilities have been checked.
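The walk-through above is essentially a depth-first search with backtracking: extend the route one list at a time, prune a branch as soon as check() fails, and once an event's start has been picked, only allow the matching occurrence for its end. A minimal sketch of that idea, assuming the same `events` layout as in the question (which list belongs to which event); `find_routes` and `extend` are hypothetical names:

```python
time = [[11, 38, 40], [12, 39, 49], [41], [4, 23, 43], [47], [17, 35, 60]]
events = ["A", "A", "B", "C", "B", "C"]  # assumed event of each list


def find_routes(time, events):
    """Depth-first search with pruning over the lists of `time`."""
    routes = []

    def extend(col, route, occ):
        if col == len(time):  # every list used: a complete route
            routes.append(list(route))
            return
        event = events[col]
        # If this event was already picked, only the same occurrence is allowed.
        candidates = [occ[event]] if event in occ else range(len(time[col]))
        for j in candidates:
            value = time[col][j]
            if route and value <= route[-1]:  # check() failed: prune this branch
                continue
            new_event = event not in occ
            route.append(value)
            if new_event:
                occ[event] = j
            extend(col + 1, route, occ)
            # Undo the choice before trying the next candidate (backtracking).
            route.pop()
            if new_event:
                del occ[event]

    extend(0, [], {})
    return routes


print(find_routes(time, events))
# [[11, 12, 41, 43, 47, 60], [38, 39, 41, 43, 47, 60]]
```

Compared with enumerating every combination, this stops exploring a prefix as soon as it is not increasing, and nothing is deleted from the input lists, which sidesteps the delete-while-iterating problem.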
Update 2: I can export the data from the database into different data representations, such as a pandas DataFrame:
```python
import pandas as pd
from io import StringIO

data = """\
event,occur,start,end
A,1,11,12
A,2,38,39
A,3,40,49
B,1,41,47
C,1,4,17
C,2,23,35
C,3,43,60
"""

# Read the data into a DataFrame
df = pd.read_csv(StringIO(data))
```
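With this table, each row already pairs an occurrence's start and end, so one possible approach is a cross join that builds every combination of one occurrence per event, followed by a filter on the required order. A sketch, assuming the route order start_A < end_A < start_B < start_C < end_B < end_C from the question; `merge(how="cross")` requires pandas 1.2 or later, and the `A_start`-style column names are introduced here for illustration:

```python
import pandas as pd
from io import StringIO

data = """\
event,occur,start,end
A,1,11,12
A,2,38,39
A,3,40,49
B,1,41,47
C,1,4,17
C,2,23,35
C,3,43,60
"""
df = pd.read_csv(StringIO(data))

# One frame per event, with prefixed columns so they stay distinct after joining.
parts = [group[["occur", "start", "end"]].add_prefix(f"{name}_").reset_index(drop=True)
         for name, group in df.groupby("event")]

# Cross join: every combination of one occurrence per event.
combos = parts[0]
for part in parts[1:]:
    combos = combos.merge(part, how="cross")

# Assumed route order; keep only rows where each value is smaller than the next.
order = ["A_start", "A_end", "B_start", "C_start", "B_end", "C_end"]
mask = pd.Series(True, index=combos.index)
for left, right in zip(order, order[1:]):
    mask &= combos[left] < combos[right]

result = combos.loc[mask, order].reset_index(drop=True)
print(result)
```

Like the itertools version, the cross join materializes every combination, so memory grows with the product of the occurrence counts.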