Implementing a dynamic multiple timeline queue

Question

Introduction

I would like to implement a dynamic multiple timeline queue. The context here is scheduling in general.

What is a timeline queue?

This is still simple: It is a timeline of tasks, where each event has its start and end time. Tasks are grouped as jobs. This group of tasks need to preserve its order, but can be moved around in time as a whole. For example it could be represented as:

 --t1--   ---t2.1-----------t2.2-------
 '    '   '        '                  '
20    30  40       70                120

I would implement this as a heap queue with some additional constraints. The Python sched module has some basic approaches in this direction.

Definition multiple timeline queue

One queue stands for a resource and a resource is needed by a task. Graphical example:

R1  --t1.1----- --t2.2-----      -----t1.3--    
            /  \                /
R2  --t2.1--     ------t1.2-----

Explaining "dynamic"

It becomes interesting when a task can use one of multiple resources. An additional constraint is that consecutive tasks, which can run on the same resource, must use the same resource.

Example: If (from above) task t1.3 can run on R1 or R2, the queue should look like:

R1  --t1.1----- --t2.2-----      
            /  \                
R2  --t2.1--     ------t1.2----------t1.3--

Functionality (in priority order)

FirstFreeSlot(duration, start): Find the first free time slot beginning from start where there is free time for duration (see detailed explanation at the end).
Enqueue a job as earliest as possible on the multiple resources by regarding the constraints (mainly: correct order of tasks, consecutive tasks on same resource) and using FirstFreeSlot.
Put a job at a specific time and move the tail backwards
Delete a job
Recalculate: After delete, test if some tasks can be executed earlier.

Key Question

The point is: How can I represent this information to provide the functionality efficiently? Implementation is up to me ;-)

Update: A further point to consider: The typical interval structures have the focus on "What is at point X?" But in this case the enqueue and therefore the question "Where is the first empty slot for duration D?" is much more important. So a segment/interval tree or something else in this direction is probably not the right choice.

To elaborate the point with the free slots further: Due to the fact that we have multiple resources and the constraint of grouped tasks there can be free time slots on some resources. Simple example: t1.1 run on R1 for 40 and then t1.2 run on R2. So there is an empty interval of [0, 40] on R2 which can be filled by the next job.

Update 2: There is an interesting proposal in another SO question. If someone can port it to my problem and show that it is working for this case (especially elaborated to multiple resources), this would be probably a valid answer.

Can you give an example of what one of these tasks might be? — Daniel Kinsman, Jun 25 '12 at 06:11
@327 No :) I'm going to analyze/evaluate scheduling algorithms. So a task is just an abstract object with a duration and a set of possible resources to run on. — schlamar, Jun 25 '12 at 06:20
I still have some questions: 1. what is jobs: t1.1 grouped with t1.2 are jobs ? 2. why t1.3 can run on both R1 and R2? does that mean t1.2 can run on both R1 and R2? — zinking, Jun 26 '12 at 10:20
@zinking 1. `t1.x` is a group of tasks. 2. This is just an example, this is dynamic. 3. No. If `t1.2` could run on `R1` it would be scheduled there because of the group constraint. — schlamar, Jun 26 '12 at 10:28
You can't evaluate the data structure alone without knowing how you will use it. This makes your question about algorithms too. Pick an scheduling algorithm, and evaluate its time complexity against a naive data structure. Is the complexity determined by the algorithm's logic or the data structure? Only if the data structure limits complexity is it worth improving. — Colonel Panic, Jun 27 '12 at 12:54
@MattHickford I want to compare multiple algorithms. There will be always "sequencing" schedulers which pre-sort the jobs and just enqueue them (this is the reason why `enqueue` has highest priority). So the complexity would be already determined by the data structure if `enqueue` is worse than `O(log n)` (because sorting `O(n * log n)` == enqueue `n * O(log n)`). And more advanced schedulers will continuously modify the queue, so the complexity of the data structure will be multiplied to the overall scheduling complexity. — schlamar, Jun 27 '12 at 14:47
also, what is the PUT here, does the job must start at specified time ? or it is just put there to specify the order ( sequence)？ — zinking, Jun 28 '12 at 01:21
@zinking This is clearly defined: "Put a job at a specific time and move the tail backwards" — schlamar, Jun 28 '12 at 06:16

Marco de Wit · Answer 1 · 2012-07-01T10:45:23.690

class Task:
    name=''
    duration=0
    resources=list()

class Job:
    name=''
    tasks=list()

class Assignment:
    task=None
    resource=None
    time=None

class MultipleTimeline:
    assignments=list()
    def enqueue(self,job):
        pass
    def put(self,job):
        pass
    def delete(self,job):
        pass
    def recalculate(self):
        pass

Is this a first step in the direction you are looking for, i.e. a data model written out in Python?

Update:

Hereby my more efficient model:

It basicly puts all Tasks in a linked list ordered by endtime.

class Task:
    name=''
    duration=0    # the amount of work to be done
    resources=0   # bitmap that tells what resources this task uses
# the following variables are only used when the task is scheduled
    next=None     # the next scheduled task by endtime
    resource=None # the resource this task is scheduled
    gap=None      # the amount of time before the next scheduled task starts on this resource

class Job:
    id=0
    tasks=list() # the Task instances of this job in order 

class Resource:
    bitflag=0       # a bit flag which operates bitwisely with Task.resources
    firsttask=None  # the first Task instance that is scheduled on this resource
    gap=None        # the amount of time before the first Task starts

class MultipleTimeline:
    resources=list()
    def FirstFreeSlot():
            pass
    def enqueue(self,job):
        pass
    def put(self,job):
        pass
    def delete(self,job):
        pass
    def recalculate(self):
        pass

Because of the updates by enqueue and put I decided not to use trees. Because of put which moves tasks in time I decided not to use absolute times.

FirstFreeSlot not only returns the task with the free slot but also the other running tasks with their endtimes.

enqueue works as follows: We look for a free slot by FirstFreeSlot and schedule the task here. If there is enough space for the next task we can schedule it in too. If not: look at the other tasks running if they have free space. If not: run FirstFreeSlot with parameters of this time and running tasks.

improvements: if put is not used very often and enqueue is done from time zero we could keep track of the overlapping tasks by including a dict() per tasks that contains the other running tasks. Then it is also easy to keep a list() per Resource which contains the scheduled tasks with absolute time for this Resource ordered by endtime. Only those tasks are included that have bigger timegaps than before. Now we can easier find a free slot.

Questions: Do Tasks scheduled by put need to be executed at that time? If yes: What if another task to be scheduled by put overlaps? Do all resources execute a task as fast?

score 2 · Answer 2 · answered Jun 29 '12 at 01:17

2

Let's restrict ourselves to the simplest case first: Find a suitable data structure that allows for a fast implementation of FirstFreeSlot().

The free time slots live in a two-dimensional space: One dimension is the start time s, the other is the length d. FirstFreeSlot(D) effectively answers the following query:

min s: d >= D

If we think of s and d as a cartesian space (d=x, s=y), this means finding the lowest point in a subplane bounded by a vertical line. A quad-tree, perhaps with some auxiliary information in each node (namely, min s over all leafs), will help answering this query efficiently.

For Enqueue() in the face of resource constraints, consider maintaining a separate quad-tree for each resource. The quad tree can also answer queries like

min s: s >= S & d >= D

(required for restricting the start data) in a similar fashion: Now a rectangle (open at the top left) is cut off, and we look for min s in that rectangle.

Put() and Delete() are simple update operations for the quad-tree.

Recalculate() can be implemented by Delete() + Put(). In order to save time for unnecessary operations, define sufficient (or, ideally, sufficient + necessary) conditions for triggering a recalculation. The Observer pattern might help here, but remember putting the tasks for rescheduling into a FIFO queue or a priority queue sorted by start time. (You want to finish rescheduling the current task before taking over to the next.)

On a more general note, I'm sure you are aware that most kind of scheduling problems, especially those with resource constraints, are NP-complete at least. So don't expect an algorithm with a decent runtime in the general case.

answered Jun 29 '12 at 01:17

krlmlr

25,056
14
120
217

As `d` is a calculated property of `s` from the current item and `s` from the previous item I don't think this is the optimal solution. But storing the free slots additionally is definitely the way to go, so +1 for that. – schlamar Jun 29 '12 at 07:59
Why does it have to be a calculated property? My idea was that the quadtree stores the current state of the scheduling, and that every change results in an update of the quadtree. – krlmlr Jun 29 '12 at 08:48
Yes, but updating the tree is very costly, so there might be a better solution. – schlamar Jun 29 '12 at 10:28
Yes indeed. O(log N) is more costly than O(1). ;-) – krlmlr Jun 30 '12 at 19:09
Well, I don't think an update is this trivial. I think the update of a single node is the most efficient way by deleting and re-inserting it with the new data. The *average case* of delete has a complexity of `O(n log n)`. But a `put` needs to update every node where `node.s >= new_node.s`. In this case the probability to "hit" the worst case of deletion (`O(n²)`) is very high. So we have at least `O(n² log n)` or even `O(n³)` for a put operation. – schlamar Jul 02 '12 at 05:54
And, `enqueue` needs to update the free time slots, so we are getting the same complexity for this operation! – schlamar Jul 02 '12 at 06:14
Sorry, I don't see the O(n log n) complexity. – krlmlr Jul 02 '12 at 06:44
[Ref](http://www.cse.usf.edu/~ytu/teaching/CIS6930_S08/Slides/Quad-tree.ppt) for `O(n log n)`. But `O(n² log n)` is nonsense. Recreating the tree with the new data would be `O(n log n)`, too. – schlamar Jul 02 '12 at 07:24
Thanks for the ref, wasn't aware of that. In this case, you can mask deleted inner nodes and truly delete a node only if it's a leaf. And then recreate the tree every O(n) insert/delete operations to keep the O(log n) bound. – krlmlr Jul 02 '12 at 09:10
I'm awarding you the bounty because it is the best answer so far, but I am still not satisfied :) – schlamar Jul 02 '12 at 14:09
Honestly, I believe O(log n) per task operation is the best you can get. What would be your expectations? Perhaps you would like to narrow this down in a new question? -- Thanks for the bounty anyway. – krlmlr Jul 03 '12 at 07:46
The problem is still that a put/delete requires an update of every successor so we have to recreate the complete tree with `O(n log n)`. – schlamar Jul 03 '12 at 08:48
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/13340/discussion-between-user946850-and-ms4py) – krlmlr Jul 03 '12 at 08:58

score 1 · Answer 3 · edited Jun 27 '12 at 12:03

After spend some time thinking through this. I think a segment tree might be more appropriate to model this timeline queue. The job concept is like a LIST data structure.

I assume the Task can be modeled like this (PSEUDO CODE). The sequence of the tasks in the job can be assured by the start_time.

class Task:
    name=''

    _seg_starttime=-1; 
    #this is the earliest time the Task can start in the segment tree, 
    #a lot cases this can be set to -1, which indicates its start after its predecessor,
    #this is determined by its predecessor in the segment tree.
    #if this is not equal -1, then means this task is specified to start at that time
    #whenever the predecessor changed this info need to be taken care of

    _job_starttime=0; 
    #this is the earliest time the Task can start in the job sequence, constrained by job definition

    _duration=0;      
    #this is the time the Task cost to run

    def get_segstarttime():
       if _seg_starttime == -1 :
           return PREDESSOR_NODE.get_segstarttime() + _duration
       return __seg_startime + _duration

    def get_jobstarttime():
       return PREVIOUS_JOB.get_endtime()

    def get_starttime():
       return max( get_segstarttime(), get_jobstarttime() )

Enqueue it is merely append a task node into the segment tree, notice the _seg_startime set to -1 to indicate it to be started right after it's predecessor
Put insert a segment into the tree, the segment is indicated by start_time and duration.
Delete remove the segment in the tree, update its successor if necessary( say if the deleted node do have a _seg_start_time present )
Recalculate calling the get_starttime() again will directly get its earliest start time.

Examples( without considering the job constraint )

                  t1.1( _segst = 10, du = 10 )
                      \
                      t2.2( _segst = -1, du = 10 ) meaning the st=10+10=20
                        \
                        t1.3 (_segst = -1, du = 10 ) meaning the st = 20+10 = 30

if we do a Put:

                    t1.1( _segst = 10, du = 10 )
                        \
                        t2.2( _segst = -1, du = 10 ) meaning the st=20+10=30
                        /  \
t2.3(_segst = 20, du = 10)  t1.3 (_segst = -1, du = 10 ) meaning the st = 30+10 = 30

if we do a Delete t1.1 to original scenario

                      t2.2( _segst = 20, du = 10 ) 
                        \
                        t1.3 (_segst = -1, du = 10 ) meaning the st = 20+10 = 30

Each resource could be represented using 1 instance of this interval tree egg.

from the segment tree (timeline) perspective:

t1.1                      t3.1
  \                        / \
  t2.2                  t2.1  t1.2

from the job perspective:

t1.1 <- t1.2
t2.1 <- t2.2
t3.1

t2.1 and t2.2 are connected using a linked list, as stated: t2.2 get its _sg_start_time from the segment tree, get its _job_start_time from the linked list, compare the two time then the actual earliest time it could run can be derived.

+1 for the detailed explanation. But a segment tree is definitely not the right choice: " its content cannot be modified once the structure is built". And I don't think an interval tree is the optimal solution for this case either because I have no overlapping intervals on a single resource and the number of resources is small compared to the number of tasks. I think the right direction would be some binary structure per resource which are linked together. — schlamar, Jun 27 '12 at 06:17
@ms4py I agree you said its content cannot be modified once structure built. but what really matters is merely the **start_time** of a task, that is why I construct start_time from various inner structures. which essentially makes us do not need to handle the **start_time**. maybe I am not making myself well understood, I was thinking this implementation could handle the functionalites you mentioned. — zinking, Jun 27 '12 at 06:29
I updated the question with an additional point to consider. And I don't see a representation of multiple resources in your model. — schlamar, Jun 27 '12 at 06:40
@ms4py I updated my explain a little bit, the nodes between different instances of trees are connected in job via linked list. I might didn't explain this bit well. — zinking, Jun 27 '12 at 06:59
also, I might shouldn't have emphasized that this is segment tree as you said there are no much overlaps. But when we do a put, the overlap occurs and then this segment tree insert concept can be made use of. — zinking, Jun 27 '12 at 07:05
I'm still not getting the point. This looks like a binary tree for me. And the point that empty intervals need to be considered on `enqueue` is still not addressed. — schlamar, Jun 27 '12 at 07:37
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/13096/discussion-between-zinking-and-ms4py) — zinking, Jun 27 '12 at 07:44

score 0 · Accepted Answer · answered Nov 09 '12 at 07:03

0

I finally used just a simple list for my queue items and an in-memory SQLite database for storing the empty slots, because multidimensional querying and updating is very efficient with SQL. I only need to store the fields start, duration and index in a table.

answered Nov 09 '12 at 07:03

schlamar

9,238
3
38
76

:-) Just use a relation and let finding the best path to an answer to the optimizer. – Chris Wesseling Nov 13 '12 at 13:22