7

I need to implement an algorithm for my managing application that will tell me when and to which user a task can be assigned.

I implemented a brute force solution, it seems to work, but I'd like to know if there is more efficient way to do this. For sake of simplicity I've rewritten the algorithm to operate on lists of numbers(instead of db queries and such). Below i will try to explain my way of thinking.

Let's say we have 3 users that can be potentially assigned to the task.

user_a_busy = [[1,2], [2,4], [5,6]]
user_b_busy = [[4,7], [7,8]]
user_c_busy = [[0,1], [1,5]]

Each element of the list represents a period in which user is not available during the day. So User A is busy between 1 AM and 2 AM, 2 AM and 4 AM and so on. To make it possible to iterate over users and identify them i represent the above lists in form of dictionary.

users_to_check = {'A':user_a_busy, 'B':user_b_busy, 'C':user_c_busy}

Now let's say we have a task that takes 1 hour to complete and we want to check a period between midnight and 10 AM in 1 hour intervals(so tasks can start only on whole hours). Here is a representation of every period to check in form of a list.

task_intervals_to_check = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10]]

Here is a function that checks if two intervals overlap:

def intervals_overlap(service, busy):
    if service[1] > busy[0] and service[0] < busy[1]:
        return True
    return False

So now here is the loop that results in dictionary of available hours and users that can be assigned to the task:

result = defaultdict(list)
for interval in task_intervals_to_check:
    for user, user_busy in users_to_check.iteritems():
        overlaps = False
        for busy_period in user_busy:
            if intervals_overlap(interval, busy_period):
                overlaps = True
                break
        if not overlaps:
            result[interval[0]].append(user)

For task of length of 1 hour the result is:

{0: ['A', 'B'], 1: ['B'], 2: ['B'], 3: ['B'], 4: ['A'], 5: ['C'], 6: ['A', 'C'], 7: ['A', 'C'], 8: ['A', 'C', 'B'], 9: ['A', 'C', 'B']}

For task of length of 2 hours the result is:

{0: ['B'], 1: ['B'], 2: ['B'], 5: ['C'], 6: ['A', 'C'], 7: ['A', 'C'], 8: ['A', 'C', 'B']}

Which is an expected result. Below is the diagram which helped me to find correct results:

enter image description here So now my question is, is there any way to optimise this solution? Is this an acceptable solution?

Kesto
  • 625
  • 1
  • 8
  • 17
  • 1
    This is a combinatorial problem. Your code may give you decent answers on small examples like this but may give very inefficient results for more complex problems. Have you tried it at scale? Are the results acceptable? If not you might need to look into fast heuristic methods like simulated annealing. – roganjosh Feb 27 '17 at 21:07
  • As @roganjoshsaid, this is a math/combinatorial optimization problem, not strictly a Python problem. – Elmex80s Feb 27 '17 at 21:09
  • 1
    I disagree with previous comments that this is some kind of NP-hard optimisation problem. As OP described his problem we need only find to **what user** and at **what time** a single task may be assigned provided we already have their fixed schedule, don't we? – Ivan Gritsenko Feb 27 '17 at 21:13
  • @IvanGritsenko How is this not a combinatorial problem? The selection of one staff member to do a job may make an acceptable solution impossible based on the free time of other staff members. If you boil it down; there could be a tie-break between multiple staff and only one (if any) leads to a solution in which all jobs can be done; a greedy approach won't address that. – roganjosh Feb 27 '17 at 21:14
  • @roganjosh, as OP described he does not have a number of tasks and a number of users. He doesn't need to assign every task to some user. What he described is how can he simply find ways of assigning a **single** task, didn't he? *"I need to implement an algorithm for my managing application that will tell me when and to which user a task can be assigned."* – Ivan Gritsenko Feb 27 '17 at 21:19
  • @IvanGritsenko on a purely semantic level, I will concede, fair enough. On any kind of real level, this must surely be a variation of a timetabling problem. – roganjosh Feb 27 '17 at 21:22
  • To explain what is the use case. User with task assigning privilege choses a task, then choses a date and then is given a list of hours when a task can be assigned. When the hour is chosen, system will randomly pick a user(out of available ones) and assign the task to him. Thank you all for suggestions I will look into them and try to figure it out. – Kesto Feb 27 '17 at 21:35
  • Then perhaps the viewpoint of @IvanGritsenko is more relevant for this. I jumped ahead to a global optimisation of this, so my suggestions are only valid if you wanted a computer to try and solve this kind of thing for maximum efficiency. I'm blinkered by my daily grind, sorry. – roganjosh Feb 27 '17 at 21:41
  • I have a theory that this could be done pretty efficiently using a simple 2d matrix - I'll ponder on it a bit and come up with an answer. – OneNeptune Feb 27 '17 at 23:07

3 Answers3

1

You can try and get rid of the outermost loop. Assuming you have the beginning and end of the period to check in ps, pe (0 and 10 in the example) and the task duration in task_duration (1 or 2 in the example). And assuming everything is in units of full hours and that the busy_intervals are sorted by time.

result = defaultdict(list)
for user, user_busy in users_to_check.iteritems():
    for l_busy_per,r_busy_per in zip([[0, ps]] + user_busy, user_busy + [[pe, 0]]):
        avail_start = l_busy_per[1]
        avail_end = r_busy_per[0]
        for hit in range(avail_start, avail_end+1-task_duration):
            result[hit].append(user)
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99
1

I want to add a little bit on the representation of the problem. I think a representation with only the start time is both sufficient and more natural. If user a is busy from 0-1, 2-4 and 5-6 I would recommend such representation:

a_busy = (0, 2, 3, 5)

This means user a is busy for one unit of time for each time in a_busy. Also, the time slots to assign are represented more naturally that way, too.

task_times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Then we could even use basic set theory to derive a solution for each user. Let user_busy be the set of starting times, where the user can not be assigned given a length. Furthermore let slots_to_fill be the starting times of the slots, which are desired to be filled by users, given a length. Then the difference of slots_to_fill and user_busy is the optimal assignment for the user. Here is an example for length = 2 and your user a:

user_busy = set([0, 1, 2, 3, 4, 5]) # Set where user cannot be assigned
slots_to_fill = set([0, 1, 2, 3, 4, 5, 6, 7, 8]) # Set where users shall be assigned
x = slots_to_fill - user_busy
print(x) # {6, 7, 8}

The most difficult aspect of this solution is to build the sets from your data. In this natural representation to the problem the solution is trivial and can be decomposed to be done on a per user basis:

from itertools import chain

user_busy = [[1,2], [2,4], [5,6]]
task_intervals_to_check = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10]]
length = 2

# Convert original data to tuples of starting times
busy_start_time = tuple(chain.from_iterable(range(i, j) for i, j in user_busy))
slots_to_fill = tuple(chain.from_iterable(range(i, j) for i, j in task_intervals_to_check))

def assign(fillslots, not_avail, length):
    return filter(lambda x: all(x+i not in not_avail for i in range(length)) and x+length-1 <= max(fillslots), fillslots)

times = assign(slots_to_fill, busy_start_time, length)
print(list(times))

This returns a list of starting times, where the user could be assigned, which are more convenient to handle than the lists. The end time can be computed by adding the length of the assignment interval to the start time.

Finally, I do not think there is much benefit in optimization regarding the run time, since this problem is rather cheap computationally. If you want to optimize the solution quality, you will first have to define an objective. E. g., this could be something like: Minimize the total number of assignments while filling all time slots. This would not increase the difficulty of the problem much though. Constraints which correlate users would make the problem significantly harder, e. g. user A and user B must not be assigned within a two hour period and user C can only be assigned if user B is also assigned.

Tristan
  • 1,576
  • 9
  • 12
0

So, I think as you scale, you'll end up implementing more advanced formulas, and have made a simple integration now. I suspect, you'll be best off working with the schedule as a matrix.

My solution was made in Ruby - but the concept applies in other languages.

This will allow you find individual blocks of free time, but selecting hours 2-4 you'd get something like this for an at a glance rendering:

[ 1 , 1 , 1 ], 
[ 1 , 1 , 1 ], 
[ 1 , 1 , 1 ], 

Which can come in handy later for more advanced searches and implementing algorithms. For this simple solution, I'll demonstrate with the following.

calendar = [ # 0 is free, 1 is busy
  [ 1 , 1 , 1 ], #12AM to
  [ 1 , 1 , 1 ], #1AM to
  [ 1 , 1 , 1 ], #2AM to
  [ 1 , 1 , 1 ], #3AM to
  [ 1 , 1 , 1 ], #4AM to
  [ 1 , 1 , 0 ], #5AM to
  [ 1 , 1 , 0 ], #6AM to
  [ 1 , 1 , 0 ], #7AM to
  [ 1 , 1 , 0 ], #8AM to
  [ 0 , 1 , 1 ], #9AM to
  [ 0 , 1 , 1 ], #10AM to
  [ 1 , 1 , 1 ], #11AM to
  [ 1 , 1 , 1 ], #12PM to
  [ 1 , 0 , 1 ], #1PM to
  [ 1 , 0 , 1 ], #2PM to
  [ 1 , 0 , 1 ], #3PM to
  [ 1 , 1 , 0 ], #4PM to
  [ 1 , 1 , 0 ], #5PM to
  [ 1 , 1 , 1 ], #6PM to
  [ 1 , 1 , 1 ], #7PM to
  [ 1 , 1 , 1 ], #8PM to
  [ 1 , 1 , 1 ], #9PM to
  [ 1 , 1 , 1 ], #10PM to
  [ 1 , 1 , 1 ], #11PM to
  ["A","B","C"] #Users
  ]


def find_available_slot(length, calendar)
  [].tap do |results|
    calendar.transpose.collect do |schedule|
      times = schedule[0...-1]
      blocks = sort_it(times.each_index.select {|i| times[i] == 0 }).select { |x| x.count >= length }
      results << [blocks, schedule.last] unless blocks.empty?
    end
    results
  end
end

def sort_it(arr)
  tmp, main = [], []
  arr.each_with_index do |x, i|
    if arr[i-1]
      if arr[i-1] + 1 == x
        tmp << x
      else
        main << tmp unless tmp.empty?
        tmp = [x]
      end
    else
      tmp << x
    end
  end
  main << tmp
  main
end

find_available_slot(2, calendar)

For my example schedule in place, looking for blocks with 2 hours available, it returns the following results:

=> [[[[9, 10]], "A"], [[[13, 14, 15]], "B"], [[[5, 6, 7, 8], [16, 17]], "C"]]

So the result return a nested array, and each element of the array is the blocks for those users, if any. So result[0] would be the first user available result[0][0] would be the blocks and result[0][1] would tell you which user.

This sort of scheduling matrix is very powerful longterm, and I would suggest any implementations you did utilized 2d like this.

I did a brief google search and you can read more here:

Appointment scheduling algorithm (N people with N free-busy slots, constraint-satisfaction)

FInd overlapping appointments in O(n) time?

http://www.geeksforgeeks.org/given-n-appointments-find-conflicting-appointments/

Community
  • 1
  • 1
OneNeptune
  • 883
  • 11
  • 20