2

I have a question about finding the conflicting appointments.

Appointments: [[4,5], [2,3], [3,6], [5,7], [7,8]]
Output: 
[4,5] and [3,6] conflict. 
[3,6] and [5,7] conflict. 

I tried to solve this question by myself, but failed. I did some google, however, I am not sure the answer is correct. Would you mind sharing your ideas in Python. My current thought is to sort the interval first, but not sure what to do next. Thanks for your help.

Updates:
I worked out the inefficient approach with O(n**2), and would like to look for the answer with O(nlogn). Thanks.

Here is a test case helpful to reject your solution if it is slower than O(nlgn)

n = 10
intervals = []
for i in range(n):
    intervals.append((i, n))
print(intervals)

cpchung
  • 774
  • 3
  • 8
  • 23
Qiang Super
  • 323
  • 1
  • 11
  • Does https://stackoverflow.com/a/53936965/51685 help you? – AKX Jan 05 '21 at 14:19
  • How would you do it (even inefficiently) by hand? – 永劫回帰 Jan 05 '21 at 14:22
  • @AKX Thanks, but actually the question is a little bit different. You can solve that question by updating the start, and end. But I am not sure if it will also work here. – Qiang Super Jan 05 '21 at 14:22
  • 1
    @永劫回帰 Thanks. With O(n**2) answer, which traverse the list twice. – Qiang Super Jan 05 '21 at 14:24
  • @QiangSuper you did not tell us that you have complexity requirements, if so I don't see anything wrong with an O(n^2) answer unless you need the O(n lg n) answer. – 永劫回帰 Jan 05 '21 at 14:28
  • @QiangSuper do you consider the case for close-open interval? like [start,end) Here is a test case : ``` a=[ [9, 12], [13.5, 17], [18, 19], [13, 15], [16, 16.5], [19, 20], [0, 23], [1, 22], [0, 23] ] ``` In this case, the number of conflicts is 15 – cpchung May 19 '21 at 03:36

6 Answers6

2

Another answer with better complexity of O(n log n)

Code

appointments = [[4,5], [2,3], [3,6], [5,7], [7,8]]


appointments.sort(key= lambda x:x[0]) #appointments get sorted by start time.

for i, a in enumerate(appointments):
    for i2 in range(i+1, len(appointments)):
        b = appointments[i2]
        if a[1] > b[0]: #if start of second appointment is before end of first appointment
            print(f"{a} and {b} conflict")
        else:
            break

Output

[3, 6] and [4, 5] conflict
[3, 6] and [5, 7] conflict

Explanation

By sorting first the code becomes way more efficient. We start by checking the first ever appointment and comparing its end-time to the start time of the second appointment. If they overlap we add print the pair and keep comparing the first ever appointment to third, forth, fifth appointment until they don't overlap anymore. And because we sorted the appointments by start-time, we know that all appointments after that won't intersect.

Then we continue by comparing the second ever appointment to the appointments following it, and so on.

Complexity

The sorting is in python is O(n log n).

Because I assume that there are only very few conflicting appointments and that they are of similar length we can deduce that the loop for checking the conflicts is O(n). Only the outer loop scales with O(n) the inner loops complexity does not necessarily grow at all given that the appointment-density stays constant so O(1). Which is combined O(n). (Worst case would be O(n^2) but that only happens when every event intersects every other event which would be strange. )

But because the sorting beforehand takes O(n log n) the algorithm as a hole also runs a O(n log n).

Verification of complexity

To see how the algorithm performs we can generate time the execution for different sizes of appointment-lists. I generated datasets that should be close to real-world data using the following function:

def generate_test_data(test_size):
    appointments = []
    start_time = 0
    for i in range(test_size):
        appointments.append([start_time, start_time+choice(range(1,3))])
        start_time += choice(range(8))
    shuffle(appointments)
    return appointments

This produces lists like the example of @QiangSuper but of specific length. Using this, time the algorithms for different n_(length of input)_ and plot the runtime.

I took my algorithm and @Paddy3118s algorithm as an example:

w = []
p = []
repetitions = 10**2 #runs 100 times for higher accuracy
for test_size in range(0,1000): 
    w.append(0); p.append(0)
    for i in range(repetitions):
        a = generate_test_data(test_size*10)

        b = deepcopy(a)
        start = timer()
        conflicts_w1(b)
        end = timer()
        w[test_size] += end-start

        b = deepcopy(a)
        start = timer()
        conflicts_p1(b)
        end = timer()
        p[test_size] += end - start

    print(f"{test_size*10}, {w[test_size]/repetitions}, {p[test_size]/repetitions}")

This produces the following result:

Runtime over n

One can see that the blue line rises linearly while the orange line behaves like a quadratic function even though both implementations look very similar. But this difference can be explained.

My algorithm scales with O(n log n). But because the sort-function of python is implemented in c it's effects only become visible for way larger sets. So most of my algorithms runtime can be attributed to the for loop, which scales linearly.

For @Paddy3118s algorithm, the main difference is the use of a[i+1:]. Slices in python scale with O(n). So the for-loop in his algorithm scales with O(n^2).

If we we plot the same data on a log log diagram then we get the following:

log Runtime over log n

We can see that indeed @Paddy3118s algorithm IS FASTER as he already claimed and successfully proofed. But only for list smaller than 100 items. That's why OP asked about complexity and not about speed in the specific case.

But we live in a free world, so everyone can choose the algorithm they like best.

wuerfelfreak
  • 2,363
  • 1
  • 14
  • 29
  • Thanks. Would you mind explaining a little bit more? What about the complexity? – Qiang Super Jan 05 '21 at 18:24
  • @QiangSuper I expanded my explanation. If any questions remain feel free to ask! – wuerfelfreak Jan 05 '21 at 19:36
  • Thanks for your help! Awesome. Let me go through it. – Qiang Super Jan 05 '21 at 20:08
  • Check your comparison, or use my code below. – Paddy3118 Jan 06 '21 at 07:42
  • I hope this test case is helpful to show why the worst case running time is still `O(n^2)`: `[(0, 10), (1, 10), (2, 10), (3, 10), (4, 10), (5, 10), (6, 10), (7, 10), (8, 10), (9, 10)]` – cpchung May 20 '21 at 01:37
  • this is not a `O(nlgn)` solution. Sort takes `nlgn` but after sorting you are comparing each element after the current element with the current element. Asymptotically it is still O(n^2) @wuerfelfreak. In the test case above, it will never break out of the inner loop – cpchung May 20 '21 at 03:12
  • Yes, the worst case running time is `O(n^2)`. But I did in fact acknowledge that already in the "Complexity" section if you read carefully. I also stated that my assumption that conflicting appointments were rare, which makes for an average runtime of `O(n lg n)`. I think the average runtime is much more applicable in the real world which is why I put emphasis on `O(n lg n)` instead of the worst case running time. Hope it is clear now, have a nice day! – wuerfelfreak May 25 '21 at 08:27
1

Note: the correct comparison between items is used to give stated result. At least one of the other examples will not give the answer stated, although they do have a nice explanation.

You need to first sort the appointments. There is no need to go through the extra expense of restricting the sort to the start times as the algorithm only depends on the start time being at-or-before the end time for all appointments; as is given.

conflicts are extracted by comparing the end time of an appointment with the start time of the next appointment in the list comprehension.

Note: the correct comparison between items is used to give stated result. At least one of the other examples will not give the answer stated, although they do have a nice explanation.

You need to first sort the appointments. There is no need to go through the extra expense of restricting the sort to the start times as the algorithm only depends on the start time being at-or-before the end time for all appointments; as is given.

conflicts are extracted by comparing the end time of an appointment with the start time of the next appointments in the list comprehension until they don't clash.

a = [[4,5], [2,3], [3,6], [5,7], [7,8]]
a.sort() # No need to sort on only start time
conflicts = []
for i, this in enumerate(a):
    for next_ in a[i+1:]:
        if this[1] > next_[0]:  # this ends *after* next_ starts
            conflicts.append((this, next_))
        else:
            break  # Don't need to check any more

print(conflicts)    #  [([3, 6], [4, 5]), ([3, 6], [5, 7])]

TIMINGS

There is a discussion of algorithm efficiency so I thought I would run timings on this and @wuerfelfreaks algorithms, (with slight mods to to gather and return all conflicts as a list):

The revised code

a = [[4,5], [2,3], [3,6], [5,7], [7,8]]

def conflicts_p1(a):
    "paddy3118's code as a function returning list of tuple pairs"
    a.sort()
    conflicts = []
    for i, this in enumerate(a):
        for next_ in a[i+1:]:
            if this[1] > next_[0]:  # this ends *after* next_ starts
                conflicts.append((this, next_))
            else:
                break  # Don't need to check any more
    return conflicts

def conflicts_w1(appointments):
    "@wuerfelfreak's answer returning a list of tuple pairs"
    appointments.sort(key= lambda x:x[0]) #appointments get sorted by start time.
    conflicts = []
    for i, a in enumerate(appointments):
        for i2 in range(i+1, len(appointments)):
            b = appointments[i2]
            if a[1] > b[0]: #if start of second appointment is before end of first appointment
                conflicts.append((a, b))
            else:
                break
    return conflicts

assert conflicts_p1(a) == conflicts_w1(a)

Ipython timings

In [2]: # Paddy3118 timings

In [3]: %timeit conflicts_p1(a)
5.52 µs ± 38.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]: %timeit conflicts_p1(a)
5.42 µs ± 23.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit conflicts_p1(a)
5.53 µs ± 438 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [6]: # wuerfelfreak timings

In [7]: %timeit conflicts_w1(a)
8.34 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit conflicts_w1(a)
7.95 µs ± 296 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: %timeit conflicts_w1(a)
8.38 µs ± 371 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [10]: 

Summary

For the given example, my, (Paddy3118's), code is faster than that of wuerfelfreak.

Paddy3118
  • 4,704
  • 27
  • 38
  • `a[i+1:]` makes your algorithm `O(n^2)` because slices are `O(n)`. But OP asked for O(n log n) – wuerfelfreak Jan 06 '21 at 08:57
  • Worst case, yes. Your algorithm is the same @wuerfelfreak; I guess the OP needs to actaully run them on representative data to see if he needs something more. – Paddy3118 Jan 06 '21 at 09:57
  • No, in the best case. Because there is a differenc between using `range(i+1,max)` and `a[i+1:]`. `a[i+1]` is always `O(n)` while the range solution can be `O(1)`. Thats what I was trying to say. – wuerfelfreak Jan 06 '21 at 10:18
  • I only partly agree with your findings. Yes, your algorithm is indeed faster for the given example. 100% correct. But that has nothing to do with complexity and scaling. I tried to explain better what I mean in my answer. See the edit and also have a nice day! – wuerfelfreak Jan 06 '21 at 16:01
  • Noted. I'll need to look closer at your implementation. I confirmed the gist of your graphs too.Have a nice week! – Paddy3118 Jan 08 '21 at 04:43
1

Code

def attend_all_appointments(intervals):
  start, end = 0, 1
  intervals.sort(key=lambda x:x[start])
  prev_interval = intervals[0]
  # can_attend_all_appointments = True
  for i in range(1, len(intervals)):
    interval = intervals[i]
    if interval[start] < prev_interval[end]:
      print(f"{prev_interval} and {interval} conflict")
      # can_attend_all_appointments = False
    else:
      prev_interval[start] = interval[start]
      prev_interval[end] = interval[end]
  # return can_attend_all_appointments

Time Complexity: O(n log n) since we need to sort at the beginning and iterate the intervals only once. This solution doesn't need a nested loop.

  • This is dangerous as you are modifying `intervals` in the function. You may need to take a copy of `intervals` to avoid modifying the original object. – gofvonx Apr 01 '21 at 07:53
  • @gofvonx Please tell me your opinion on the negative impact of modifying intervals in the function. I don't think it's dangerous. Please consider both time complexity and space complexity. – accomplished_cap Apr 01 '21 at 09:17
  • It obviously depends on the application, but if you want to perform this check on `intervals` before further analysis of the data, this check should leave the original data intact. At the very least I would expect a statement saying that this function will modify whatever `intervals` object you feed into it. – gofvonx Apr 01 '21 at 09:20
  • Thanks for sharing. I think it is something what I am looking for. In the leetcode, I always find answer to sort an array in place. I think it should be fine. – Qiang Super Apr 12 '21 at 21:59
  • This is not answering the original problem: `find all the conflicting appointments` It can only tell whether you can attend all. One test case where it fails: `a = [ [9, 12], [13.5, 17], [18, 19], [13, 15], [16, 16.5], [19, 20], [1, 22], [0, 23] ]` where the intervals are left close right open, or `[x,y)` – cpchung May 20 '21 at 01:19
-1

O(n^2)

Code

from itertools import combinations
a = [[4,5], [2,3], [3,6], [5,7], [7,8]]

def intersects(elem): #Checks for intersections by seeing if a start/endpoint is enclosed by the start and endpoint of the other appointment
    return  (elem[0][0] < elem[1][0] < elem[0][1]) or \
            (elem[0][0] < elem[1][1] < elem[0][1]) or \
            (elem[1][0] < elem[0][0] < elem[1][1]) or \
            (elem[1][0] < elem[0][1] < elem[1][1])


conflicting = [elem for elem in combinations(a,2) if intersects(elem) ] #checks for every possible pair of events.
for conf in conflicting:
    print(f"{conf[0]} and {conf[1]} conflict")

Result:

[4, 5] and [3, 6] conflict
[3, 6] and [5, 7] conflict
wuerfelfreak
  • 2,363
  • 1
  • 14
  • 29
-1

Maybe something like this? Not really a fan of the range-based for-loop, but working with sets could make this straight-forward.

appointments = [
    [4,5],
    [2,3],
    [3,6],
    [5,7],
    [7,8]
]

sets = [set(range(*appointment)) for appointment in appointments]

for current_index in range(len(sets)):
    current_set = sets[current_index]
    for other_index, other_set in enumerate(sets[current_index+1:], start=current_index+1):
        if current_set & other_set:
            print(f"{appointments[current_index]} and {appointments[other_index]} conflict.")
Paul M.
  • 10,481
  • 2
  • 9
  • 15
-1

Maybe a little late...

from itertools import combinations

a = [[4, 5], [2, 3], [3, 6], [5, 7], [7, 8]]

for x in combinations(a, 2):
    b, c = x

    if b[0] > c[0] and b[1] < c[1] or \
            b[1] > c[0] and b[1] < c[1] or \
            b[0] < c[0] and b[1] > c[1]:
        print(f"{b} and {c} conflict")
hpelleti
  • 342
  • 1
  • 6
  • Thanks for your answer. How about the situation of adding [4,100] into the list? What is your complexity? – Qiang Super Jan 05 '21 at 18:23
  • The itertools.combinations yield the result. So there is no memory issue here. Based on the documentation, the complexity is O(n) I update the code... just realized that it was still using `for x in list(combinations(a, 2))`. Was replaced by `for x in combinations(a, 2):` – hpelleti Jan 06 '21 at 19:09
  • Thanks for your explanation. The tricky part here is to create a combination of any two interval. Then we studied these intervals. – Qiang Super Jan 07 '21 at 02:48