FInd overlapping appointments in O(n) time?

Question

I was recently asked this question in an interview. Even though I was able to come up the O(n²) solution, the interviewer was obsessed with an O(n) solution. I also checked few other solutions of O(n logn) which I understood, but O(n) solution is still not my cup of tea which assumes appointments sorted by start-time.

Can anyone explain this?

Problem Statement: You are given n appointments. Each appointment contains a start time and an end time. You have to retun all conflicting appointments efficiently.

Person: 1,2, 3, 4, 5
App Start: 2, 4, 29, 10, 22
App End: 5, 7, 34, 11, 36

Answer: 2x1 5x3

O(n logn) algorithm: separate start and end point like this:

2s, 4s, 29s, 10s, 22s, 5e, 7e, 34e, 11e, 36e

then sort all of this points (for simplicity let's assume each point is unique):

2s, 4s, 5e, 7e, 10s, 11e, 22s, 29s, 34e, 36e

if we have consecutive starts without ends then it is overlapping: 2s, 4s are adjacent so overlapping is there

We will keep a count of "s" and each time we encounter it will +1, and when e is encountered we decrease count by 1.

My gut says it is equivalent to sorting, but I do not see a proof for it yet, so I might be wrong. — amit, Sep 05 '12 at 14:21
@Rotem: O(nlogn) solution is trivial using [interval trees](http://en.wikipedia.org/wiki/Interval_tree) — amit, Sep 05 '12 at 14:21
There is an easy O(n) solution if the appointments are sorted by the start time. Otherwise, it looks like O(n log n) — Andriy, Sep 05 '12 at 14:22
@amit Thanks for the read. This area is far from my strong side. — Rotem, Sep 05 '12 at 14:23
It's impossible in `O(n)` in principle, because the *output* is not of size `O(n)`. If all `n` appointments conflict then the output has size proportional to `n^2`. [Edit: actually I've made an assumption there about the output format that may not be warranted -- that it's a list of all pairs of conflicting appointments]. — Steve Jessop, Sep 05 '12 at 14:27
Also, if the interviewer didn't tell me the `O(n)` solution then I'd probably have asked. Sometimes as soon as you hear the expected solution you realise what constraint it was that you thought applied, but actually didn't. For example, maybe the questioner uses an `O(n)` sort e.g. binary radix. — Steve Jessop, Sep 05 '12 at 14:32
@Batman: What if all appointments are equal, say, `App St = 1 1 1 1 1` and `App End = 2 2 2 2 2`? Should the output contain all n*(n-1)/2 combinations? — Andriy, Sep 05 '12 at 14:32
@SteveJessop @Andrey: Correct, but to make it interesting - what about the binary problem? (Is there *any* collision?) Can it be be done in `O(n)`? Or is `O(nlogn)` optimal (assuming unsorted data) — amit, Sep 05 '12 at 14:37
@Batman: O(log n) algorithm you just added to the question is O(nlogn) in fact. If the data are already sorted, it becomes O(n). — Andriy, Sep 05 '12 at 14:42
@Andrey Can you discuss how it is O(n) if data is sorted by start time? — Max, Sep 05 '12 at 14:44
@Batman: once the array of times is sorted, you have to iterate it once to find overlaps. Thus we get O(n). — Andriy, Sep 05 '12 at 14:46
@amit: good question, it leads me to notice that the distinctness problem reduces to this one. So to solve this in `O(n)` we require "tricks". — Steve Jessop, Sep 05 '12 at 15:12

score 18 · Accepted Answer · edited Sep 07 '17 at 00:29

The general solution to this problem is not possible in O(n).

At a minimum you need to sort by appointment start time, which requires O(n log n).

There is an O(n) solution if the list is already sorted. The algorithm basically involves checking whether the next appointment is overlapped by any previous ones. There is a bit of subtlety to this one as you actually need two pointers into the list as you run through it:

The current appointment being checked
The appointment with the latest end time encountered so far (which might not be the previous appointment)

O(n) solutions for the unsorted case could only exist if you have other constraints, e.g. a fixed number of appointment timeslots. If this was the case, then you can use HashSets to determine which appointment(s) cover each timeslot, algorithm roughly as follows:

Create a HashSet for each timeslot - O(1) since timeslot number is a fixed constant
For each appointment, store its ID number in HashSets of slot(s) that it covers - O(n) since updating a constant number of timeslots is O(1) for each appointment
Run through the slots, checking for overlaps - O(1) (or O(n) if you want to iterate over the overlapping appointments to return them as results)

You said: "There is an O(n) solution if the list is already sorted." What if all intervals intersect with each other in which case any algorithm will take O(n^2) time to print them. So, even if list is already sorted, worst case time complexity can be O(n^2). — user674669, May 04 '17 at 07:07
Hmmm my understanding was that the problem was just to print out the appointments which have conflicts , not all *pairs* of conflicting appointments. In which case it is still O(n). But I agree, if you have to output all pairs it could be O(n^2) in the worst case. Though in practice this seems unlikely given the nature of appointment data, which are likely to be relatively small chunks of time. — mikera, May 04 '17 at 11:02

score 5 · Answer 2 · answered Sep 05 '12 at 14:42

5

Assuming you have some constraint on the start and end times, and on the resolution at which you do the scheduling, it seems like it would be fairly easy to turn each appointment into a bitmap of times it does/doesn't use, then do a counting sort (aka bucket sort) on the in-use slots. Since both of those are linear, the result should be linear (though if I'm thinking correctly, it should be linear on the number of time slots rather than the number of appointments).

At least if I asked this as an interview question, the main thing I'd be hoping for is the candidate to ask about those constraints (i.e., whether those constraints are allowed). Given the degree to which it's unrealistic to schedule appointments for 1000 years from now, or schedule to a precision of even a minute (not to mention something like a nanosecond), they strike me as reasonable constraints, but you should ask before assuming them.

answered Sep 05 '12 at 14:42

Jerry Coffin

476,176
80
629
1,111

1

In other words, "Use a bucket sort or radix sort or some other O(n) cheating sort algorithm"? Not an entirely satisfying solution :-) – Nemo Sep 05 '12 at 14:47
@Nemo: Not "cheating" at all -- simply selecting the right algorithm for the job. Whether you like it or not, I'd put pretty decent odds that it's the *only* way to meet the O(N) goal. – Jerry Coffin Sep 05 '12 at 14:49
I would not take that bet, but that is still a far cry from a proof. – Nemo Sep 05 '12 at 14:57
@Nemo: No, it's not a proof in itself -- but scheduling algorithms have been studied pretty heavily, and I don't see anything unusual enough about the specs here to indicate that the normal proofs don't apply. See, for example, the "charging argument", showing the optimal scheduling is based on the earliest finishing times (which obviously requires *finding* the earliest finishing times....) – Jerry Coffin Sep 05 '12 at 15:05
2

Fortunately I have a proof: consider the subset of this problem in which all appointments are of length 1 time unit (start and end times in the question are distinctly integral, so there's a quantim of time, but even if not consider the further subset in which appointments are length k and k-aligned). Then by solving it you solve the distinctness problem: https://en.wikipedia.org/wiki/Element_distinctness_problem, which is Theta(n log n) in the "algebraic decision tree" computation model. "Algebraic decision tree" is the formal way of saying, "no O(n) sorting tricks, you swine!" – Steve Jessop Sep 05 '12 at 15:06
"swine"? How dare you use such accurate epithets in describing me! – Jerry Coffin Sep 05 '12 at 15:16
@Steve - Yup, that works, and appears to be the right answer to this question. Consider writing it up, with "...or do the sort in O(n)" as a footnote? – Nemo Sep 05 '12 at 15:34
@Nemo: thanks to this discussion, Jerry's answer is now "do the sort in `O(n)`" with the proof of why as a footnote. I think that's a correct answer. – Steve Jessop Sep 06 '12 at 08:34
@SteveJessop If I bought a computer from Dell and they told me it wouldn't hash, I'd be pretty damn pissed. – Rob Neuhaus Sep 19 '12 at 06:12
@rrenaud: true, but big-O analysis of algorithm runtime is irrelevant to any computer you buy from Dell, because that computer can only handle small inputs anyway. Your input won't be any bigger than 2^64 bytes, so who cares what happens at the limit as it tends to infinity? You often get these quibbles about fixed-width integers and array indexing, and they come down to the difference between a "clean" theory of computation and actual computers limited to small numbers. – Steve Jessop Sep 19 '12 at 09:43

score 2 · Answer 3 · answered Sep 05 '12 at 20:40

2

A naive approach might be to build two parallel trees, one ordered by the beginning point, and one ordered by the ending point of each interval. This allows discarding half of each tree in O(log n) time, but the results must be merged, requiring O(n) time. This gives us queries in O(n + log n) = O(n).

answered Sep 05 '12 at 20:40

Rami Jarrar

4,523
7
36
52

1

But wouldn't building those trees take O(n log n) time each? – templatetypedef Sep 19 '12 at 04:45

score 1 · Answer 4 · answered Sep 05 '12 at 15:47

This is the best I can think of, in horrible pseudocode. I attempted to reduce the problem as much as possible. This is only less than On^2 (I think).

Note that the output at the end will not show every appointment that a given appointment will conflict with on that appointment's specific output line...but at some point every conflict is displayed.

Also note that I renamed the appointments numerically in order of starting time.

output would be something like the following:

Appointment 1 conflicts with 2
Appointment 2 conflicts with
Appointment 3 conflicts with
Appointment 4 conflicts with 5
Appointment 5 conflicts with

appt{1},appt{2},appt{3} ,appt{4} ,appt{5}
  2      4       10       22      29
  5      7       11       36      34

pseudocode

list=(1,2,3,4,5)
for (i=1,i<=5,i++)
    list.shift()   **removes first element
    appt{i}.conflictswith()=list

for (i=1,i<=n,i++)
{   number=n
    done=false
    while(done=false)
        {if (number>i)
            {if (appt(i).endtime() < appt(number).startime())
                {appt{i}.conflictswith().pop()}
             else
                {done=true}
             number--
             }
        else
            {done=true}
        }
}
for (i=1,i<=n,i++)
    print "Appointment ",i," conflicts with:",appt{i}.conflictswith()

score 1 · Answer 5 · answered Sep 20 '12 at 19:04

1

I came across a Data Structure called Interval tree, with the help of which we can find intervals in less than O(n log (n)) time, depending upon the data provided

answered Sep 20 '12 at 19:04

Max

9,100
25
72
109

FInd overlapping appointments in O(n) time?

5 Answers5

Linked