Find the interval that contains the largest amount of intervals and enter their number

Question

Given lots of intervals [ai, bi], find the interval that contains the most number of intervals. I easily know how to do this in O(n^2) but I must do it in O(nlogn). I think abount making two arrays, one with sorted starts and second with sorted ends, but I really don't know what to do next. I can only use structures like an array or a tuple.

for example for intervals [1, 6], [5, 6], [2, 5], [8, 9], [1, 6] corrent answer is 3, becouse [1,6] contain other [1,6] [2,5] [5,6]. At first I thought to make one array with interval ends and starts sorted. So it would look like [1,1,2,5,5,6,8,9], so I would only need to find how many intervals are between each element. But then I thought I can make two arrays with sorted starts and end: [[1,6],[1,6],[2,5],[5,6],[8,9]] and second one: [[2,5],[1,6],[1,6][5,6],[8,9]] and this is the part when I stuck. Maybe should I use binary search or something like that? — TomasITStudent, Mar 16 '22 at 16:18
These are all my thoughts, because first I want to come up with a good idea that is not O (n ^ 2) and then code — TomasITStudent, Mar 16 '22 at 16:18
Does this answer your question? [Find the maximally intersecting subset of ranges](https://stackoverflow.com/questions/15013800/find-the-maximally-intersecting-subset-of-ranges) — Dave, Mar 16 '22 at 16:43
@Dave intersecting intervals and "contained" intervals aren't the same thing imo — Abhinav Mathur, Mar 16 '22 at 16:45
Can anyone tell(mention link) me which competitive programming website this question is on so I can test my approach on it? — High On Math, Sep 05 '22 at 02:40

score 2 · Answer 1 · answered Mar 16 '22 at 16:32

Interval containment is a combination of two conditions: a includes b if a.start <= b.start and a.end >= b.end.

If you sort intervals by (end, size) and process them in that order, then the end condition is easy -- current.end >= x.end if x is an interval you've already processed.

To find out how many intervals the current interval contains, you just need to know how many previously processed intervals have start points >= current.start. Every time you process an interval, put its start point into an order statistic tree or similar. Then you can find the number of contained start points in O(log N) time, which makes your algorithm O(N log N) all together. That's the same cost as the initial sort.

Note that I've hand-waved over handling of duplicate intervals above. After sorting the interval list, you should do a pass to remove duplicates and add counts for intervals that contain copies of themselves, if that counts as containment in your problem.

score 1 · Answer 2 · answered Apr 28 '22 at 21:10

You can do it simply in O(n log n) using simply two sorts!

First, sort every interval according to its starting time. Then, sort them in reverse order of their end time.

You know a specific interval cannot contain any of the intervals placed before them neither in the first or the second array, because intervals located before them in the first array start before him, and end after him in the second array.

Now, you know that the sum of the positions of an interval in the two arrays gives a correct upper bound for the number of intervals it does not contain. It is an upper bound because an interval starting before and ending after another interval will count twice. What's more, this estimation will be exact for the interval you are searching for, because if another interval starts before and ends after it, then it will contain even more intervals. That's why the interval containing the most number of other intervals will just be the interval minimizing the sum of those two numbers.

## Warning : not tested !

intervals = [(1, 6), (5, 6), (2, 5), (8, 9), (1, 6)]
N = len(intervals)

# link intervals to their position in the array
for i, (start, end) in enumerate(intervals):
    intervals[i] = (start, end, i)

# this will contain the position in ascending order of start time of every interval
positionStart = [None] * N
sortedStart = sorted(intervals, key = lambda inter: (inter[0], -inter[1], inter[2]))
for i, (start, end, pos) in sortedStart:
    positionStart[pos] = i

# this will contain the position in descending order of end time of every interval
positionEnd = [None] * N
sortedEnd = sorted(intervals, key = lambda inter: (-inter[1], inter[0], inter[2]))
for i, (start, end, pos) in sortedEnd:
    positionEnd[pos] = i

best = None # interval containing the most number of intervals
score = 0   # number of intervals it contains
for i in range(N):
    # Upper bound of the number of interval it does not contain
    result = positionStart[i] + positionEnd[i]
    if N - result - 1 >= score:
        best = (intervals[i][0], intervals[i][1])
        score = N - result - 1

print(f"{best} is the interval containing the most number of intervals, " 
     + "containing a total of {score} intervals")

Find the interval that contains the largest amount of intervals and enter their number

2 Answers2