0

Given N input integers, and M query ranges, output the set of integers that fall within each query range (both bounds are inclusive). For example:

N = [-10, -1, -2, 0, 0, 8, 12, 16]
M = [(-100, 0), (2, 5), (13, 18)]

The answer/output would then be:

[-10, -1, -2, 0, 0]
[]
[16]

I've been trying to think of optimal solutions to this, and am struggling to come up with something that's better than O(N*M) at worst case. The approaches I've considered so far are:

Brute Force - For each query, check every digit in N and see if it falls within the range.

Presorted Input - Sort the input digits (O(NlogN)), and then do the above. Can do bounds checks for queries that are completely out of range, and remove the need to iterate (i.e. querying (0, 2) for a list like [5, 6, 7]). This still involves iterating though, and at worst case is still O(NM) (right?)

Hash Set - Add each value in the input array into a hash set, and then for each possible integer value within the query range, check if it exists in the hash set. This works for smaller ranges, but basically is the same thing -> it's O(MX) where X is the length of the query range. So if you have a query range that's huge (i.e. (-10000000, 10000000)), it wouldn't scale.

I would love to hear/learn about solutions that are better! (An alternative to this question is determining the # of digits that are within each query range, as opposed to outputting the digits).

Please let me know!

user4581301
  • 33,082
  • 7
  • 33
  • 54
jeet.m
  • 553
  • 4
  • 15

2 Answers2

1

Can be done in (2*M + N) log (2*M + N) with a sweep line algorithm

Create list of events:

  • number from N.
  • interval start from M
  • interval end from M

sort them by positions (in case of tie, you have to order depending of event: depend if your range are open or closed).

Then iterate over events:

  • if it is a start interval, add it to current opened intervals.
  • if it is a end interval, remove it from current opened intervals.
  • if it is a number from N, add it to all opened intervals.

So in your case, event list would be

{-100S -10N -2N -1N, 0N, 0N, 0E, 2S, 5E, 8N, 12N, 13S, 16N, 18E}
  • -100S: modify active interval [(-100, 0)].
  • -10N, -2N, -1N, 0N, 0N: add it to [(-100, 0)]
  • -0E: active interval: []
  • 2S: active interval: [(2, 5)]
  • 5E: active interval: []
  • 8N, 12N: Add it to no intervals.
  • 13S: active interval: [(13, 18)]
  • 16N: Add it to interval (13, 18).
  • 18E: active interval: []

Related to:

Jarod42
  • 203,559
  • 14
  • 181
  • 302
  • this is amazing, I've never heard about sweep line algorithms. One final thing, would it be possible to explain why it's O(M+N(log(M+N)), I can understand that it comes from the sort (and can see how adding the query ranges changes typical sort runtime). Is it just as simple as: since the list is now M+N, sorting it takes (M+N)log(M+N) time? – jeet.m Dec 07 '17 at 23:32
  • `(2*M+N)log(2*M+N)` comes from sorting all events and there are `2M+N` events. – Jarod42 Dec 08 '17 at 09:11
  • Great - that makes sense. Thanks! – jeet.m Dec 08 '17 at 15:37
0

I would suggest sorting the ranges rather than the input numbers, then simply checking whether any given input matches a range Even better if you can get the ranges before the list of numbers., that way you could only iterate the numbers once (as you read them).

You still have to iterate the input number list (nothing is going to help you there), but the range list should be smaller than the numbers list and so both the sort and search will be faster.

SoronelHaetir
  • 14,104
  • 1
  • 12
  • 23
  • That makes sense, sorting the ranges likely would lead to a speed optimization. It's still O(N*M) worst case, correct? – jeet.m Dec 07 '17 at 17:32