2

I have a set of points in the plane that I want to sort based on when they encounter an arbitrary sweepline. An alternative definition is that I want to be able to sort them based on any linear combination of the x- and y-coordinates. I want to do the sorting in linear time, but am allowed to perform precomputation on the set of points in quadratic time (but preferably O(n log(n))). Is this possible? I would love a link to a paper that discusses this problem, but I could not find it myself.

For example, if I have the points (2,2), (3,0), and (0,3) I want to be able to sort them on the value of 3x+2y, and get [(0,3), (3,0), (2,2)].
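For concreteness, without any precomputation this is just a key sort, which is the O(n log(n))-per-query baseline I want to beat (a minimal sketch using the numbers above):

points = [(2, 2), (3, 0), (0, 3)]
a, b = 3, 2  # sweepline direction from the example
# Sort by the linear combination a*x + b*y.
print(sorted(points, key=lambda p: a * p[0] + b * p[1]))
# -> [(0, 3), (3, 0), (2, 2)]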

Edit: In the comments below the question a helpful commenter has shown me that the naive algorithm of enumerating all possible sweeplines gives an O(n^2 log(n)) preprocessing algorithm (thanks again!). Is it possible to have an O(n log(n)) preprocessing algorithm?

timdrie
  • Can you clarify what information is available/what operations are allowed during the precomputation phase? Because you could just precompute by doing an O(n log n) sort of the points relative to the line, and then build a linked list of sorted results. And then the compute phase would be walking the linked list. But this is so obvious that it's almost certainly not what you meant. – Raymond Chen Jun 28 '22 at 12:43
  • Consider the ordered set of orientations of the line segments between every possible pair of points. This is equivalent to a partition of `[0, PI]` into intervals of angles. The sort order does not change as long as the sweep line orientation remains in the same interval. The naive preprocessing would thus take `O(n^2)`. – collapsar Jun 28 '22 at 12:46
  • @RaymondChen During the precomputation I want to precompute it for all possible lines – timdrie Jun 28 '22 at 12:46
  • @collapsar You make a great point. I thought that would result in O(n!) orientations, but many are of course impossible. I'll edit the question when I figure out how to do that, and thank you! – timdrie Jun 28 '22 at 12:53
  • Considering the actual sorting: For each of the intervals the naive sorting would still require `O(n log n)`. However, once the sorting has been determined, it henceforth can be reused for any sweep line whose orientation falls into the same interval. So if you invest `O(n^3)` in space and `O(n^2 * n log n) = O(n^3 log n)` in preprocessing, each sort could be managed in `O(n)` (copying the sorted point set) or `O(1)` (reusing a reference to the sorted point set) plus identifying in which interval the sweep line falls (`O(log n)`). – collapsar Jun 28 '22 at 13:02
  • However, assuming points in general position, the sort order between neighboring intervals only changes for the 2 points defining the orientation that separates the intervals. Adjusting the sort order then becomes a `O(1)` operation. It might be possible to have sublinear amortized search cost if the sweepline orientation does not change randomly. – collapsar Jun 28 '22 at 13:09
  • The number of meaningfully distinct sweepline orientations is `O(n^2)`, and any one of them could be provided during the calculation phase. This means that the calculation phase is an interval search algorithm over `O(n^2)` intervals. If your preprocessing phase is smaller than `O(n^2)`, then `O(n^2)` of the intervals will go unpreprocessed, which means that the calculation phase will have to sift through `O(n^2)` unsorted intervals, which is going to be hard to bring down to `O(n)`. – Raymond Chen Jun 28 '22 at 13:17
  • I thought I would just save `O(n)` of the lists uniformly distributed over all of the existing sweepline angles, and save the two points that are switched by every other sweepline. Then, during actual search I look for the closest angle and start swapping two elements for every sweepline I come across. This should result in `O(n^2 log(n))` preprocessing and `O(n)` search cost, and use `O(n^2)` space. – timdrie Jun 28 '22 at 13:18
  • Yeah, if you can find a way to divide the intervals into `n` groups faster than `O(n^2)`, then you can have the preprocessing phase produce `n` groups (each of which contains `n` intervals). The calculation would be to find the correct group, then `O(n)` processing over the group to find the right interval. But that "divide into `n` groups faster than `O(n^2)`" is the sticky part, since there are `O(n^2)` intervals in the first place. – Raymond Chen Jun 28 '22 at 13:21
  • So I would do the following: 1. Sort all `O(n^2)` interesting sweepline angles. 2. For angles `#0, #n, #2n, ...`, calculate the entire sorted list. 3. For all angles, calculate the index in the list of the first element that will be swapped by this angle. Then, during search: 1. Calculate the previous angle with an index a multiple of `n`. 2. Take that list. 3. Include all of the relevant changes calculated in step 3 of the preprocessing. For `O(n^2 log n)` time preprocessing and `O(n)` time search, I think – timdrie Jun 28 '22 at 13:28
  • Maybe I'm oversimplifying this, but isn't this just the point-to-line distance formula calculated for each point and then sorted? If you use radix sort on the distances it's O(n) time. – Louis Ricci Jun 28 '22 at 17:03
  • @RaymondChen If we're interested in average time, instead of worst case time, we can do it randomly. Please see https://stackoverflow.com/a/72790549/585411 for `O(n)` average runtime with `O(n^2)` space and average `O(n^2 log*(n))` processing time. – btilly Jun 28 '22 at 17:05
  • @LouisRicci The distances are floating point and therefore not directly good candidates for radix sort. You'd need to find something clever to make that idea work. – btilly Jun 28 '22 at 20:18
  • @btilly - Radix Sort can work just fine on 32bit floating point numbers. Especially numbers that will always be positive like distances. The larger the FP number the larger the IEEE-754 binary representation is: 0.01 = 0x3C23D70A, 0.02 = 0x3CA3D70A, 1.00 = 0x3F800000, 1.01 = 0x3F8147AE, 9999 = 0x461C3C00 – Louis Ricci Jun 29 '22 at 11:42
  • @LouisRicci The problem with radix sort is that you have to search the range. With 64 bit doubles, which are usually used, that range is rather large. If you have few enough points to avoid collisions with 32 bit, just sorting is probably still faster. – btilly Jun 29 '22 at 14:49
  • @btilly - Radix Sort for 64-bit values is fast as well: 9*N time and 2*N space. https://stackoverflow.com/questions/32188370/fastest-sort-algorithm-for-millions-of-uint64-rgbz-graphics-pixels https://stackoverflow.com/questions/47080353/radix-sort-c-code-to-sort-uint64-t-looking-only-at-32-msb-bits – Louis Ricci Jun 29 '22 at 18:08
  • @LouisRicci OK, the optimized multi-level radix sort is clever. I certainly couldn't have come up with it. I think I came up with a faster approach, but it would have to be coded to see. Fewer passes, but more local variables and more conditionals. – btilly Jun 29 '22 at 19:56
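For reference, the order-preserving bit transform that the radix-sort thread above relies on can be sketched like this (assuming IEEE-754 doubles; `float_key` is an illustrative helper name, not from the discussion):

import struct

def float_key(x: float) -> int:
    """Order-preserving map from a double to a 64-bit unsigned int,
    suitable as a radix-sort key (handles negatives as well)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Positive doubles already compare correctly as raw bits; move them
    # above the negatives, and bit-flip negatives to reverse their order.
    return bits ^ 0xFFFFFFFFFFFFFFFF if bits >> 63 else bits | (1 << 63)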

1 Answer

3

First, note that enumerating and angle-sorting all of the sweeplines takes O(n^2 log(n)), but then you have to compute the sorted point order for each of the n^2 sweeplines. Doing that naively will take time O(n^3 log(n)) and space O(n^3).

I think I can get average performance down to O(n) with O(n^2 log*(n)) time and O(n^2) space spent on preprocessing. (Here log* is the iterated logarithm and for all intents and purposes it is a constant.) But this is only average performance, not worst case.

The first thing to note is that there are n choose 2 = n*(n-1)/2 pairs of points. As the sweepline rotates through 360 degrees, each pair swaps relative order twice, giving O(n^2) pair crossings and at most O(n^2) distinct orderings between them. Also note that after a pair crosses, it does not cross again for 180 degrees. So over any range of less than 180 degrees, a given pair either crosses once or doesn't cross at all.
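To make the counting concrete, here is one way those critical orientations could be enumerated (a sketch; following collapsar's comment above, each pairwise segment orientation is normalized into [0, pi)):

from itertools import combinations
from math import atan2, pi

def critical_angles(points):
    """Orientations of the segments between all pairs of points; a pair
    swaps order exactly when the sweep direction is perpendicular to its
    segment, so these angles delimit the intervals of constant order."""
    angles = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        angles.append(atan2(y2 - y1, x2 - x1) % pi)  # in [0, pi)
    return sorted(angles)  # O(n^2 log n) overall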

Now the idea is that we'll store a random O(n) of those possible orderings along with the sweeplines they correspond to. Between any stored sweepline and the next, O(n^2 / n) = O(n) pairs of points cross. Therefore each point's position in the two stored sorts agrees to within O(1) on average, and every inversion between the first sort and the order we want is also an inversion between the first and second sorts. We'll use this to find our final sort in O(n).

Let me fill in details backwards.

We have our O(n) sweeplines precalculated. In time O(log(n)) we find the two nearest to our query sweepline. Let's assume we have the following data structures for them (a sketch of building them follows the list).

  • pos1: Lookup from point to its position in sweepline 1.
  • points1: Lookup from position to the point there in sweepline 1.
  • points2: Lookup from position to the point there in sweepline 2.
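A minimal sketch of how these could be built, assuming points are stored as tuples (so they can be dict keys) and `d1`, `d2` are the direction vectors of the two nearest sweeplines (all names here are illustrative):

points = [(2, 2), (3, 0), (0, 3)]
d1, d2 = (1.0, 0.0), (1.0, 0.2)  # direction vectors of the two stored sweeplines

def make_order(pts, d):
    """Sorted point list for the sweepline with direction d."""
    return sorted(pts, key=lambda p: d[0] * p[0] + d[1] * p[1])

points1 = make_order(points, d1)              # position -> point in sweepline 1
points2 = make_order(points, d2)              # position -> point in sweepline 2
pos1 = {p: i for i, p in enumerate(points1)}  # point -> position in sweepline 1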

We will now try to sort in time O(n).

We initialize the following data structures:

  • upcoming: Priority queue of points that could be next, keyed by where the query sweepline encounters them.
  • is_seen: Bitmap from position (in sweepline 1) to whether we've added that point to upcoming.
  • answer: A vector/array/whatever your language calls it that will hold the answer at the end.
  • max_j: The farthest position in sweepline 2 whose point we have added to upcoming. Starts at -1.

And now we do the following.

for i in range(n):
    while is_seen[i] == 0:
        # Pull in another possible point, in sweepline-2 order
        max_j += 1
        point = points2[max_j]
        upcoming.add(point with where the query sweepline encounters it as priority)
        is_seen[pos1[point]] = 1

    # upcoming now has points1[i] and every point that can come before it.
    answer.append(upcoming.pop())

Waving my hands vigorously, every point is put into upcoming once, and taken out once. On average, upcoming has O(1) points in it, so all operations average out to O(1). Since there are n points, the total time is O(n).
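Here is a runnable Python version of that loop (a sketch: `order1` and `order2` are the precomputed sorted lists for the two nearest sweeplines, `d` is the query direction assumed to lie between them, and the names are illustrative, not from the original):

import heapq

def sort_between(order1, order2, d):
    """Sort points by their value along direction d, assuming that order
    lies between the orders of the two stored sweeplines."""
    n = len(order1)
    pos1 = {p: i for i, p in enumerate(order1)}  # point -> position in sweepline 1
    upcoming = []                                # heap of (priority, point)
    is_seen = [False] * n                        # position in order1 -> added yet?
    answer = []
    max_j = -1                                   # farthest position used in order2
    for i in range(n):
        while not is_seen[i]:
            max_j += 1
            p = order2[max_j]
            heapq.heappush(upcoming, (d[0] * p[0] + d[1] * p[1], p))
            is_seen[pos1[p]] = True
        # upcoming holds order1[i] and every point that can come before it.
        answer.append(heapq.heappop(upcoming)[1])
    return answer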

OK, how do we set up our sweeplines? Since we only care about average performance, we cheat. We randomly choose O(n) pairs of points. Each pair of points defines a sweepline. We sort those sweeplines in O(n log(n)).

Now we have to compute the sorted order for each of those O(n) sweeplines. How do we do this?

Well, we can sort a fixed number of them by any method we want. Let's pick 4 evenly spaced sweeplines and do that. (We actually only need to do the calculation 2x. We pick 2 pairs of points, and sort the sweepline where the first pair crosses and the sweepline where the second pair crosses; the other 2 sweeplines are at 180 degrees from those, and therefore are just the reversed orders.) After that, we can use the algorithm above to sort a sweepline that lies between 2 already-sorted ones, and proceed by bisection to smaller and smaller intervals.
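The bisection can be sketched as a recursion over the angle-sorted list of chosen sweeplines, reusing the `sort_between` routine from above (again, hypothetical names):

def fill_orders(directions, orders, lo, hi):
    """Compute sorted orders for all sweeplines strictly between lo and hi,
    given that orders[lo] and orders[hi] are already computed."""
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2
    orders[mid] = sort_between(orders[lo], orders[hi], directions[mid])
    fill_orders(directions, orders, lo, mid)
    fill_orders(directions, orders, mid, hi)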

Now, of course, the sweeplines will not be as close together as they were above. But note that if we expect the points' positions to agree to within O(f(n)) places on average between the two sweeplines, then the heap will have O(f(n)) elements in it, operations on it will take O(log(f(n))) time, and so we get the intermediate sweepline in O(n log(f(n))). How long is the whole calculation?

Well, we have kind of a tree of calculations to do. Let's divide the sweeplines by the level of the bisection at which they are computed, then group the levels. The grouping, counting sweeplines from the top, will be:

1              .. n/log(n)
n/log(n)       .. n/log(log(n))
n/log(log(n))  .. n/log(log(log(n)))

...and so on.

In each group we have O(n / log^k(n)) sweeplines to calculate, where log^k denotes the logarithm iterated k times. Each sweepline takes O(n log^k(n)) time to calculate, so each group takes O(n^2). The number of groups is the iterated logarithm, log*(n). So the total preprocessing time is O(n^2 log*(n)).

btilly