5

I'm trying to understand the algorithm that can be used to calculate the area of the union of a set of axis aligned rectangles.

The solution that I'm following is here : http://tryalgo.org/en/geometry/2016/06/25/union-of-rectangles/

The part I don't understand is :

The segment tree is the right choice for this data structure. It has complexity O(logn) for the update operations and O(1) for the query. We need to augment the segment tree with a score per node, with the following properties.

  • every node corresponds to a y-interval being the union of the elementary y-intervals over all the indices in the span of the node.
  • if the node value is zero, the score is the sum of the scores of the descendants (or 0 if the node is a leaf).
  • if the node value is positive, the score is the length of the y-interval corresponding to the node.

How do we achieve this in O(n log n) ?

My idea was to create a segment tree, and update each range's value as and when we encounter the range(y range as the height of the rectangle) while line sweeping. And then for for each interval(two consecutive elements in the sorted x array, multiple Δx by the total length of the y range active in this interval, by looking at the sum of all elements in the segment tree)

This would still leads us to having max(y) - min(y) elements in the segment tree's base.

Hence, I'm not sure how this is O(n log n) - where n is the number of rectangles.

Would greatly appreciate any help here.

Thanks!

Hormigas
  • 1,429
  • 5
  • 24
  • 45
  • We can use coordinate compression to only keep necessary coordinates so there's only O(n) elements in the segment tree base – Photon Apr 16 '19 at 07:17
  • @Photon could you explain more ? Or any recommended link where I could read up about this? – Hormigas Apr 16 '19 at 15:26
  • The text says "It has complexity O(log n) for the update operations and O(1) for the query." As the tree contains at most n nodes and is updated at most n times, you have the worst-case O(n log n). In practice, O(n log m) where m is some average number of nodes in the tree, which could be like √n (so the difference is no so significant as log√n=1/2 log n). –  Apr 18 '19 at 08:36

2 Answers2

5

Let's consider some easy case: enter image description here

According to your understanding you would create segment tree with 11 - 1 = 10 nodes at base, so something like this:

enter image description here

Notice we have only 9 nodes in base, because first node is for interval [1,2], next one for interval [2,3] and so on

And when you enter some rectangle, you would update it's range based on its y coordinates, so after meeting first one on x=0, your segment tree would look like this:

enter image description here

We would also need to use something called lazy propagation to update active intervals on the tree, so all active intervals would contribute 1 to the sum.

So complexity of your current approach is something like O(K log K) where K = max(y)-min(y)

We can easilly reduce this to O(n log n) where n is number of rectangles.

Notice that only important y coordinates are those that exist, so in this example 1,3,6,11

Also notice that there's at most 2*n such coordinates

So we can map all coordinates to some integers so they fit better in segment tree.

This is known as coordinate compression it can be done with something like this:

  1. Store all y coordinates in array
  2. sort array and remove duplicates
  3. use map or hashMap to map original coordinates to it's position in sorted array

So in our example it would be:

  1. [1,3,6,11]
  2. no duplicates to remove so array still [1,3,6,11]
  3. mp[1]=1, mp[3]=2, mp[6]=3, mp[11]=4

So now algorithm stays the same, yet we can use segment tree with only at most 2*n nodes in it's base.

Also we would need to modify our segment tree a little, instead of keeping which y coordinates are on or off we will now keep which intervals of y coordinates are on/off

So we will have nodes for intervals [y0,y1],[y1,y2], ... for all unique sorted values of y.

Also all nodes will contribute y[i]-y[i-1] to the sum (if they are in range and active) instead of one.

So our new segment tree would be something like this:

enter image description here

Photon
  • 2,717
  • 1
  • 18
  • 22
  • This might be a naive question - but in the very first diagram that you have, is the value at each node the maximum of its children ? Thanks – Hormigas Apr 16 '19 at 20:24
  • I'm not sure I understand how we get the second figure from the first one after we encounter the first rectangle. The y range being updated should be [3,6], isn't it ? – Hormigas Apr 16 '19 at 20:26
  • To clarify, could you please specify what the value inside each node means ? I think that would help me understand how we get figure 2 from figure 1 better. – Hormigas Apr 17 '19 at 01:31
  • @Anant i edited images & added more explanations i hope it makes concept clearer – Photon Apr 17 '19 at 05:12
  • Appreciate your response. I'm still not certain about "Notice we have only 9 nodes in base, because first node is for interval [1,2], next one for interval [2,3] and so on" Whatever I've read of Segment Trees from :https://www.hackerearth.com/practice/data-structures/advanced-data-structures/segment-trees/tutorial/ is that the base nodes are for single elements. So for intervals [0,0] , [1,1] and so on. However, your answer helped me understand coordinate compression. Many thanks! Would love it if you could clarify on the above doubt. If not, I understand and still appreciate your effort – Hormigas Apr 17 '19 at 05:23
  • @Anant yes base segment tree nodes are for single intervals [0,0], [1,1] for some array A[ ], but in this problem we just modify what they mean, so A[0] will mean interval [1,2], A[1] will mean interval [2,3] and so on. – Photon Apr 17 '19 at 05:33
  • Here is my code implementing this efficiently: https://ideone.com/jPe1n5. Problem reference: https://lightoj.com/problem/rectangle-union. – Robbinb1993 Jan 26 '22 at 19:33
0

How do we achieve this in O(n log n) ?

Consider for every rectangle, you are going to update the range [x, y] in a binary segment tree. Essentially what happens is, you are

  • searching for x as the left boundary in the tree (log(N) time)
  • searching for y as the right boundary in the tree (log(N) time)

Assume the node you found for x is [x,a], and it has a parent node of [z,a] and this parent node has a sibling [a,b]. Apparently if y is not under [a,b], the entire span of [a,b] is covered, so you increment this node instead of all individual segment nodes under [a,b] subtree.

As a result, the searching/updating process looks like this.

  • If a parent node [a,c] has two children [a,b], [b,c] and the left boundary x is in [a,b], decide whether or not to increment value in node [b,c] (depending on whether y is greater than c)
  • If a parent node [a,c] has two children [a,b], [b,c] and the right boundary y is in [b,c], decide whether or not to increment value in node [a,b] (depending on whether x is less than a)

Essentially, before diving into one node, decide whether or not to update its sibling.

Number of nodes you decide whether or not to update is log(N) (for x) + log(N) (for y).

dchneric
  • 121
  • 5