Priority Search Tree confusion

Question

The only reasonable slide set I found is this, which in page 15 says, for building:

Sort all points by their x coordinate value and store them in the leaf nodes of a balanced binary tree (i.e., a range tree)

Starting at the root, each node contains the point in its subtree with the maximum value for its y coordinate that has not been stored
at a shallower depth in the tree; if no such node exists, then node
is empty

I implemented a Range Tree just before, so based on the example I used there, I now have (for a Priority Search Tree):

                             7
                      ------ 0 ---------------------
                      |                            |
                      6                            6
                ---- -3 ----                  ---- 4 ------ 
                |          |                  |           |
                4          2                  1           3
          ---- -4 -       -2              --- 3 ---       5
          |       |       / \             |       |      / \
          0    (-3,4) (-2,2)(0,7)        NIL   (4,-4) (5,3)(6,-1)
         -5                               2    
         / \                             / \
    (-5,6) (-4,0)                    (2,1) (3,6)

Here, every inner node is of the form:

mid_x
max y

and the range query I am posing is (-inf, 1) x (-0.5, 5), i.e. (x_low, x_high) x (y_low, y_high).

So, let's start from the root, I check if x (=0) is in (-inf, 1) and if 7 > (-0.5, 5). It is, thus I proceed in both children, but for simplicity, let me follow the left one (in all cases from now).
I check if -3 is the x range and if 6 is more or equal than the upper bound of the y range of the query (i.e. 5). It is, so I continue.
Same for the next level, thus we go to the next level and now please focus on this inner node. It has as max y a 0, which indicates that the max y of the subtree is 0, which is incorrect (left child is (-5, 6))^*.

What am I missing in building/searching process?

In other words:

Check the leftmost branch of the tree; it has as max_y values (2nd bullet in the quote), 7, 6, 4, 0.

But isn't that value the one that indicated the maximum y value stored in the subtree under the inner node? If so, this does not hold for 0 and point (-5, 6) (6 is not used, since its used in the 2nd level).

*_{The particular query I am posing might not be damaged by that, but another one can.}

You've got enough reputation to know better... http://stackoverflow.com/help/mcve People make assumptions about what they're doing that are wrong and then don't post the details that are actually relevant because they assume they're doing that part right. — xaxxon, May 14 '16 at 04:06
@xaxxon this has to do with the algorithm, not on how I am how to convert pseudocode to code! — gsamaras, May 14 '16 at 04:07
"I am using C++, if that matters.", it's in the question @xaxxon. Good point though, I will remove it! — gsamaras, May 14 '16 at 04:11
The inner nodes hold the mid value of x (the one the binary tree holds) and the maximum value of y that lie (?) under its subtree. @xaxxon — gsamaras, May 14 '16 at 04:14
I don't understand "each node contains the point in its subtree" what is a "point in a subtree" and how do you contain it? — xaxxon, May 14 '16 at 04:22
"I check if x (=0) is in (-inf, 1) and if 7 < (-0.5, 5). It is, thus I proceed in both children" 0 is not inside -inf.. -0.5 — xaxxon, May 14 '16 at 04:23
"the maximum value of y that lie (?) under its subtree" -- then shouldn't the 4 -4 node be 6 -4? Sorry, that's all I got for now. — xaxxon, May 14 '16 at 04:25
@xaxxon thanks for trying to help! 1) If you read the whole sentence, it says that the node contains a y value, that is the max value appeared in the subtree. For example, the root has 7, since all points that are stored in its subtree (the whole tree since it's the root) have at max 7 as their y coordinate. 2) As I have clarified in my question, `(-inf, 1)` is the **x** range we are interested in, more than `-inf` and less than `1` in that case. 3) The quote from the slides say that the value should have not been used elsewhere, and `6` is used in the previous level, that's what confuses me!! — gsamaras, May 14 '16 at 04:30

ShuberFu · Accepted Answer · 2016-05-14T05:35:25.297

You last logic is actually still correct. The (-5,6) value should've already been picked up when you hit the node you labelled (6,-3). Now, I'm no math major. But the general idea is this. In the Priority Search tree as you implemented, you're actually searching on two separate criteria. For x, it's a simple binary search for the range. For y, you're actually searching for it as a priority tree (good for search of y:inf or -inf:y, depending on your whether you use max or min.)

Note that at the bottom of page 15, it states that the tree is good for a semi-infinite range (one is infinite). Keep reading down, you'll see how the tree is optimized for semi-infinite range for y.

In short, since your tree is constructed with x as the binary search and y as a priority (using max remaining value), the optimal search is [x1:x2],[y1:inf].

Each node in the tree essentially stores 2 things. 1. The mid-point of x (or the max-x in the left tree, and the decision to traverse left or right is based on the >x check). 2. The POINT with the highest y value in the subtree that had not been added to previous one.

The search algorithm essentially goes like this. Starting from the root using a criteria of [x1:x2], [y1:inf] 1. Check the y-value of the POINT assigned to the node. If y > y1, go to 2, otherwise STOP traversing down (since the POINT assigned to the node has the highest y value, if the check failed, there's no other node beneath it that can fulfill [y1:inf]. 2. Check if the x-value of the point is in range of [x1:x2]. If so, include that point in the output. Continue to step 3 regardless if you included that point. 3. Check the node's "mid-point" value (let's call that mx). If mx is in range of [x1:x2], traverse down both path. If mx is < [x1:x2], traverse left. Otherwise traverse right. Go back to step 1 for each path you traverse.

EDIT for very, VERY long text: Let's run through your example. I've made an additional annotation marking each point using letter (the point's "name"). Each node now have the format of name of the point, with it's y-value in the parenthsis, and the "mid-range" x below it. For the leaf nodes, those with an '*' means they are already assigned to somewhere up the tree, and should be treated as NIL if you ever reach them.

                              7(E)
                       ------ 0 ---------------------
                       |                            |
                      A(6)                         G(6)
                 ----- -3 -----                  ---- 4 -------- 
                 |            |                  |             |
                 C(4)        D(2)               F(1)          I(3)
           ---- -4 -         -2              --- 3 ---         5
           |       |         / \             |       |        / \
           B(0)  C*(-3,4)D*(-2,2)E*(0,7)     NIL   H(4,-4) I*(5,3)J(6,-1)
          -5                                 2   
          / \                               / \
    A*(-5,6)B*(-4,0)                   F*(2,1) G*(3,6)

Let's run an example query of [-2:4] [1:inf](or any y >= 1)

Starting from root, we see point E.
Is the y of point E (7) >= 1? Yes.
Is the x of point E (0) in [-2:4]? Yes
Add E to output.
Is the mid-point (also 0) in range? Yes
From the node with E, need to traverse both side.
First, let's go left, we see point A.
Is the y of point A(6) >= 1? Yes.
Is the x of point A(-5) in [-2:4]? No.
Skip A, but continue traverse (only the y check stops traversion).
Is the mid-point at A in range of [-2:4]? No, it's to the left.
Since -3 < [-2:4], we only need to traverse right. Now we see point D.
Is the y of point D(2) >= 1? No! Skip the point and stop traversing down, since there's no other point under D that we have NOT outputted (note, even if E is below D, E is already outputted when we visited root at the beginning).
I traverse up to A, since we don't need to traverse the left path, keep going up.
Now I'm back at root, which needs to have both traversed. Since we just went left, we're going right. There, we see point G
Is the y of point G (6) >= 1? Yes
Is the x of point G (3) in [-2:4]? Yes
Add G to output, now we have (E,G).
Is the mid-point at G in range? Yes, we'll have to traverse both sides.
Let's go left first. We see F.
Is the y of point F (1) >= 1? Yes
Is the x of point F (2) in [-2:4]? Yes
Add F to output, now we have (E,F,G)
Is the mid-point at F in [-2:4]? Yes, we'll have to traverse both sides.
Let's go left first again, we see... NIL. There's no more points to be added below (since we already checked/added F,G before).
Let's go back up to F and travel down the right side, we see H.
Is the y of point H (-4) >= 1? No. Don't add the point and stop traversing.
We go back up to F, which already has both path traversed.
We go back up to G, which still needs its right path traverse.
We traverse down rigth path, and see I.
Is the y of point I (3) >= 1? Yes
Is the x of point I (5) in [-2:4]? No.
Skip F.
Is the mid-point at I in range of [-2,4]? No, it's greater.
Since it's greater, we only need to traverse the left side, so let's do that.
Traverse down left side we see... I, again. Since we already seen I, and it's a leaf node, we stop traversing and go back up to I.
I is done (don't need to traverse right). Go back up to G.
G is done (both sides traversed). GO back up to E.
E is done (both sides traversed). Since E is root node, we're now done.

The result is "E,F,G" are in range.

Why you use a query like `[x1:x2], [y1:inf]`? I understand that it might be faster, but is equivalent to the one I asked for? — gsamaras, May 14 '16 at 05:04
The fundamental different is that your semi-infinite range is on the x value. The tree you constructed is optimized for a semi-infinite range on the y value. In another word, the tree you constructed is optimized to find x within range and reject y below a certain value. — ShuberFu, May 14 '16 at 05:06
So is the tree building I posted correct Sterling? I am not interested in any optimization(s) at this point. — gsamaras, May 14 '16 at 05:22
Yes, the tree you're building is correct. The key is how you traverse it. A correct tree doesn't work if you don't traverse it correctly. I've added a pretty long example detailing the traverse logic. Hope that helps. — ShuberFu, May 14 '16 at 05:36
Sterling, you are amazing! I will search for other posts of you, might be as good as this! However, let me express some thoughts: 1) `Is the y of point D(2) >= 1? No!` That's a yes. D should be output'ed. 2) `Is the mid-point at F in [-2:4]? Yes, we'll have to traverse both sides.` I would only traverse the left branch, since the left branch contains points with x <= 4, which is the upper bound of our range query's x value. I will stop reading here, but I got the idea, thanks a ton! — gsamaras, May 14 '16 at 18:55

Priority Search Tree confusion

1 Answers1

Linked