Kth minimum in a Range

Question

Given an array of integers and some query operations.
The query operations are of 2 types
1.Update the value of the ith index to x.
2.Given 2 integers find the kth minimum in that range.(Ex if the 2 integers are i and j ,we have to find out the kth minimum between i and j both inclusive).
I can find the Range minimum query using segment tree but could no do so for the kth minimum. Can anyone help me?

What complexity are you expecting? O(logN)? O(logN+k)? O(klogN)? — amit, Feb 17 '14 at 10:11
is k constant for this problem? or a query contains 3 integers named i,j and k? — cegprakash, Feb 27 '14 at 07:26

Niklas B. · Accepted Answer · 2014-03-05T13:09:46.790

9

Here is a O(polylog n) per query solution that does actually not assume a constant k, so the k can vary between queries. The main idea is to use a segment tree, where every node represents an interval of array indices and contains a multiset (balanced binary search tree) of the values in the represened array segment. The update operation is pretty straightforward:

Walk up the segment tree from the leaf (the array index you're updating). You will encounter all nodes that represent an interval of array indices that contain the updated index. At every node, remove the old value from the multiset and insert the new value into the multiset. Complexity: O(log^2 n)
Update the array itself.

We notice that every array element will be in O(log n) multisets, so the total space usage is O(n log n). With linear-time merging of multisets we can build the initial segment tree in O(n log n) as well (there's O(n) work per level).

What about queries? We are given a range [i, j] and a rank k and want to find the k-th smallest element in a[i..j]. How do we do that?

Find a disjoint coverage of the query range using the standard segment tree query procedure. We get O(log n) disjoint nodes, the union of whose multisets is exactly the multiset of values in the query range. Let's call those multisets s_1, ..., s_m (with m <= ceil(log_2 n)). Finding the s_i takes O(log n) time.
Do a select(k) query on the union of s_1, ..., s_m. See below.

So how does the selection algorithm work? There is one really simple algorithm to do this.

We have s_1, ..., s_n and k given and want to find the smallest x in a, such that s_1.rank(x) + ... + s_m.rank(x) >= k - 1, where rank returns the number of elements smaller than x in the respective BBST (this can be implemented in O(log n) if we store subtree sizes). Let's just use binary search to find x! We walk through the BBST of the root, do a couple of rank queries and check whether their sum is larger than or equal to k. It's a predicate monotone in x, so binary search works. The answer is then the minimum of the successors of x in any of the s_i.

Complexity: O(n log n) preprocessing and O(log^3 n) per query.

So in total we get a runtime of O(n log n + q log^3 n) for q queries. I'm sure we could get it down to O(q log^2 n) with a cleverer selection algorithm.

UPDATE: If we are looking for an offline algorithm that can process all queries at once, we can get O((n + q) * log n * log (q + n)) using the following algorithm:

Preprocess all queries, create a set of all values that ever occured in the array. The number of those will be at most q + n.
Build a segment tree, but this time not on the array, but on the set of possible values.
Every node in the segment tree represents an interval of values and maintains a set of positions where these values occurs.
To answer a query, start at the root of the segment tree. Check how many positions in the left child of the root lie in the query interval (we can do that by doing two searches in the BBST of positions). Let that number be m. If k <= m, recurse into the left child. Otherwise recurse into the right child, with k decremented by m.
For updates, remove the position from the O(log (q + n)) nodes that cover the old value and insert it into the nodes that cover the new value.

The advantage of this approach is that we don't need subtree sizes, so we can implement this with most standard library implementations of balanced binary search trees (e.g. set<int> in C++).

We can turn this into an online algorithm by changing the segment tree out for a weight-balanced tree such as a BB[α] tree. It has logarithmic operations like other balanced binary search trees, but allows us to rebuild an entire subtree from scratch when it becomes unbalanced by charging the rebuilding cost to the operations that must have caused the imbalance.

edited Mar 05 '14 at 13:09

answered Feb 27 '14 at 16:51

Niklas B.

92,950
18
194
224

Note that, to be provided efficiently, `.rank()` needs the BBST to be augmented, e.g., each node stores the number of children in its left child. – David Eisenstat Feb 27 '14 at 17:05
@David Eisenstat: Yes, the BBST must have `O(log n)` rank/insert/delete and must support multiple equal nodes. It should also support linear time merge so that we get `O(n log n)` time to build the initial segment tree (but that's not too important) – Niklas B. Feb 27 '14 at 17:07
One can work around equal values by pairing with the array index, but rank alone disqualifies most standard library implementations. – David Eisenstat Feb 27 '14 at 17:15
@David: Sure enough. Byebye `multiset` or `TreeSet`, hello own BBST implementation. Good thing treaps can be implemented in < 60 LoC. Not sure what your point is, though, since BBSTs with subtree sizes are a pretty common primitive in algorithm engineering. I kept the answer at a pseudocode level, but obviously the implementation would require a bit of work. – Niklas B. Feb 27 '14 at 17:20
My point is that it's a very useful decoration, which more library designers should see the value of. – David Eisenstat Feb 27 '14 at 17:23
I mean, it would be wildly hypocritical of me to rag on the implementation complexity when my suggestion of [fractional cascading](http://en.wikipedia.org/wiki/Fractional_cascading) to drop the query time to O(log^2 n) is much, much worse :) – David Eisenstat Feb 27 '14 at 17:30
@DavidEisenstat: I actually had that in mind too (I've seen Erik Demaine's lectures). I think it's a bit too complicated though – Niklas B. Feb 27 '14 at 17:30
The ancestors of the trees that we want to query number only O(log n) as well, so the application isn't so complicated -- just link parent to child. It might even be somewhat fun to implement with skip lists (keep the priorities the same across all trees and have crazy 3D links going). – David Eisenstat Feb 27 '14 at 17:35
@David: Ah, I see. I initially thought we'd have to embed the query BBSTs into each other, but we can traverse the segtree again for every choice of `x`. Yes, that's pretty cool. BTW, for real applications [Boost Intrusive](http://www.boost.org/doc/libs/1_55_0/doc/html/intrusive.html) is a well-designed library that has tons of BBST-like datastructures with augmentation hooks. – Niklas B. Feb 27 '14 at 17:48
Oops, dynamic fractional cascading adds a log log n. You had it right all along; this is a log not worth shaving. – David Eisenstat Feb 27 '14 at 18:07
My head hurts. My previous suggestion was bogus, so I'll take your word for it. – David Eisenstat Feb 27 '14 at 18:26
What I just said was also bogus, so strike that and let's settle with "it's at least more complicated than we thought" – Niklas B. Feb 27 '14 at 18:27
Hi, could you please show an image of this tree with any example? – cegprakash Mar 02 '14 at 22:35
@cegprakash: It is in fact a tree of trees (segment tree of binary search trees). I don't think a drawing will be very helpful, but the intuition is the following: The nodes of the segment tree represent intervals of the array. The nodes of the binary search trees stored inside the segment tree nodes represent the array elements, ordered by their value so that we can do select queries. I won't provide a drawing because it's just too much work to create one, but if you want to give it a try yourself, I am certainly willing to comment on it or and add it to the answer as a visualization. – Niklas B. Mar 02 '14 at 22:57
@cegprakash: I am also willing to clarify any unclear parts of my answer in case this is your concern – Niklas B. Mar 02 '14 at 22:57
@cegprakash: Essentially you can also imagine the segment tree as a [range tree](http://en.wikipedia.org/wiki/Range_tree) with x coordinate being array index and y being the value of the array elements. – Niklas B. Mar 03 '14 at 00:06
I still couldn't imagine how a node will look like in this tree. You have mentioned that each node in your segment tree is a BST. What is the structure of every node in the BST?. What is the key for the BST? Without knowing all these things, I can't continue reading your solution. :| Have to decide my bounty today :| – cegprakash Mar 05 '14 at 12:05
@cegprakash have you read the article about range trees? Because that's basically what the online algorithm is about. I think the offline variant is easier to understand though, so have a look at my update. – Niklas B. Mar 05 '14 at 12:39
@Niklas: Could you plz mention the structure of the nodes node (for the outer segment tree and your inner BST)? I don't understand what you mean by reachable leaf nodes. – cegprakash Mar 05 '14 at 12:43
@cegprakash well a segment tree node represents an interval of indices in the array. For example, the root represents the whole array, its left node the left half etc. In every node, we store the multiset of *values* (that is, integers) that are present in the array interval that the node represents. This is the main idea, the rest is just implementing the selection algorithm – Niklas B. Mar 05 '14 at 12:52
I will try to clarify the answer a bit – Niklas B. Mar 05 '14 at 12:53
@cegprakash in a standard application of a segtree, you just store a single value in every node that is some kind of aggregation of the values in the segment (for example minimum or sum). In this case the aggregation is set union, so the aggregated values are sets (or rather multisets) – Niklas B. Mar 05 '14 at 13:12
Haven't seen this mentioned, so I would like to point out that it is possible to reduce the query time to log(n) from log^3(n) with a persistent segment tree. More detail available here: https://cp-algorithms.com/data_structures/segment_tree.html#toc-tgt-12 – timg May 29 '21 at 18:42

score 7 · Answer 2 · edited May 23 '17 at 12:26

7

If this is a programming contest problem, then you might be able to get away with the following O(n log(n) + q n^0.5 log(n)^1.5)-time algorithm. It is set up to use the C++ STL well and has a much better big-O constant than Niklas's (previous?) answer on account of using much less space and indirection.

Divide the array into k chunks of length n/k. Copy each chunk into the corresponding locations of a second array and sort it. To update: copy the chunk that changed into the second array and sort it again (time O((n/k) log(n/k)). To query: copy to a scratch array the at most 2 (n/k - 1) elements that belong to a chunk partially overlapping the query interval. Sort them. Use one of the answers to this question to select the element of the requested rank out of the union of the sorted scratch array and fully overlapping chunks, in time O(k log(n/k)^2). The optimum setting of k in theory is (n/log(n))^0.5. It's possible to shave another log(n)^0.5 using the complicated algorithm of Frederickson and Johnson.

edited May 23 '17 at 12:26

Community

1
1

answered Feb 27 '14 at 21:57

David Eisenstat

64,237
7
60
120

In a programming contest we can get away with a simpler `O((q + n) * log^2 (q + n))` offline approach that I outlined in an update of my answer now. It also doesn't require subtree size augmentation :) Also may I ask what Bentley's algorithm is? Is it the "naive" [range query](http://en.wikipedia.org/wiki/Range_tree#Range_queries) implementation? What would be a thriftier alternative? Using fractional cascading in a weight-balanced range tree to support updates? – Niklas B. Mar 05 '14 at 06:50
@NiklasB. Yes, it's the naive range query implementation. Chazelle reduced the space requirement to linear, and several authors since have improved the query time slightly. – David Eisenstat Mar 05 '14 at 12:02
I feel that this solution is easier to understand and easier to implement rather than building a segement tree of BST.. – cegprakash Mar 05 '14 at 12:14
@cegprakash The segment tree is less work than it sounds like, but perhaps. – David Eisenstat Mar 05 '14 at 12:19
I haven't worked much on Segment Tree. But of course I've tried RMQ problem. Niklas' solution is too difficult to understand for beginners like me. He is more into space complexity and Time complexity rather than explaining what and why he does stuffs. – cegprakash Mar 05 '14 at 12:29
@cegprakash We all have to remain interested in this material somehow. I mostly gave up on competitive programming a while ago to focus on my (algorithms) research, where big-O rules the land (not that I always agree). – David Eisenstat Mar 05 '14 at 14:10
Interestingly Big-O pretty much rules competitive programming as well, which is a good indication of the fact that asymptotics have a certain practical relevance as well :) At least as much as competitive programming is practical... – Niklas B. Mar 05 '14 at 14:37
@NiklasB. Not to the point where galactic constants are tolerated, the way they are in TCS. – David Eisenstat Mar 05 '14 at 14:43

score 1 · Answer 3 · answered Feb 17 '14 at 10:42

1

perform a modification of the bucket sort: create a bucket that contains the numbers in the range you want and then sort this bucket only and find the kth minimum.

answered Feb 17 '14 at 10:42

Muhammad Soliman

529
5
17

1

Not sure how you want to implement bucket sort here - the range is unknown. Anyway, this will resut at best case in linear time performance and linear space, where selection algroithm gets linear time performance with very little extra space. I believe the OP is after a sub-linear time performance algorithm, at the cost of sublinear operations during update. – amit Feb 17 '14 at 10:49

score 0 · Answer 4 · 2017-05-16T19:20:07.037

Damn, this solution can't update an element but at least finds that k-th element, here you'll get some ideas so you can think of some solution that provides update. Try pointer-based B-trees.

This is O(n log n) space and O(q log^2 n) time complexity. Later I explained the same with O(log n) per query.

So, you'll need to do the next:

1) Make a "segment tree" over given array.

2) For every node, instead of storing one number, you would store a whole array. The size of that array has to be equal to the number of it's children. That array (as you guessed) has to contain the values of the bottom nodes (children, or the numbers from that segment), but sorted.

3) To make such an array, you would merge two arrays from its two sons from segment tree. But not only that, for every element from the array you have just made (by merging), you need to remember the position of the number before its insertion in merged array (basically, the array from which it comes, and position in it). And a pointer to the first next element that is not inserted from the same array.

4) With this structure, you can check how many numbers there are that are lower than given value x, in some segment S. You find (with binary search) the first number in the array of the root node that is >= x. And then, using the pointers you have made, you can find the results for the same question for two children arrays (arrays of nodes that are children to the previous node) in O(1). You stop to operate this descending for each node that represents the segment that is whole either inside or outside of given segment S. The time complexity is O(log n): O(log n) to find the first element that is >=x, and O(log n) for all segments of decomposition of S.

5) Do a binary search over solution.

This was solution with O(log^2 n) per query. But you can reduce to O(log n):

1) Before doing all I wrote above, you need to transform the problem. You need to sort all numbers and remember the positions for each in original array. Now these positions are representing the array you are working on. Call that array P.

If bounds of the query segment are a and b. You need to find the k-th element in P that is between a and b by value (not by index). And that element represents the index of your result in original array.

2) To find that k-th element, you would do some type of back-tracking with complexity of O(log n). You will be asking the number of elements between index 0 and (some other index) that are between a and b by value.

3) Suppose that you know the answer for such a question for some segment (0,h). Get answers on same type of questions for all segments in tree that begin on h, starting from the greatest one. Keep getting those answers as long as the current answer (from segment (0,h)) plus the answer you got the last are greater than k. Then update h. Keep updating h, until there is only one segment in tree that begins with h. That h is the index of the number you are looking for in the problem you have stated.

To get the answer to such a question for some segment from tree you will spend exactly O(1) of time. Because you already know the answer of it's parent's segment, and using the pointers I explained in the first algorithm you can get the answer for the current segment in O(1).

Kth minimum in a Range

4 Answers4

Linked