3

Given an array with n elements, how can I find the number of elements greater than or equal to a given value x in a given range, index i to index j, in O(log n) or better?

My implementation is this, but it is O(n):

int count = 0;
for (int a = i; a <= j; a++)
    if (p[a] >= x) // p[] is the array containing n elements
        count++;
pola sai ram

4 Answers

2

If the array is sorted you can locate the first value greater than or equal to X with a binary search, and the number of elements greater than or equal to X is the number of items from that position to the end. That would be O(log(n)).
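
A minimal sketch of that idea, assuming the array is sorted in ascending order (`std::lower_bound` performs exactly this binary search):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> p = {1, 2, 3, 5, 7, 9}; // sorted ascending
        int x = 3;
        // lower_bound finds the first element >= x in O(log n);
        // everything from that position to the end is >= x.
        auto it = std::lower_bound(p.begin(), p.end(), x);
        std::printf("%ld\n", (long)(p.end() - it)); // prints 4 (3, 5, 7, 9)
        return 0;
    }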

If the array is not sorted there is no way of doing it in less than O(n) time since you will have to examine every element to check if it's greater than or equal to X.

Raniz
2

If you are allowed to preprocess the array, then with O(n log n) preprocessing time, we can answer any [i,j] query in O(log n) time.

Two ideas:

1) Observe that it is enough to be able to answer prefix queries, since answer([i,j], x) = answer([0,j], x) - answer([0,i-1], x).

2) Use a persistent* balanced order-statistics binary search tree, which maintains n versions of the tree; version i is formed from version i-1 by adding a[i] to it. To answer query([0,i], x), you query the version-i tree for the number of elements >= x (basically rank information). An order statistics tree lets you do that (see the sketch below).

*: persistent data structures are an elegant functional programming concept for immutable data structures and have efficient algorithms for their construction.
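
A minimal sketch of the prefix-version idea. It swaps the balanced order-statistics BST for a persistent segment tree over coordinate-compressed values, which answers the same rank queries with the same bounds; the `PersistentCounter` name and interface are made up for illustration:

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // root[k] is the root of a persistent segment tree over compressed
    // values containing p[0..k-1]; a [i,j] query is the difference of
    // versions j+1 and i.
    struct PersistentCounter {
        std::vector<int> vals;        // sorted unique values
        std::vector<int> lc, rc, cnt; // node pools (node 0 = empty tree)
        std::vector<int> root;        // root[k] = version with first k elements

        explicit PersistentCounter(const std::vector<int>& p) {
            vals = p;
            std::sort(vals.begin(), vals.end());
            vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
            lc.push_back(0); rc.push_back(0); cnt.push_back(0);
            root.push_back(0);
            for (int v : p) {
                int pos = std::lower_bound(vals.begin(), vals.end(), v) - vals.begin();
                root.push_back(insert(root.back(), 0, (int)vals.size() - 1, pos));
            }
        }

        // Create a new version with one more element at compressed index pos.
        int insert(int node, int lo, int hi, int pos) {
            int id = (int)cnt.size();
            lc.push_back(lc[node]); rc.push_back(rc[node]);
            cnt.push_back(cnt[node] + 1);
            if (lo < hi) {
                int mid = (lo + hi) / 2;
                if (pos <= mid) { int c = insert(lc[node], lo, mid, pos); lc[id] = c; }
                else            { int c = insert(rc[node], mid + 1, hi, pos); rc[id] = c; }
            }
            return id;
        }

        // Elements with compressed index >= pos in version b minus version a.
        int query(int a, int b, int lo, int hi, int pos) const {
            if (cnt[b] - cnt[a] == 0 || pos > hi) return 0;
            if (pos <= lo) return cnt[b] - cnt[a];
            int mid = (lo + hi) / 2;
            return query(lc[a], lc[b], lo, mid, pos)
                 + query(rc[a], rc[b], mid + 1, hi, pos);
        }

        // Count of p[k] >= x for k in [i, j] (0-based, inclusive), O(log n).
        int countGE(int i, int j, int x) const {
            int pos = std::lower_bound(vals.begin(), vals.end(), x) - vals.begin();
            if (pos == (int)vals.size()) return 0; // every value is < x
            return query(root[i], root[j + 1], 0, (int)vals.size() - 1, pos);
        }
    };

    int main() {
        std::vector<int> p = {5, 1, 7, 3, 9, 2};
        PersistentCounter pc(p);
        std::printf("%d\n", pc.countGE(1, 4, 3)); // p[1..4] = {1,7,3,9}; prints 3
        return 0;
    }

Preprocessing builds the n versions in O(n log n) time and space; each query walks O(log n) nodes of two versions in lockstep.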

  • This is only worth it if there is a large number of queries (x >> log(n)), but otherwise sorting it beforehand will produce worse times than just examining everything between `i` and `j`. – Raniz May 01 '15 at 04:59
  • @Raniz: Yes, that is true. `x > log(n)` is not really large, though. For example, for n = 1 million, we are talking about more than 20 queries. – Programmer Person May 01 '15 at 05:01
  • I need to compare elements in the range `(i,j)` of `p[]` with `x`; `p[]` contains `n` elements, and this `(i,j,x)` triple is given many times for the same array `p[]`. – pola sai ram May 01 '15 at 05:03
  • @polasairam: I presume you are talking about observation 1)? I suggest you think about it... (or is your comment directed to Raniz? in which case I suggest you use the @ feature...) – Programmer Person May 01 '15 at 05:05
  • @ProgrammerPerson OK, and I think the total complexity of the above algorithm is O(n log n)? Please correct me if I am wrong. – pola sai ram May 01 '15 at 05:08
  • 2
    @polasairam: It is O(n log n + K log n), where K is the number of queries. – Programmer Person May 01 '15 at 05:09
  • FWIW, for the online case, you could instead use a Fenwick tree of ordered sets, which is much easier to implement and faster in practice. For the described offline case, however, we can even sort the query borders and iterate through them once while keeping a set of values in the current prefix. – Niklas B. May 01 '15 at 08:00
2

Impossible in O(log N) without preprocessing, because you have to inspect all the elements, so an O(N) method is expected.

The standard algorithm for this is based on quicksort's partition, sometimes called quick-select.

The idea is that you don't sort the array, but rather partition the range [i, j] using x as the pivot value. A single partition pass leaves all elements greater than or equal to x to the right of the partition point, so you just count them. This is the same partition procedure used when finding the k-th largest element, as in the sketch below.
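
A minimal sketch of that single partition step, using `std::partition` with `x` as the threshold (note that it mutates the range and is still O(n)):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> p = {5, 1, 7, 3, 9, 2};
        int i = 1, j = 4, x = 3;
        // One partition pass over p[i..j]: elements < x move to the front,
        // so the second group is exactly the elements >= x.
        auto first_ge = std::partition(p.begin() + i, p.begin() + j + 1,
                                       [x](int v) { return v < x; });
        std::printf("%ld\n", (long)(p.begin() + j + 1 - first_ge)); // prints 3
        return 0;
    }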

Read about a very similar problem at How to find the kth largest element in an unsorted array of length n in O(n)?.

The requirement to handle the range index i to index j does not add any complexity to the problem: you simply apply the partition to that subrange.

Captain Giraffe
0

Given your requirements where the data is not sorted in advance and constantly changing between queries, O(n) is the best complexity you can hope to achieve, since there's no way to count the number of elements greater than or equal to some value without looking at all of them.

It's fairly simple if you think about it: you cannot avoid inspecting every element of a range for any type of search if you have no idea how it's represented/ordered in advance.

You could construct a balanced binary tree or even radix-sort on the fly, but you're just pushing the overhead elsewhere, to the same linear or worse linearithmic O(N log N) complexity, since such approaches still have you inspecting every element in the range in order to sort it.

So there's actually nothing wrong with O(N) here. That is the ideal, and you're looking at either changing the whole nature of the data involved outside this code to allow it to be sorted efficiently in advance, or micro-optimizations (e.g., parallel loops to process sub-ranges with multiple threads, provided they're chunky enough, as in the sketch below) to tune it.
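
A minimal sketch of that micro-optimization, assuming C++17's parallel algorithms are available (on GCC this additionally requires linking against TBB):

    #include <algorithm>
    #include <cstdio>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<int> p = {5, 1, 7, 3, 9, 2}; // imagine n is large
        int i = 1, j = 4, x = 3;
        // Same O(n) work, but the scan of p[i..j] is split across threads.
        auto count = std::count_if(std::execution::par,
                                   p.begin() + i, p.begin() + j + 1,
                                   [x](int v) { return v >= x; });
        std::printf("%ld\n", (long)count); // prints 3
        return 0;
    }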

In your case, your requirements seem rigid, so the latter seems like the best bet, with the aid of a profiler.