efficiently find amount of integers in a sorted array

Question

I'm studying for a test, and found this question.

You are given a sorted array of integers for example:

{-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99}

Write a method:

Public static int count (int[] a, int x)

that returns the amount of times, number 'x' is in the array.

for example:

x = -5, it returns 2
x = 2, it returns 5
x = 8, it returns 0

I need to write it as efficient as possible, please don't give me the answer (or write it if you wish, but I won't look), my idea is to do a binary search, and then go both edges (backward and forward) of the value that I find and with the index numbers, return the correct answer, my questions are:

is it the most efficient way?
won't it be O(n) in the worst case? (when the array is filled with a single number) -

If so - then why should I do a binary search?

@ambigram_maker it would probably be a "modified" binary search where if you find your key, you still keep searching. — C.B., Apr 02 '14 at 14:33
If you are allowed to do so, check whether the number is present in the array. If not, return. Otherwise, I'd say use premature return to exit the method as soon as possible. — blueygh2, Apr 02 '14 at 14:33
there must be a way below O(n), it's a question about complexity, they won't give me a question that I can just go O(n) times and count the number when I find it, it's a twist of binary search in my opinion. — Seth Keno, Apr 02 '14 at 14:33
@SethKeno obviously your algorithm has to look at every element in the array. I don't think it gets better than O(n). The naive approach i can think of would rather be O(n^2) [for each element, look at it, then go through the array, counting that element] — fstd, Apr 02 '14 at 14:34
@SethKeno, for a test the trivial solution is `O(n)` and your solution is also `O(n)` in the worst case. — Anthony Accioly, Apr 02 '14 at 14:36
@fstd what about if I do a binary search, find the number, go backwards until I find a different number, and go forward until I find a different number (or array ends each direction), will it be less than O(n) or not? — Seth Keno, Apr 02 '14 at 14:36
@user432 and how do you find where the 'next' element is, without going over it linearly? — fstd, Apr 02 '14 at 14:36
@SethKeno I'm not sure what you have in mind, but note that the big-O notation specifies an *upper* bound of the complexity. Your algoritm, even if it on average might outperform the linear solution, sounds still like O(n) to me — fstd, Apr 02 '14 at 14:38
@SethKeno the naive approach would have a loop going over the array, and for each (not yet seen) element, it would again go over the array (i.e. two nested loops of complexity O(n) -- that's O(n^2). EDIT: admittedly it wouldn't have to go over the *entire* array all the time, but it still qualifies as O(n^2) (beause it's an upper bound, as said) — fstd, Apr 02 '14 at 14:40
@fstd Have you read the question? Even the most naive implementation (counting the # of occurences of ONE value in an array is no more than O(n)! And given that the array is sorted, this can certainly be solved in O(log(n)) using some divide and conquer approach. — Gyro Gearless, Apr 02 '14 at 15:03
@GyroGearless I originally misread the question, as is evident in the comments of a different answer. Forgot to update this thread. (I originally read the question as if the goal was to get the frequency of /all/ distinct members in the array) — fstd, Apr 03 '14 at 15:04
possible duplicate of [Using Binary Search with sorted Array with duplicates](http://stackoverflow.com/questions/13197552/using-binary-search-with-sorted-array-with-duplicates) — nawfal, Jun 15 '14 at 20:08

score 15 · Answer 1 · edited May 23 '17 at 10:26

15

Modify your binary search to find the first and last occurrence of the given input, then the difference between those two indexes is the result.

To find a first and last occurrence using binary search you need to change a bit from the usual binary search algorithm. In binary search the value is returned when a match is found. But here unlike the usual binary search you need to continue searching until you find a mismatch.

helpful links

finding last occurence, finding first occurance

A bit update

after you find the first occurrence then you can use that index as a starting point of the next binary search to find the last.

edited May 23 '17 at 10:26

Community

1
1

answered Apr 02 '14 at 14:37

stinepike

54,068
14
92
112

1

You really should elaborate on how to do each of these binary searches in the answer itself. – Bernhard Barker Apr 02 '14 at 14:47
hi @Dukeling .. those are already answered in other threads.. so I thought it would be enough to point those links – stinepike Apr 02 '14 at 14:48
Clever :). This is actually `O(log n)`. Anyway, considering branch predictions, I'm interested to know how large must `int[] a` be so that this approach starts beating the naive `O(n)` approach. I assume that in practice the `O(n)` method is still faster for small arrays. – Anthony Accioly Apr 02 '14 at 14:48
1

Consider that this answer requires that you read at least 3 answers on different pages to see how to do this. Then consider that you can probably summarize both approaches in about 2 sentences in this answer (and provide the links for additional reference). To provide the maximum long-term value, it seems like a no-brainer to do the latter. – Bernhard Barker Apr 02 '14 at 14:55
thanks @Dukeling for the advice , i am editing the answer a bit – stinepike Apr 02 '14 at 14:58
2

@StinePike Two answers within a gap of 30 seconds. One gets 7 upvotes while the other is on 1. – Nikunj Banka Apr 02 '14 at 15:04
1

@NikunjBanka SO is notoriously time-sensitive. Plus, SO crowds don't like long answers. Shorter answer will usually attract more upvotes, just because it can be skimmed through faster (that is my theory anyway). If we could specify code window's height, and you'd specify it like, say, 5 lines, I'm sure your answer would get more upvotes. :) – Will Ness Apr 02 '14 at 15:10
@AnthonyAccioly - It would also be interesting to compare this against binary search to find the first occurrence, then linear search from there to count the number of occurrences. – mbeckish Apr 02 '14 at 20:22
@mbeckish ... yes it can be done .. but what if there is only single number ocurring in all elements ? .... in second case it will use O(n) – stinepike Apr 03 '14 at 02:15
1

@StinePike - Yes, that would be the worst case. I would consider your solution to be best for large data sets with no a priori knowledge of the nature of the data. – mbeckish Apr 03 '14 at 18:59

RelevantTitlesAreBestTitles · Answer 2 · 2014-04-02T15:12:40.880

Two solutions come to mind:

1) Do Binary Search alright, but maintain that invariant that it finds the first occurence. Then do a linear search. This will be Theta(log n + C) where C is the count.

Programming Pearls by Jon Bentley has a nice write up, where he mentions looking for the first occurence is actually more efficient than looking for any occurence.

2) You can also do two binary searches, one for the first occurence, and one for the last, and take the difference of the indices. This will be Theta(log n).

You can decide which solution to apply based on the expected value of C. If C = o(log n) (yes small o), then looking for the first occurence and doing linear search will be better. Otherwise do two binary searches.

If you don't know the expected value of C, then you might be better off with solution 2.

Nikunj Banka · Answer 3 · 2014-04-02T15:15:48.047

Do binary Search to find the first occurence. Do binary search to find the last occurence. The number of occurences is equal to the number of numbers between the 2 indices found.

Working code:

public class Main {
    public static void main(String[] args){
        int[] arr = {-5, -5, 1, 1, 1, 1, 1, 1, 
                                    1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        int lo = getFirst(arr, -5);
        if(lo==arr.length){ // the number is not present in the array.
            System.out.println(0);
        }else{
            int hi = getLast(arr, -5);
            System.out.println((hi-lo+1));
        }
    }

    // Returns last occurence of num or arr.length if it does not exists in arr.
    static int getLast(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = (lo+hi)/2;
            if(arr[mid]==num){
                ans = mid;
                lo = mid+1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }

    // Returns first occurence of num or arr.length if it does not exists in arr.
    static int getFirst(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = (lo+hi)/2;
            if(arr[mid]==num){
                ans = mid;
                hi = mid-1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }
}

Minor correction. You would need it `ans = -1;` to start with. — nawfal, Jun 15 '14 at 20:27

score 1 · Answer 4 · answered Apr 02 '14 at 19:18

Actually there is a slightly better solution than the given solutions! It is a combination of two different ways to do binary search.

First you do a binary search to get the first occurence. This is O(log n)

Now, starting with the first index you just found, you do a different kind of binary search: You guess the frequency of that element F, by starting with a guess of F=1 and doubling your estimate and check if the element repeats. This is guaranteed to be O(log F) (where F is the frequency).

This way, the algorithm runs in O(log N + log F)

You do not have to worry about the distribution of the numbers!

score 0 · Answer 5 · answered Apr 03 '14 at 06:46

IMHO this is the most efficient solution: Others may have mentioned a similiar approach, but I think this one is the easiest to explain and the easiest to understand, also it has a modification that will speed up the process in practice:

Basically the idea is the find the smallest and largest index of the occurence. Finding the smallest is O(log N) using binary search (using Newton's method to actually increase the performance in the average case is a possible improvement). If you don't know how to modify binary search to find the smallest index, the trivial modification is to look for the element with value (p - 0.5) - obviously you will not find that value in the integer array, but if the binary search terminates the index will be the one next to where the recursion stops. You will just need to check whether it exists at all. This will give you the smallest index.

Now in order to find the largest index, again you will have to start a binary search, this time using the smallest index as the lower bound and (p + 0.5) as the search target, this is guaranteed to be O(log N), in the average case it will be O(log N/2). Using Newton's method and taking the values of the upper and lower bounds into account will in practice improve the performance.

Once you have found the largest and smallest index, the difference between them obviously is the result.

For evenly distributed numbers, using the Newton modification will drastically improve the runtime (in the case of a continuous equidistant sequence of numbers it will be O(1) (two or three steps) to find the smallest and greatest values), although the theoretical complexity still is O(log N) for arbitrary input.

efficiently find amount of integers in a sorted array

5 Answers5