
Let's say I have n records (key, value).

Now I want to look up x keys to get their values.

Clearly, if x is small, binary search is more efficient than looping through all of the records to find the correct key. binarySearch is part of Arrays in Java 8. Assume that my array is already sorted before performing the search.

So my time complexity is O(x log n) + O(x)
Explanation:

  • O(log n) is the complexity of one binary search
  • O(x) is the complexity of the get-value actions
  • The factor x in O(x log n) is due to the number of keys I want to look for
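
For illustration, a minimal sketch of what I mean (sortedKeys, values, and wantedKeys are just placeholder names for my record storage):

    import java.util.*;

    // Sketch only: sortedKeys is sorted ascending and values[i] belongs to sortedKeys[i].
    static List<String> lookup(String[] sortedKeys, String[] values, String[] wantedKeys) {
        List<String> result = new ArrayList<>();
        for (String key : wantedKeys) {                    // x iterations
            int ix = Arrays.binarySearch(sortedKeys, key); // O(log n) each
            if (ix >= 0) {
                result.add(values[ix]);                    // O(1) get-value action
            }
        }
        return result;
    }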

But at some point, if x becomes too big (close to the value of n), it seems that just performing one single loop through all of my records, comparing against all the keys I need values for, is more efficient...

for (Record record : records) {

    // Since the keys that I look for start with a specific prefix.
    // This is one of the factors that makes me confused
    // when checking the performance.

    if (record.key.startsWith(keyFamily)) {

        switch (record.key) {
            case "key0":
                // get value
                break;

            // ...

            case "keyX":
                // get value
                break;
        }
    }
}

For this solution, my complexity is O(n) + O(n·x)
Explanation:

  • O(n) is the complexity of the for loop
  • O(n·x) is the complexity of the switch statement plus getting the value out (worst-case scenario) until the looping is done

Note that the distribution of record keys is not uniform: some record collections will have a significantly different number of keys starting with my key family compared to other collections.

My questions are:

  1. How should we define when x is too big, making the binary search solution inefficient?

  2. Is there something that I can learn from you? :)

Thank you.

  • As far as I know and understand, if your array is sorted, the time complexity for binary search is O(log n), which is always better than looping through the array. – Amer Qarabsa Sep 30 '16 at 10:58
  • "which is always better than looping through the array" Big-O notation is an asymptotic bound: it says which is faster as `n` increases. There may well be values of `n` for which linear search is faster than binary search. – Andy Turner Sep 30 '16 at 11:01
  • You said it yourself: you need to loop over `x`; if it is close to `n`, you have two loops -> O(n^2) – A4L Sep 30 '16 at 11:01
  • Hi @A4L, let's say that I don't use another loop through x, but a switch to check the key value. Will this make any difference to your answer? – Xitrum Sep 30 '16 at 11:13
  • Binary search requires sorting, which is O(n log n), plus the search, which is O(log n), so it's O(n log n) in total, while linear search is O(n), which is less than O(n log n). – Sleiman Jneidi Sep 30 '16 at 11:14
  • Hi @SleimanJneidi, given that my array is already sorted before performing the search. ;) – Xitrum Sep 30 '16 at 11:15
  • *How should we define when x is too big and makes the binary search solution become inefficient?* You would have to measure to be sure, but the last time I measured, looping was more efficient around 20% of N. This is a more important concept when doing relational database selects. At some point (around 20%) a table scan is more efficient than individual selects. – Gilbert Le Blanc Sep 30 '16 at 11:53
  • @Xitrum most likely it will be the same or just insignificantly less, since you will have to do the comparison each time, not to mention that your `x` is unknown as far as I could understand from your question. On the other hand, you should actually ask about the case of `x` being small (for example 1), so that you spare the cost of the sort (since binary search needs a sorted collection) and just do a loop over all elements, which will give you only O(n). – A4L Sep 30 '16 at 11:54
  • @A4L The collection is pre-sorted already. If my complexity formulas are correct, the performance of the binary search starts slowing down when x approaches n/2, and is slightly worse than the loop when x reaches n. – Xitrum Sep 30 '16 at 13:12
  • @Xitrum I understand... would it be an option for you to use a HashSet or map, so that you at least have to loop over only one collection (either one) and use hashing for the other? – A4L Sep 30 '16 at 14:57
  • @A4L, actually the binary search is API-provided; the loop is something done manually. Your suggestion sounds really interesting. Perhaps you can make it an answer with a bit more explanation so I can upvote? – Xitrum Sep 30 '16 at 17:59

2 Answers


If X is close to N, binary search for X keys becomes O(N log N).

Linear search with a switch statement for X keys would seem to be O(N), if the switch were implemented as a pure jump table. Java uses an intelligent mix of tableswitch and lookupswitch: an immediate jump table and a (slower) lookup in a sorted value array. One probably must count a switch as costing O(log X) as well, so in total it is also O(N log N).

Now, a huge switch can be done oneself by using the N values as indexes. That would be feasible if the numbers were in the range of N (or 4N); that is, the array would not be too sparse.

You could then use a BitSet. Real life is seldom as nice, however.
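
A minimal sketch of that idea (the names are mine), assuming the keys are non-negative ints whose range is not much larger than N:

    import java.util.BitSet;

    // Mark every wanted key in a BitSet; a single linear pass over the records
    // then answers each membership question in O(1).
    static void collectByBitSet(int[] recordKeys, int[] wantedKeys) {
        BitSet wanted = new BitSet();
        for (int k : wantedKeys) {
            wanted.set(k);        // feasible only if the key range is dense enough
        }
        for (int k : recordKeys) {
            if (wanted.get(k)) {
                // get the value for this key
            }
        }
    }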

Seeing the word "record", I would even say: leave it to the database.

But there is a nice solution:

If you sort the X keys, the binary search for the i-th key can start at the found/insert position of the (i-1)-th key. Hence it is not O(N log N) but less.

ix = Arrays.binarySearch(array, ix, array.length, key);
if (ix < 0) { // Not found: insert position is -ix - 1, i.e. ~ix
    ix = ~ix; // Make it the insert position (ix ^= -1; would do too)
}
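
Put into a complete loop (a sketch; the method name is mine), with the X keys sorted ascending:

    import java.util.*;

    // Each search resumes where the previous one ended, so the searched
    // range only shrinks; the keys must be sorted ascending.
    static Set<Integer> searchAscending(int[] array, int[] sortedKeys) {
        Set<Integer> found = new HashSet<>();
        int ix = 0;
        for (int key : sortedKeys) {
            ix = Arrays.binarySearch(array, ix, array.length, key);
            if (ix >= 0) {
                found.add(key);
            } else {
                ix = ~ix; // not found: continue from the insert position
            }
        }
        return found;
    }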

As there is an asymmetry (binary search on an ever-decreasing range), I made a symmetric recursive binary-binary-search. Not for performance, but for the algorithm.

/**
 * @param array sorted
 * @param keys sorted
 * @return found keys
 */
static Set<Integer> search(int[] array, int[] keys) {
    Set<Integer> foundKeys = new HashSet<>();
    find(foundKeys, array, 0, array.length, keys, 0, keys.length);
    return foundKeys;
}

private static void find(Set<Integer> foundKeys,
        int[] array, int a0, int an,
        int[] keys, int k0, int kn) {
    if (k0 < kn) {
        int k = (k0 + kn) / 2;
        int key = keys[k];
        int a = Arrays.binarySearch(array, a0, an, key);
        if (a >= 0) {
            foundKeys.add(key);
        } else {
            a = ~a;
        }
        find(foundKeys, array, a0, a, keys, k0, k);
        find(foundKeys, array, a, an, keys, k + 1, kn);
        // The overlap at a / repetition of a is due to:
        // - not found
        // - found but keys theoretically might contain doubles
    }
}

(Sorting the keys would, however, cost O(X log X); but the compiler did the same for the switch.)
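
For example (my own toy data):

    int[] array = {2, 3, 5, 7, 11, 13, 17, 19}; // sorted records
    int[] keys  = {3, 8, 13, 19};               // sorted keys to look up
    Set<Integer> found = search(array, keys);   // found = [3, 13, 19]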

  • Hi @Joop, can you explain a bit more about the range from N to 4N in the sentence of your answer "Now, a huge switch can be done oneself by using the N values as indexes. That would be feasible if the numbers were in the range of N (or 4N); that is, the array would not be too sparse."? Why is it from N to 4N? Thank you. – Xitrum Sep 30 '16 at 20:27
  • If the value range goes to 4N, one has three quarters of unused indices. With a BitSet one uses one bit per value; I would personally still find 4N acceptable. Purely gut feeling. – Joop Eggen Oct 01 '16 at 12:07
  1. Binary search requires the input to be sorted, so it is in no way an efficient solution, assuming your input is not sorted.

  2. Looping requires going through all records.

  3. Hashing the keys is something you can look at to increase the performance of the compare and fetch operations.

IMHO, option 3 is much better when we compare in terms of space, time complexity, and the trade-offs involved. In Java you can use HashMap for most cases (assuming you are not dealing with big-data-scale issues).
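
A minimal sketch of option 3, reusing the Record type from the question (assuming it exposes String key and value fields; wantedKeys is a placeholder):

    import java.util.*;

    // Build the index once in O(N); each of the X lookups is then expected O(1).
    Map<String, String> index = new HashMap<>();
    for (Record record : records) {
        index.put(record.key, record.value);
    }
    for (String key : wantedKeys) {
        String value = index.get(key);
        if (value != null) {
            // use the value
        }
    }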
