Search in sorted array with few comparisons

Question

You are given a std::vector<T> of distinct items. which is already sorted. type T only supports less-than < operator for comparisons. and it is a heavy function. so you have to use it as few times as possible.

Is there any better solution than a binary search? If not, is there any better solution than this, that uses less-than operator fewer times?

template<typename T>
int FindKey(const std::vector<T>& list, const T& key)
{
    if( list.empty() )
        return -1;

    int left = 0;
    int right = list.size() - 1;
    int mid;

    while( left < right )
    {
        mid = (right + left) / 2;
        if( list[mid] < key )
            left = mid + 1;
        else
            right = mid;
    }

    if( !(key < list[left]) && !(list[left] < key) )
        return left;    

    return -1;
}

It's not a real world situation, just a coding test.

If your code is working, this question may be better placed at [Code Review SE](http://codereview.stackexchange.com/). — πάντα ῥεῖ, Nov 26 '15 at 10:25
@Mehrdad Momeny There are already standard algorithms like std::binary_search, std::lower_bound, std::upper_bound, and std::equal_range. — Vlad from Moscow, Nov 26 '15 at 10:34
Do you allow onetime preprocessing? What about increased memory? — arekolek, Nov 26 '15 at 11:09
I guess the main concern is using the less-than operator as few times as possible. Then @arekolek's idea seems interesting. Though I should check how does unordered map search for the key? — Mehrdad, Nov 27 '15 at 12:32
@VladfromMoscow I guess i'm not supposed to use those functions. It was a coding quiz, that's the reason I'm not sure about this. — Mehrdad, Nov 27 '15 at 12:36
I've added some more details to my answer, maybe they will help. — arekolek, Nov 27 '15 at 13:09

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

You could trade off additional O(n) preprocessing time to get amortized O(1) query time, using a hash table (e.g. an unordered_map) to create a lookup table.

Hash tables compute hash functions of the keys and do not compare the keys themselves.

Two keys could have the same hash, resulting in a collision, explaining why it's not guaranteed that every separate operation is constant time. Amortized constant time means that if you carry out k operations that took time t in total, then the quotient t/k = O(1), for a sufficiently large k.

Live example:

#include <vector>
#include <unordered_map>
 
template<typename T>
class lookup {
  std::unordered_map<T, int> position;
public:
  lookup(const std::vector<T>& a) {
    for(int i = 0; i < a.size(); ++i) position.emplace(a[i], i);
  }
  int operator()(const T& key) const {
    auto pos = position.find(key);
    return pos == position.end() ? -1 : pos->second;
  }
};

This requires additional memory also.

If the values can be mapped to integers and are within a reasonable range (i.e. max-min = O(n)), you could simply use a vector as a lookup table instead of unordered_map. With the benefit of guaranteed constant query time.

See also this answer to "C++ get index of element of array by value", for a more detailed discussion, including an empirical comparison of linear, binary and hash index lookup.

Update

If the interface of type T supports no other operations than bool operator<(L, R), then using the decision tree model you can prove a lower bound for comparison-based search algorithms to be Ω(log n).

But then this type should provide a hash function. Something we're not given. — Mehrdad, Nov 27 '15 at 13:42
@Mehrdad For custom types you can provide your own hash functions. [unordered_map](http://en.cppreference.com/w/cpp/container/unordered_map/unordered_map) supports this. In the link there's an example custom hash for a key that is a pair of strings. Unless the interface of `T` is really limited to `bool operator<(L, R)`, but it's hard to believe such type would be useful for anything. Can you give the full interface of `T` in your question, or explain how it's practical if there's nothing else? — arekolek, Nov 27 '15 at 13:55
As I told Vlad, it wasn't a real world situation, but a coding test. There might have been a hash function available, but I didn't ask about it. — Mehrdad, Nov 28 '15 at 01:26

score 0 · Answer 2 · answered Nov 26 '15 at 10:41

You can use std::lower_bound. It does it with log(n)+1 comparisons, which is the best possible complexity for your problem.

template<typename T>
int FindKey(const std::vector<T>& list, const T& key)
{
  if(list.empty())
    return -1;
  typename std::vector<T>::const_iterator lb =
        std::lower_bound(list.begin(), list.end(), key);
  // now lb is an iterator to the first element
  // which is greater or equal to key
  if(key < *lb)
    return -1;
  else
    return std::distance(list.begin(), lb);
}

With the additionnal check for equality, you do it with log(n)+2 comparisons.

But OP asked: `Is there any better solution than a binary search?`, which `std::lower_bound` isn't, AFAIK. — user703016, Nov 26 '15 at 10:53

score 0 · Answer 3 · answered Nov 26 '15 at 19:54

0

You can use interpolation search in log log n time if your numbers are normally distributed. If they have some other distribution, you can modify this to take your distribution into account, though I don't know which distributions yield log log time.

https://en.wikipedia.org/wiki/Interpolation_search

answered Nov 26 '15 at 19:54

Dave

7,460
3
26
39

I think you meant *uniformly distributed* instead of *normally distributed*. Also, something to take into account is that worst case for interpolation search is `O(n)`. – arekolek Sep 17 '16 at 19:24
@arekolek As long as you know the distribution you can get log log n expected time. – Dave Sep 18 '16 at 18:30

Search in sorted array with few comparisons

3 Answers3

Update