120

I need a binary search algorithm that is compatible with the C++ STL containers, something like std::binary_search in the standard library's <algorithm> header, but I need it to return the iterator that points at the result, not a simple boolean telling me if the element exists.

(On a side note, what the hell was the standard committee thinking when they defined the API for binary_search?!)

My main concern here is that I need the speed of a binary search, so although I can find the data with other algorithms, as mentioned below, I want to take advantage of the fact that my data is sorted to get the benefits of a binary search, not a linear search.

so far lower_bound and upper_bound fail if the datum is missing:

//lousy pseudo code
vector(1,2,3,4,6,7,8,9,0) //notice no 5
iter = lower_bound_or_upper_bound(start,end,5)
iter != 5 && iter !=end //not returning end as usual, instead it'll return 4 or 6

Note: I'm also fine using an algorithm that doesn't belong to the std namespace as long as its compatible with containers. Like, say, boost::binary_search.

BartoszKP
  • 34,786
  • 15
  • 102
  • 130
Robert Gould
  • 68,773
  • 61
  • 187
  • 272
  • 2
    Regarding the edit: that's why std::equal_range is the solution. Otherwise, you'll have to test for equality (or equivalence to be more) – Luc Hermitte Jan 15 '09 at 10:56
  • You have to test for equality after using (lower/upper)_bound (see answer below). – Luc Touraille Jan 15 '09 at 10:56
  • 1
    lower_bound and upper_bound documentation state that the range must be sorted, and because of this they can be implemented as binary search. – vividos Jan 15 '09 at 11:02
  • @vividos, hurray! you found just the piece of documentation I needed to know about! Thanks! – Robert Gould Jan 15 '09 at 11:04
  • Robert, the lower/upper_bound/equal_range algorithms do not work with unsorted ranges. You're just lucky to see them working with the elements sample you took. – Luc Hermitte Jan 15 '09 at 11:04
  • @Luc Hermitte, yes you are totally right, thanks again! – Robert Gould Jan 15 '09 at 11:10
  • my first look usually goes to http://www.sgi.com/tech/stl/ . It documents the STL before C++ included it into Standard C++ Library, but most of the documentation continues to be valid. – vividos Jan 17 '09 at 00:12
  • I also get often confused by `std::lower_bound`, but I believe that is the way to go for binary search, see [here](http://www.stud.fit.vutbr.cz/~xpolok00/coolstuff.htm#link23). You just need to check for end and then for equality with your needle, instead of just checking for the end. It is only a small constant cost and `lower_bound` should be as fast as `binary_search`, unless your data contains a lot of repetition, in which case `binary_search` may find the needle slightly faster if it is present (if it is not, then it needs to find lower/upper bound anyway). – the swine Nov 17 '14 at 16:53
  • Me: STL, please search my sorted container for this value. STL: Yes, its there. Me: *sigh*, okay where is it? STL: I can tell you where there's a value that's equal or greater. Me: *sigh* so I have to check myself if it's really the found result, or something greater? – Emile Cormier May 27 '20 at 00:32

9 Answers9

105

There is no such functions, but you can write a simple one using std::lower_bound, std::upper_bound or std::equal_range.

A simple implementation could be

template<class Iter, class T>
Iter binary_find(Iter begin, Iter end, T val)
{
    // Finds the lower bound in at most log(last - first) + 1 comparisons
    Iter i = std::lower_bound(begin, end, val);

    if (i != end && !(val < *i))
        return i; // found
    else
        return end; // not found
}

Another solution would be to use a std::set, which guarantees the ordering of the elements and provides a method iterator find(T key) that returns an iterator to the given item. However, your requirements might not be compatible with the use of a set (for example if you need to store the same element multiple times).

Luc Touraille
  • 79,925
  • 15
  • 92
  • 137
  • yes this works, and I have a similar implementation right now, however it's a "naive" implementation, in the sense that it's not making use of the situation's context, in this case sorted data. – Robert Gould Jan 15 '09 at 10:59
  • 8
    I don't really understand your comment, since lower_bound can only be used on sorted data. Complexity is lower than using find (see edit). – Luc Touraille Jan 15 '09 at 11:02
  • sorry I forgot they had to be sorted, so your implementation is correct, and it is taking advantage of the sorted data! – Robert Gould Jan 15 '09 at 11:05
  • 4
    To complement Luc's answer, check Matt Austern's classic article [Why You Shouldn't Use set, and What You Should Use Instead](http://lafstern.org/matt/col1.pdf) (C++ Report 12:4, April 2000) to understand why binary search with sorted vectors is usually preferable to std::set, which is a tree-based associative container. – ZunTzu Nov 03 '12 at 12:46
  • 18
    Don't use `*i == val`! Rather use `!(val < *i)`. The reason is that `lower_bound` uses `<`, not `==` (i.e. `T` is not even required to be equality-comparable). (See Scott Meyers' _Effective STL_ for an explanation of the difference between _equality_ and _equivalence_.) – gx_ Jul 09 '13 at 16:22
  • @gx if i want a comparison object, it would be !comp(val,*i), right? – user877329 Nov 23 '14 at 18:48
  • In case the val you are searching for is located at the index end, although the function returns the correct value, the comment "not found" becomes incorrect. – Can Kavaklıoğlu Nov 03 '15 at 20:24
  • 2
    @CanKavaklıoğlu There is no element located at `end`. Ranges in the C++ standard library are represented with half-open intervals: the end iterator "points" *after* the last element. As such, it can be returned by algorithms to indicate that no value was found. – Luc Touraille Nov 03 '15 at 21:22
  • I would make the last param `T& val` or `const T& val`. Is in line with the std call, and it saves a copy of course. – Halfgaar Apr 08 '16 at 13:45
  • @Halfgaar You could even make the last param a `boost::call_traits::param_type` to avoid the reference for small built-in types. – Luc Touraille Apr 08 '16 at 15:26
  • @LucTouraille Will not it return the location which is not existing if not found. If we have 5 elements, `return end` will give 5th location, which is not existing – Agaz Wani May 12 '16 at 06:19
  • @AaghazHussain This is how the standard library algorithms denotes the fact that no element was found, by returning an iterator to the end of the range. To know if the element was found by the `binary_find` function, you must compare the result to end: `auto it = binary_find(v.begin(), v.end(), value); if (it != v.end()) { // value found, accessible at *it }` – Luc Touraille May 12 '16 at 07:10
  • Just a comment, by using this function, you will need to have one more compare after lower_bound. If compare is expensive, probably better idea is to do your own binary search? – Nick Aug 12 '16 at 20:38
  • This should so be in the standard library! – Apollys supports Monica Feb 28 '19 at 00:54
10

You should have a look at std::equal_range. It will return a pair of iterators to the range of all results.

Tim Sylvester
  • 22,897
  • 2
  • 80
  • 94
Luc Hermitte
  • 31,979
  • 7
  • 69
  • 83
  • According to http://www.cplusplus.com/reference/algorithm/equal_range/ the cost of std::equal_range is approximately twice as high as std::lower_bound. It appears that it wraps a call to std::lower_bound and a call to std::upper_bound. If you know your data doesn't have duplicates then that is overkill and std::lower_bound (as demonstrated in the top answer) is the best choice. – Bruce Dawson Jun 22 '15 at 23:40
  • @BruceDawson: cplusplus.com only gives a *reference* implementation to specify the **behavior**; for an actual implementation you can check your favorite standard library. For example, in https://llvm.org/svn/llvm-project/libcxx/trunk/include/algorithm we can see that the calls to lower_bound and upper_bound are made on disjoint intervals (after some manual binary search). That being said, it is likely to be more expensive, especially on ranges with multiple values matching. – Matthieu M. Sep 27 '17 at 14:58
6

There is a set of them:

http://www.sgi.com/tech/stl/table_of_contents.html

Search for:

On a separate note:

They were probably thinking that searching containers could term up more than one result. But on the odd occasion where you just need to test for existence an optimized version would also be nice.

Martin York
  • 257,169
  • 86
  • 333
  • 562
  • 4
    binary_search doesn't return an iterator as I mentioned earlier, that's why I'm looking for an alternative. – Robert Gould Jan 15 '09 at 10:43
  • 1
    Yes, I know. But it fits in the set of binary search algorithms. So its nice for others to know about. – Martin York Jan 15 '09 at 10:44
  • 10
    binary_search is just, like so many other things in the STL, named wrong. I hate that. Testing for existence is not the same as searching for something. – OregonGhost Jan 15 '09 at 10:53
  • 2
    These binary search functions are not useful in case where you want to know the index of the element you are looking for. I have to write my own recursive function for this task. I hope this, template int bindary_search(const T& item), should be added to the next version of C++. – Kemin Zhou Jul 28 '16 at 22:52
3

If std::lower_bound is too low-level for your liking, you might want to check boost::container::flat_multiset. It is a drop-in replacement for std::multiset implemented as a sorted vector using binary search.

ZunTzu
  • 7,244
  • 3
  • 31
  • 39
  • 1
    Good link; and also good link *in* the link: http://lafstern.org/matt/col1.pdf, which describes how lookups implemented with a sorted vector, rather than set (though both are log(N)), have *significantly* better constants of proportionality and are ~twice as fast (the disadvantage being a larger INSERTION time). – Dan Nissenbaum Apr 14 '13 at 20:00
3

The shortest implementation, wondering why it's not included in the standard library:

template<class ForwardIt, class T, class Compare=std::less<>>
ForwardIt binary_find(ForwardIt first, ForwardIt last, const T& value, Compare comp={})
{
    // Note: BOTH type T and the type after ForwardIt is dereferenced 
    // must be implicitly convertible to BOTH Type1 and Type2, used in Compare. 
    // This is stricter than lower_bound requirement (see above)

    first = std::lower_bound(first, last, value, comp);
    return first != last && !comp(value, *first) ? first : last;
}

From https://en.cppreference.com/w/cpp/algorithm/lower_bound

trozen
  • 1,117
  • 13
  • 13
  • I can think of two reasons this is not in the standard library: They think it is easy to implement, but the major reason is probably that it may require a reversed version of operator()() if value is not interchangeable with *first. – user877329 May 06 '20 at 19:55
2
int BinarySearch(vector<int> array,int var)
{ 
    //array should be sorted in ascending order in this case  
    int start=0;
    int end=array.size()-1;
    while(start<=end){
        int mid=(start+end)/2;
        if(array[mid]==var){
            return mid;
        }
        else if(var<array[mid]){
            end=mid-1;
        }
        else{
            start=mid+1;
        }
    }
    return 0;
}

Example: Consider an array, A=[1,2,3,4,5,6,7,8,9] Suppose you want to search the index of 3 Initially, start=0 and end=9-1=8 Now, since start<=end; mid=4; (array[mid] which is 5) !=3 Now, 3 lies to the left of mid as its smaller than 5. Therefore, we only search the left part of the array Hence, now start=0 and end=3; mid=2.Since array[mid]==3, hence we got the number we were searching for. Hence, we return its index which is equal to mid.

  • 1
    It's good to have code, but you could improve the answer by providing a brief explanation of how it works for people who are new to the language. – Taegost Feb 12 '18 at 20:17
  • Somebody incorrectly [flagged your post as low-quality](https://stackoverflow.com/review/low-quality-posts/18801852). A [code-only answer is not low-quality](https://meta.stackoverflow.com/a/358757/1364007). Does it attempt to answer the question? If not, flag as 'not an answer' or recommend deletion (if in the review queue). b) Is it technically incorrect? Downvote or comment. – Wai Ha Lee Feb 12 '18 at 23:13
  • Not sure about how this compares in terms of actual runtime against the std:lower_bound solution, but personally I much prefer this nice and readable solution. It is a clean and simple implementation of binary search returning an index. No idea why this is so poorly rated!! And yes, it's not a template and yes, it doesn't return an iterator - but shall we have a poll how many people do binary search on anything other than a vector of ints? – means-to-meaning Oct 29 '20 at 22:48
1

Check this function, qBinaryFind:

RandomAccessIterator qBinaryFind ( RandomAccessIterator begin, RandomAccessIterator end, const T & value )

Performs a binary search of the range [begin, end) and returns the position of an occurrence of value. If there are no occurrences of value, returns end.

The items in the range [begin, end) must be sorted in ascending order; see qSort().

If there are many occurrences of the same value, any one of them could be returned. Use qLowerBound() or qUpperBound() if you need finer control.

Example:

QVector<int> vect;
 vect << 3 << 3 << 6 << 6 << 6 << 8;

 QVector<int>::iterator i =
         qBinaryFind(vect.begin(), vect.end(), 6);
 // i == vect.begin() + 2 (or 3 or 4)

The function is included in the <QtAlgorithms> header which is a part of the Qt library.

ulidtko
  • 14,740
  • 10
  • 56
  • 88
Lawand
  • 1,094
  • 2
  • 16
  • 40
0

A solution returning the position inside the range could be like this, using only operations on iterators (it should work even if iterator does not arithmetic):

template <class InputIterator, typename T>
size_t BinarySearchPos(InputIterator first, InputIterator last, const T& val)
{       
    const InputIterator beginIt = first;
    InputIterator element = first;
    size_t p = 0;
    size_t shift = 0;
    while((first <= last)) 
    {
        p = std::distance(beginIt, first);
        size_t u = std::distance(beginIt, last);
        size_t m = p + (u-p)/2;  // overflow safe (p+u)/2
        std::advance(element, m - shift);
        shift = m;
        if(*element == val) 
            return m; // value found at position  m
        if(val > *element)
            first = element++;
        else
            last  = element--;

    }
    // if you are here the value is not present in the list, 
    // however if there are the value should be at position u
    // (here p==u)
    return p;

}
lbonn
  • 2,499
  • 22
  • 32
0

std::lower_bound() :)

moogs
  • 8,122
  • 8
  • 44
  • 60