6

I am trying to look for the position of vector elements into another vector. Here i am interested to use an implementation as fast as binary search. I have different vectors of length 1 million or more, so i am trying to achieve something faster.

Following situations in my case:

1) vector in which i am searching is sorted.

2) The element i am searching for will always be there i.e i don't have a case of not found, and i would like to get the index of vector elements in a faster way.

I tried the following code to get the index of vector elements.

#include <iostream>
#include <vector>
#include <algorithm>

template<class Iter, class T>
Iter binary_find(Iter begin, Iter end, T val)
{
    Iter i = std::lower_bound(begin, end, val);
    return i;
}

int main() {
    std::vector<std::string> values = {"AAAAAA","AB", "AD" ,"BCD","CD", "DD" };
    std::vector<std::string> tests = {"AB", "CD","AD", "DD"};
    for(int i=0 ; i < tests.size(); i++) {
        int pos = binary_find(values.begin(), values.end(), tests.at(i))- values.begin();
    std::cout << tests.at(i) << " found at: " << pos <<std::endl;
    }
    return 0;
}  

I would like to know if the code matches with the binary search implementation.??

Is there a faster way to get the index of vector elements?

Any further suggestions to improve this code.

Agaz Wani
  • 5,514
  • 8
  • 42
  • 62
  • 1
    If you find yourself doing a lot of performance-critical searches like this, you might want to consider an associative container of some kind. – TartanLlama May 12 '16 at 10:45

3 Answers3

4

binary_find doesn't return anything despite not declared to return void, so it has undefined behaviour.

After it is fixed, and assuming that you have no specific knowledge about the contents of the vector other than it is sorted, binary search is pretty much optimal.

There are however, other data structures that are faster for predicate based lookup than a vector. If performance is critical, you should take a look at search trees and hash maps. Since your keys are strings, tries and directed acyclic word graph in particular may be efficient. You may want to measure which is best for your use case.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • @AaghazHussain simple. Return something from the function. You wrote the function, so you should know best what you want to return. Perhaps you intended to return `i`? – eerorika May 12 '16 at 11:25
  • Do U mean `auto it = binary_find(values.begin(), values.end(), tests.at(i));` and then get the position by `it -values.begin();` – Agaz Wani May 12 '16 at 11:33
  • What is the problem if I do it in one line. – Agaz Wani May 12 '16 at 11:36
  • @AaghazHussain sorry, I was mistaken. Rest of the line was just hidden on my narrow screen. Single line is fine. – eerorika May 12 '16 at 13:27
4

http://www.cpluplus.com says that the behavior of binary_search is equivalent to:

template <class ForwardIterator, class T>
bool binary_search (ForwardIterator first, ForwardIterator last, const T& val) {
    first = std::lower_bound(first, last, val);
    return (first != last && !(val < *first));
}

So yes, lower_bound is your weapon of choice. But when you take the difference you should use distance. Cause, if there is a faster way to acquire the position it will be rolled into that function.

As far as other improvements, I'd suggest using C++14's begin and end and not calling a function which only serves to wrap a lower_bound (and fail to properly return a value.) So the way I'd write this code would look like:

auto pos = distance(lower_bound(begin(values), end(values), tests[i]), begin(values));
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
1

Q1: I would like to know if the code matches with the binary search implementation.??

Yes, it (almost) is. Check std::lower_bound, which states:

Complexity:

On average, logarithmic in the distance between first and last: Performs approximately log2(N)+1 element comparisons (where N is this distance). On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average.


Q2: Is there a faster way to get the index of vector elements.??

That's a rather broad question.


Q3: Any further suggestions to improve this code.

Hello world, Code Review!


PS - Did you even compile the code? It gives several messages, like:

warning: no return statement in function returning non-void [-Wreturn-type]

Compile with warnings enabled, like this:

g++ -Wall main.cpp
Community
  • 1
  • 1
gsamaras
  • 71,951
  • 46
  • 188
  • 305
  • Yes I did, I didn't get any `warning` on Linux terminal – Agaz Wani May 12 '16 at 10:57
  • 1
    @AaghazHussain you are welcome. That was a good question (in terms of interest), thus I upvoted you (like you did for my answer now). However, next time be more careful. Make sure you select one of the answers, so that the question appears as answered in the question feed. – gsamaras May 12 '16 at 11:05