6

So far, I have been storing the array in a vector and then looping through the vector to find the matching element and then returning the index.

Is there a faster way to do this in C++? The STL structure I use to store the array doesn't really matter to me (it doesn't have to be a vector). My array is also unique (no repeating elements) and ordered (e.g. a list of dates going forward in time).

James McNellis
  • 348,265
  • 75
  • 913
  • 977
user788171
  • 16,753
  • 40
  • 98
  • 125

2 Answers2

7

Since the elements are sorted, you can use a binary search to find the matching element. The C++ Standard Library has a std::lower_bound algorithm that can be used for this purpose. I would recommend wrapping it in your own binary search algorithm, for clarity and simplicity:

/// Performs a binary search for an element
///
/// The range `[first, last)` must be ordered via `comparer`.  If `value` is
/// found in the range, an iterator to the first element comparing equal to
/// `value` will be returned; if `value` is not found in the range, `last` is
/// returned.
template <typename RandomAccessIterator, typename Value, typename Comparer>
auto binary_search(RandomAccessIterator const  first,
                   RandomAccessIterator const  last,
                   Value                const& value,
                   Comparer                    comparer) -> RandomAccessIterator
{
    RandomAccessIterator it(std::lower_bound(first, last, value, comparer));
    if (it == last || comparer(*it, value) || comparer(value, *it))
        return last;

    return it;
}

(The C++ Standard Library has a std::binary_search, but it returns a bool: true if the range contains the element, false otherwise. It's not useful if you want an iterator to the element.)

Once you have an iterator to the element, you can use std::distance algorithm to compute the index of the element in the range.

Both of these algorithms work equally well any random access sequence, including both std::vector and ordinary arrays.

James McNellis
  • 348,265
  • 75
  • 913
  • 977
  • @Ulterior: Yes, it's copy-pasta'ed from my CxxReflect library. See [algorithm.hpp](http://cxxreflect.codeplex.com/SourceControl/changeset/view/8ffbb562ad38#cxxreflect%2fcore%2falgorithm.hpp). – James McNellis Jul 07 '12 at 22:12
  • @Ulterior you would need a compiler with some C++11 support, although the algorithm can be easily written in C++03. – juanchopanza Jul 07 '12 at 22:27
  • Thanks, I am still in C99 ATM – Ulterior Jul 07 '12 at 22:30
  • @Ulterior: This isn't C at all, it's C++. – James McNellis Jul 07 '12 at 22:31
  • @JamesMcNellis I know C++ to some extent, thanks, I just found it weird in a way – Ulterior Jul 07 '12 at 22:32
  • Hi James, can you provide a bit more information to help me better understand how your solution works? For instance, what is Comparer and where do I pass in my vector of dates? – user788171 Jul 07 '12 at 22:48
  • Try as I may, I am unable to get this solution to work so I can't upvote this or mark it as correct even though this may be the correct approach. Specifics of why I can't get this to work for me are posted here: http://stackoverflow.com/questions/11380521/c-unresolved-overloaded-function-type-with-comparison-function – user788171 Jul 08 '12 at 03:51
  • @user788171: I am glad the answer to that other question was useful to you, and I am sorry mine was less helpful than it was intended to be. It's often hard to tell what level of explanation someone is looking for. But, I'm glad you got it working! – James McNellis Jul 08 '12 at 05:22
7

If you want to associate a value with an index and find the index quickly you can use std::map or std::unordered_map. You can also combine these with other data structures (e.g. a std::list or std::vector) depending on the other operations you want to perform on the data.

For example, when creating the vector we also create a lookup table:

vector<int> test(test_size);
unordered_map<int, size_t> lookup;
int value = 0;
for(size_t index = 0; index < test_size; ++index)
{
    test[index] = value;
    lookup[value] = index;
    value += rand()%100+1;
}

Now to look up the index you simply:

size_t index = lookup[find_value];

Using a hash table based data structure (e.g. the unordered_map) is a fairly classical space/time tradeoff and can outperform doing a binary search for this sort of "reverse" lookup operation when you need to do a lot of lookups. The other advantage is that it also works when the vector is unsorted.

For fun :-) I've done a quick benchmark in VS2012RC comparing James' binary search code with a linear search and with using unordered_map for lookup, all on a vector: Performance of various find index methods

To ~50000 elements unordered_set significantly (x3-4) outpeforms the binary search which is exhibiting the expected O(log N) behavior, the somewhat surprising result is that unordered_map loses it's O(1) behavior past 10000 elements, presumably due to hash collisions, perhaps an implementation issue.

EDIT: max_load_factor() for the unordered map is 1 so there should be no collisions. The difference in performance between the binary search and the hash table for very large vectors appears to be caching related and varies depending on the lookup pattern in the benchmark.

Choosing between std::map and std::unordered_map talks about the difference between ordered and unordered maps.

Community
  • 1
  • 1
Guy Sirton
  • 8,331
  • 2
  • 26
  • 36