3

I have a very large std::vector v of type std::vector<std::string> v. Now I want to compare which of the elements in the vector start with a certain substring str. What is the fastest possible way to do that?

I was thinking of a for-loop that iteratively compares the start of each element of v with the substring str. I first tried

std::string substring = "bla";
for (long unsigned int i = 0; i < v.size(); i++)
{
    if (!strncmp(v[i].c_str(), substring.c_str(), substring.size())) 
    {
        std::cout << "Item found: " << v[i] << std::endl;
    }
}

Which is mixed with and I am not happy with that.

What better alternatives are there?

JeJo
  • 30,635
  • 6
  • 49
  • 88
Gilfoyle
  • 3,282
  • 3
  • 47
  • 83
  • 2
    Simply do this: `if ( v[i].substr(0, substring.size()) == substring ) { /* ... */ }` for the string comparison. – DimChtz Aug 24 '19 at 17:05

3 Answers3

4

You can write completely a code.

If you want to find all the elements satisfying the condition, you can not avoid iterating through the entire vector. But you could use better range-based for-loop instead of index based loop to iterate through the vector, and check wether str.find(substring) == 0(credits @PiotrSkotnicki).

Here is the example code: (See online)

#include <iostream>
#include <string>
#include <vector>

int main()
{
    const std::string substring{ "bla" };
    std::vector<std::string> vecString{ {"bllll"}, {"bllll"}, {"blasomething"} };
    // iterate through the vector by range based for-loop
    // here `auto` deduded to `std::string` as you have vector of strings(i.e. `vecString`)
    for (const auto& str : vecString)
    {
        if (str.find(substring) == 0) {
            std::cout << str << " is a match\n";
            // do something more with str
        }
    }
    return 0;
}

Alternatively using std::for_each, along with a lambda function you could write the following. Read more about the lambdas here: What is a lambda expression in C++11? (See online)

#include <algorithm> // std::for_each

std::for_each(std::cbegin(vecString), std::cend(vecString), [&substring](const auto& str)
{
    if (str.find(substring) == 0)
    {
        std::cout << str << " is a match\n";
        // do something more with str
    }
});

If you are interested only the first match in the vector of string s, making use of standard algorithm std::find_if as follows

#include <algorithm> // std::find_if

const auto iter = std::find_if(std::cbegin(vecString), std::cend(vecString),
    [&substring](const auto& str) {
        return str.find(substring) == 0;
    }
);
if (iter != std::cend(vecString))
{
    // do something
}
JeJo
  • 30,635
  • 6
  • 49
  • 88
  • You still need to loop. This will only return the first match. – DimChtz Aug 24 '19 at 17:12
  • @DimChtz Missed the part of "all the elements". Okay, then simply range based loop or `std::for_each`. Will update accordingly – JeJo Aug 24 '19 at 17:14
  • Can you please add some comments to your code? I am not familiar with lines such as `for (const auto& str : vecString)` or `const auto check = [&substring](const auto& str)`. Thank you! – Gilfoyle Aug 24 '19 at 19:30
  • `std::basic_string::substr` creates a copy of the string which is inefficient in this case. `return str.find(substring) == 0;` would be far better. – Piotr Skotnicki Aug 25 '19 at 11:10
  • @JeJo That's why I verify that the substring was found at position `0`. See that `== 0` part ? – Piotr Skotnicki Aug 25 '19 at 11:41
  • Why is the `elementString.size() >= substring.size()` part essential. I tried it without and it works just fine. Can you explain why it is necessary? – Gilfoyle Aug 25 '19 at 11:51
  • @Samuel However, I have updated the answer with Piotr's suggestion, which is better than I have posted earlier. – JeJo Aug 25 '19 at 12:12
  • 1
    Do I need `str.find()` when I am only interested if the substring matches with the start of the string in the array? – Gilfoyle Aug 25 '19 at 12:24
  • @Samuel Yea, that is needed. Here we are checking whether the position of the *substring* is `0`. That means, at the beginning of the string. – JeJo Aug 25 '19 at 12:27
  • @JeJo @Samuel calling `str.find()` for this task is very inefficient unless optimizer is very smart. – ALX23z Aug 25 '19 at 15:51
  • @ALX23z *very inefficient*, one can not say directly that: Why?-> Without benchmarking the all other possible solutions to question(including the `str.find()` one) with certain test cases(possibly from OP) for several builds by maximum optimisation enabled. That has to be verified before you say `str.find()` is inefficient compared to other answers(IMHO). You are most welcome to point out(maybe as another answer) *the best efficient way* . – JeJo Aug 25 '19 at 16:11
  • @JeJo while compilers have grown quite good over the years, you surely leave a lot of shit behind for them to deal with. And what about debug mode? Do you forsake that mode as well? `string::find` is based on a complicated algorithm to avoid `O(NM)` complexity; don't expect much from compiler when dealing with sophisticated algorithms like that. Simplest and most efficient solution is `if(str.length()>= substring.length() && (str.compare(0,substring.length(),substring) == 0))`. – ALX23z Aug 25 '19 at 18:04
  • 1
    @ALX23z That looks promising: http://quick-bench.com/PuL3ggsdyADAUL47Al3SPKgHRog (hope I have done it correctly). I would suggest proving it as the answer so that future readers can benefit from it. I would be happy to up it. – JeJo Aug 25 '19 at 18:26
3

If you have an unsorted container you can't get better than O(n) in time complexity, which means iterating over the whole container in a linear manner (i.e. for loop). If your container was sorted (ex. std::set instead of std::vector) you would get O(log n) which is a lot better (binary search).

Prior to C++17, I can't come up with a better solution than yours (since creating a substring via std::string::substr means copying the substring unnecessarily). However C++17 introduced std::string_view which doesn't do any copying. There should be no noticable performance difference with compiler optimizations enabled.

std::vector<std::string> v { "abcd", "abcdefg", "aaaabbbb", "abc", "ab"};
std::string_view query = "abc";

for (auto const& str : v) 
{
    if (str.size() < query.size())
        continue;

    auto probe = std::string_view(str).substr(0, query.size());
    if (query == probe)
        std::cout << "Item found: " << str << "\n";        
}

Live example

And here is the std::set version for the quicker search:

std::set<std::string> v { "abcd", "abcdefg", "aaaabbbb", "abc", "ab"};
std::string query = "abc";

for (auto it = v.lower_bound(query); it != v.end(); ++it)
{
    auto probe = std::string_view(*it).substr(0, query.size());
    if (query == probe)
        std::cout << "Item found: " << *it << "\n";     
    else
        break;
}

Live example

Timo
  • 9,269
  • 2
  • 28
  • 58
3

You can using c++20 std::string_view::start_with:

std::vector<std::string> v = {...};
std::string_view prefix = "bla";
for (std::string_view sv : v)
    if (sv.starts_with(prefix))
        std::cout << "Item found: " << sv << std::endl;
康桓瑋
  • 33,481
  • 5
  • 40
  • 90