2

The task is to find a substring (needle) in another string (haystack), given the beginning and end positions of the part of the haystack to search. The beginning and end positions follow the STL convention, i.e. the end position is the position of the character just past the range of interest.

For example: find "567" with beg_pos=0 and end_pos=8 in "0123456789" should return 5, while find "567" with beg_pos=0 and end_pos=4 in "0123456789" should return -1.

I could imagine two simple implementations:

  • Method 1: Use size_t pos = haystack.find(needle, beg_pos); to get the substring position, then, if found, compare the return value pos with end_pos (the whole match must fit in the range, i.e. pos + needle.size() <= end_pos). In the worst case, find will scan to the end of haystack, even though searching past end_pos is unnecessary. Performance might be bad if haystack is long.
  • Method 2: Use size_t pos = haystack.substr(beg_pos, end_pos-beg_pos).find(needle); to find the position, then return pos+beg_pos if found. This avoids the unnecessary search past end_pos, but it requires allocating a new temporary string, which might also hurt performance.
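
For reference, the two methods above could be sketched as follows. The function names and the long return type are illustrative choices, not part of the question, and both sketches assume beg_pos <= end_pos <= haystack.size().

```cpp
#include <cstddef>
#include <string>

// Method 1: search from beg_pos, then reject a match that extends past end_pos.
// find() may scan the whole tail of haystack even when end_pos is small.
long find_method1(const std::string& haystack, const std::string& needle,
                  std::size_t beg_pos, std::size_t end_pos)
{
    std::size_t pos = haystack.find(needle, beg_pos);
    if (pos == std::string::npos || pos + needle.size() > end_pos)
        return -1;
    return static_cast<long>(pos);
}

// Method 2: copy the range into a temporary string and search only the copy.
long find_method2(const std::string& haystack, const std::string& needle,
                  std::size_t beg_pos, std::size_t end_pos)
{
    std::size_t pos = haystack.substr(beg_pos, end_pos - beg_pos).find(needle);
    if (pos == std::string::npos)
        return -1;
    return static_cast<long>(pos + beg_pos); // translate back to a haystack index
}
```

With the example from the question, both functions give 5 for the range [0, 8) and -1 for the range [0, 4).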

I am wondering if there is a faster way to accomplish the task.

Yun Huang

2 Answers

1

In C++17 we have std::string_view, which can be constructed from a pointer and a size. This lets you get a read-only slice of the string without copying anything. You can then use std::string_view::find to check whether the substring exists in that slice. That would look like

#include <string>
#include <string_view>

// start and end are the beg_pos and end_pos from the question
std::string haystack = "lots of stuff";
std::string needle = "something";
std::string_view slice(haystack.c_str() + start, end - start); // end - start is the size of the slice
auto pos = slice.find(needle);
if (pos == std::string_view::npos)
    return -1;
else
    return pos; // or pos + start if you need the index from the start of haystack and not just within the slice
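
As a self-contained sketch, the same idea could be wrapped in a function. The name find_in_range, its signature, the clamping of end_pos, and the choice to return an index relative to the start of haystack are assumptions made for illustration; they are not part of the snippet above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: searches haystack[beg_pos, end_pos) without copying.
// Returns the index relative to the start of haystack, or -1 if not found.
long find_in_range(std::string_view haystack, std::string_view needle,
                   std::size_t beg_pos, std::size_t end_pos)
{
    // Clamp the range so an out-of-bounds request cannot read past the data.
    end_pos = std::min(end_pos, haystack.size());
    if (beg_pos > end_pos)
        return -1;

    // substr on a string_view only adjusts pointer and length; nothing is allocated.
    std::string_view slice = haystack.substr(beg_pos, end_pos - beg_pos);
    std::size_t pos = slice.find(needle);
    if (pos == std::string_view::npos)
        return -1;
    return static_cast<long>(pos + beg_pos);
}
```

A std::string converts implicitly to std::string_view, so find_in_range(haystack, needle, 0, 8) works with the strings from the question.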
NathanOliver
  • surely it should be this? `auto pos_slice = slice.find(needle) ; auto pos = (pos_slice == std::string::npos) ? pos_slice : pos_slice + start;` – Richard Hodges Sep 11 '17 at 16:51
  • @RichardHodges I'm not exactly sure what the OP needs in the end, so I figured that if I return the position within the slice, they can transform that into whatever they need. – NathanOliver Sep 11 '17 at 16:59
0

Pre-C++17

Here is a method which I think is about as quick as this can get: it searches only the requested range and copies nothing. It uses std::search, which is effectively an iterator-based version of find.

In this example the position of the needle is returned relative to the start of the haystack, not the substring being searched:

#include <string>
#include <iostream>
#include <algorithm>

int main()
{
    using namespace std::literals;

    auto my_haystack = "0123456789"s;

    auto needle = "567"s;
    auto find_needle = [&needle](auto first, auto last)
    {
        auto i = std::search(first, last, begin(needle), end(needle));
        if (i == last)
            return std::string::npos;
        else
            return std::string::size_type(std::distance(first, i));
    };

    auto in_substring = [](auto&& str, auto b, auto e, auto&& f) -> std::string::size_type
    {
        using std::begin;
        auto brange = begin(str) + b;
        auto erange = begin(str) + e;
        auto p = f(brange, erange);
        if (p != std::string::npos)
            p += b;
        return p;
    };

    auto pos = in_substring(my_haystack, 0, 4, find_needle);
    std::cout << pos << std::endl;

    pos = in_substring(my_haystack, 0, my_haystack.size(), find_needle);
    std::cout << pos << std::endl;

    pos = in_substring(my_haystack, 1, my_haystack.size(), find_needle);
    std::cout << pos << std::endl;

    pos = in_substring(my_haystack, 1, 4, find_needle);
    std::cout << pos << std::endl;
}

Example output (64-bit size_type; 18446744073709551615 is std::string::npos, i.e. not found):

18446744073709551615
5
5
18446744073709551615
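
If the generic lambdas are more machinery than needed, the same std::search approach can be written as a plain function that also compiles as C++11. This is a minimal sketch: the name search_in_range and the -1 return value (matching the question rather than npos) are illustrative, and it assumes beg_pos <= end_pos <= haystack.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: searches haystack[beg_pos, end_pos) in place.
// Returns the index relative to the start of haystack, or -1 if not found.
std::ptrdiff_t search_in_range(const std::string& haystack, const std::string& needle,
                               std::size_t beg_pos, std::size_t end_pos)
{
    auto first = haystack.begin() + static_cast<std::ptrdiff_t>(beg_pos);
    auto last  = haystack.begin() + static_cast<std::ptrdiff_t>(end_pos);

    auto it = std::search(first, last, needle.begin(), needle.end());

    // std::search returns last on failure. An empty needle matches at first,
    // which equals last when the range is empty, so that case is not a failure.
    if (it == last && !needle.empty())
        return -1;
    return it - haystack.begin();
}
```

For the four calls in the program above, this returns -1, 5, 5 and -1.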
Richard Hodges