3

I'm trying to come up with a lambda that will allow std::equal_range to return a range in which the string searched for exists as a prefix. As that probably isn't worded right, an example:

Given the vector of strings:

  • C:\users\andy\documents\screenshot.jpg
  • C:\users\bob\desktop\file.txt
  • C:\users\bob\desktop\picture.png
  • C:\users\bob\desktop\video.mp4
  • C:\users\john\desktop\note.txt

I would expect the iterators returned to be

  • C:\users\bob\desktop\file.txt and
  • C:\users\bob\desktop\video.mp4.

How would I write a comparison lambda for std::equal_range that accomplishes this, or is std::equal_range not the right tool for this job?

ks1322
  • 33,961
  • 14
  • 109
  • 164
  • A substring or a prefix? a substring could occur anywhere in the string and the results need not be contiguous. – Retired Ninja Dec 28 '17 at 00:17
  • I guess I would almost always be searching for a prefix, good catch. – Andrew LeFevre Dec 28 '17 at 00:32
  • 3
    The four-argument overload of `std::equal_range` with the comparison callable object, seems to be sufficient for the task. Just code a comparison object that implements a comparison that matches the prefix. You would have to carefully code the comparison object with [separate implementations of `comp(element, value)` and `comp(value, element)`](http://en.cppreference.com/w/cpp/algorithm/equal_range), in order to return the correct comparison result. – Sam Varshavchik Dec 28 '17 at 00:45
  • Well, start with the basics. Given a function that takes a full string, and a prefix, as a parameter, return a `bool` value indicating whether the full string collates before the prefix. That's laughably trivial, and, once written, you're 50% done, already. The other 50% is a function that takes a prefix and a full string, and returns `true` if the prefix collates before the full string. Problem solved. – Sam Varshavchik Dec 28 '17 at 00:50
  • 1
    This seems to work but I'm not sure it's exactly right: https://ideone.com/NiV02v You could probably avoid the `substr` using `std::string.compare` but this was clearer to me while looking at it. *shrug* – Retired Ninja Dec 28 '17 at 01:13
  • 1
    I would take a slightly more complicated approach than Retired Ninja's. I would coerce the prefix string into a distinct, helper type that's not a `std::string`. Then I can discriminate between the two types of comparison functions, and implement two `operator()`s in the custom comparator class, comparing prefix to string, and prefix to string. – Sam Varshavchik Dec 28 '17 at 01:16
  • Yeah, this was just a "can I get it working at all" kinda thing. Two different functions makes it easier to see how they would be used with lower_bound/upper_bound. – Retired Ninja Dec 28 '17 at 01:20
  • @RetiredNinja that seems to work perfectly! I'm just confused as to why comparing by < in the if statements works, and why comparing by <= isn't necessary? – Andrew LeFevre Dec 28 '17 at 01:24
  • 1
    what about `C:\users\bob\desktop\picture.png` why it should not be returned? – Killzone Kid Dec 28 '17 at 01:28
  • 2
    Using `<=` or `>=` breaks the strict weak ordering requirement for comparison. This answer explains it better than I could. :) https://stackoverflow.com/a/981299/920069 – Retired Ninja Dec 28 '17 at 01:28
  • @KillzoneKid std::equal_range returns a range, the first element that matches and the last element that matches. So C:\users\bob\desktop\picture.png is in the range between C:\users\bob\desktop\file.txt and C:\users\bob\desktop\video.mp4. – Andrew LeFevre Dec 28 '17 at 18:47
  • @AndrewLeFevre Gotcha! – Killzone Kid Dec 28 '17 at 18:51
  • @SamVarshavchik What is the benefit of your more complicated approach? Would it improve performance at all? Just wondering if your approach is objectively better in some way. – Andrew LeFevre Dec 28 '17 at 19:38
  • It avoids the need for the comparator to explicitly capture the prefix "out of band" in order to figure out which parameter value is the prefix. Probably marginally faster, no need to do that comparison on every call; but I doubt that it'll make any difference on modern multighz CPUs. – Sam Varshavchik Dec 28 '17 at 23:05
  • @SamVarshavchik oh OK thanks! – Andrew LeFevre Dec 29 '17 at 15:33

1 Answers1

2

I think you simply need to make the comparator compare only the length of the prefix against the elements like this:

std::vector<std::string> v
{
    "C:/users/andy/documents/screenshot.jpg",
    "C:/users/bob/desktop/file.txt",
    "C:/users/bob/desktop/picture.png",
    "C:/users/bob/desktop/video.mp4",
    "C:/users/john/desktop/note.txt",
};

std::sort(std::begin(v), std::end(v));

std::string const prefix = "C:/users/bob/desktop/";

auto lb = std::lower_bound(std::begin(v), std::end(v), prefix);

// For the upper bound we want to view the vector's data as if
// every element was truncated to the size of the prefix.
// Then perform a normal match.
auto ub = std::upper_bound(lb, std::end(v), prefix,
[&](std::string const& s1, std::string const& s2)
{
    // compare UP TO the length of the prefix and no farther
    if(auto cmp = std::strncmp(s1.data(), s2.data(), prefix.size()))
        return cmp < 0;

    // The strings are equal to the length of the prefix so
    // behave as if they are equal. That means s1 < s2 == false
    return false;
});

// make the answer look like we used std::equal_range
// (if that's what's needed)
auto range = std::make_pair(lb, ub);

for(auto itr = range.first; itr != range.second; ++itr)
    std::cout << *itr << '\n';

Output:

C:/users/bob/desktop/file.txt
C:/users/bob/desktop/picture.png
C:/users/bob/desktop/video.mp4

To explain why this works imagine taking the vector and sorting it. Then imagine visiting every element and truncating them to the length of the prefix. You would be left with a sorted vector with no elements longer than the prefix. At that point a simple std::equal_range would do what you require. Thus, all we need to do is construct a comparator that behaves as if the container elements had been truncated to the length of the prefix and use that comparator in our std::equal_range search (or twin std::lower_bound/upper_bound search).

Galik
  • 47,303
  • 4
  • 80
  • 117
  • From what I understand of Sam Varshavchik's comments above, you have to compare using both comp(element, value) and comp(value, element) right? Also, would using string::compare be faster here? – Andrew LeFevre Dec 29 '17 at 15:25
  • @AndrewLeFevre Yes, you are right. This is trickier than I thought. I'll give it some more consideration. – Galik Dec 29 '17 at 16:39
  • @AndrewLeFevre I believe this is now fixed. – Galik Dec 29 '17 at 17:18