4

I am trying to sort a vector of strings using a custom comparator that sorts first by a the string portion then by the numeric portion. However when I try to run the code at the bottom I get the error

terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 2) > this->size() (which is 0)

With some debugging I find that this is a result of my cmp function receiving an empty string. When all the elements of my vector are not empty, why would my cmp function receive an empty string?

#include <string>
#include <vector>
#include <algorithm>

using namespace std;

string get_letter(const string &s) {
    return s.substr(0, 1);
}

unsigned long get_number(const string &s) {
    return stoi(s.substr(2));
}

bool cmp(const string &a, const string &b) {
    bool letter_cmp = get_letter(a) < get_letter(b);
    bool number_cmp = get_number(a) < get_number(b);
    return letter_cmp || number_cmp;
}

int main()
{
    vector<string> vals;
    vals.push_back("X   2805");
    vals.push_back("W   279");
    vals.push_back("O   8667");
    vals.push_back("U   7133");
    vals.push_back("B   654");
    vals.push_back("S   7216");
    vals.push_back("V   6016");
    vals.push_back("H   965");
    vals.push_back("Z   4821");
    vals.push_back("O   4299");
    vals.push_back("Y   3704");
    vals.push_back("U   6526");
    vals.push_back("R   3942");
    vals.push_back("S   3250");
    vals.push_back("E   4319");
    vals.push_back("K   7847");
    vals.push_back("Y   2607");
    vals.push_back("D   8508");
    vals.push_back("F   2846");
    vals.push_back("Z   4069");
    vals.push_back("J   8086");
    vals.push_back("M   2800");
    vals.push_back("B   5983");
    vals.push_back("P   9819");
    vals.push_back("L   1886");
    vals.push_back("C   9269");
    vals.push_back("W   3862");
    vals.push_back("M   9940");
    vals.push_back("B   2236");
    vals.push_back("U   1748");
    vals.push_back("O   7226");
    vals.push_back("H   5304");
    vals.push_back("C   2785");
    vals.push_back("V   2873");
    vals.push_back("V   9744");
    vals.push_back("D   2562");
    vals.push_back("Q   8987");
    vals.push_back("W   8557");
    vals.push_back("J   2045");
    vals.push_back("G   1875");
    vals.push_back("O   627");
    vals.push_back("T   1173");
    vals.push_back("T   1371");
    vals.push_back("V   8724");
    vals.push_back("T   8625");
    vals.push_back("L   2434");
    vals.push_back("H   3669");
    vals.push_back("J   8253");
    vals.push_back("Z   7284");
    vals.push_back("H   9764");
    vals.push_back("G   8312");
    vals.push_back("V   3186");
    vals.push_back("V   7479");
    vals.push_back("J   5610");
    vals.push_back("Q   421");
    vals.push_back("Z   6078");
    vals.push_back("P   3644");
    vals.push_back("C   3690");
    vals.push_back("F   1386");
    vals.push_back("R   7987");

    sort(vals.begin(), vals.end(), cmp);
}
shians
  • 955
  • 1
  • 6
  • 21
  • 2
    I don't know if it has anything to do with empty strings, but your comparison is broken. It says `X 2805` is less than `O 8667` while also saying `O 8667` is less than `X 2805`. – chris Jun 19 '20 at 02:18
  • 3
    `cmp` is not a valid comparison function. There are many inputs for which ’cmp(a,b)’ is true and ’cmp(b,a)` is true. Think of ’cmp’ as a generalized less-than — it can’t be true in both directions. – Pete Becker Jun 19 '20 at 02:18
  • Additionally, you could just replace your get_letter function with str[0], as it only needs the first letter of the string. – Telescope Jun 19 '20 at 02:19
  • @chris — exactly right: the behavior is undefined. Make that an answer. – Pete Becker Jun 19 '20 at 02:20
  • Also, can't you just use the normal string comparison function for your sorting purpose? – Telescope Jun 19 '20 at 02:22
  • @PeteBecker, I know it _could_ produce the poster's behaviour, but I'm not totally confident that it does without some time for reproduction and testing. It was just something that stuck out like a sore thumb because a comparison based on a disjunction is really abnormal. – chris Jun 19 '20 at 02:26
  • @chris - undefined behavior is just that; *undefined*. Once you have UB, trying to reason about "what could happen" is a mug's game. It doesn't have to make sense. It could change from run to run. – Marshall Clow Jun 19 '20 at 02:31
  • The comparison function is required to implement strict weak ordering. `cmp` does not. This is undefined behavior. See the linked question for more information; specifically the second answer. – Sam Varshavchik Jun 19 '20 at 02:31
  • @chris — the answer to “why does `std::sort` act funny” is always “because the comparison function isn’t valid”. – Pete Becker Jun 19 '20 at 02:35
  • @MarshallClow, What I mean is that I wasn't sure whether fixing that actually solves this problem or whether it's just another bug that should be fixed in addition. I can see how an empty string could be produced from a bad sort comparison, but I'm not as confident in linking those as I am with, say, `++i * ++i` producing "unexpected" output. To increase confidence, I'd first reproduce the issue, make the fix, and then see whether the issue goes away. However, given the lack of anything else found, I'm inclined to say it was the only bug. – chris Jun 19 '20 at 02:56
  • I find it interesting how many people hide behind UB. Make no mistake, the behavior is undefined. But there is still a concrete machine somewhere execution instructions causing these effects. Chandler Karruth has a great talk on this, "Garbage In, Garbage Out: Arguing about Undefined Behavior". It can be enlightening to learn just exactly how causing UB in the comparator can eventually lead to empty strings. That said, a debugger is the best tool for this and I encourage OP to attach one and take a look. – GManNickG Jun 19 '20 at 03:30
  • Yep, I made a logic error, my intention is really to have `letter_cmp && number_cmp`, I guess any goes when you're in UB land. I don't really know how I'd get to the bottom of this, I've debugged at my comparator function to discover the empty string issue, but to really get to the bottom of it I'd have to debug into the implementation of `sort()` which is too much for me. – shians Jun 19 '20 at 03:53
  • 1
    @GManNickG - it might be enlightening to learn just how UB in the comparator can lead to empty strings, but that's not really _useful knowledge_. It would only apply to one release of one standard library - a different implementation might/would give different behavior. Same with different versions of the same library. Or different compiler flags (`-O1` vs. `-O3`). – Marshall Clow Jun 19 '20 at 05:18
  • @MarshallClow: I disagree. It wouldn't be actionable for this specific case, in the sense that one wouldn't try to reason about it and work around it. But it would be a good lesson in using a debugger, understanding how the stdlib in this case decided to implement something, and generally building expertise on systems-level understanding of machines. – GManNickG Jun 19 '20 at 05:32
  • @GManNickG, that's not healthy – Hrisip Jun 19 '20 at 06:37
  • @Adler It's not...healthy?... to learn how to use a debugger? Questionable take. But, what do I know about systems programming or C++? :) – GManNickG Jun 19 '20 at 14:06
  • @GManNickG, yeah, instead of debugging, it's better to figure out how to avoid debugging. Anyway, let's not continue this discussion. – Hrisip Jun 19 '20 at 16:31

0 Answers0