0

I am not sure how to form this question, since english isn't my primary language. . .

What is the process behind comparing two strings?

For example, how does a computer, on which logic, compare two strings?

#include <iostream>
#include <string>

int main()
{
    std::string s1 {"b"};
    std::string s2 {"abc"};

    if(s1 > s2)
    {
        std::cout << s1 << " > " << s2;
    }
    else std::cout << s2 << " > " << s1;

    return 0;
}

ouput: b > abc

How does a computer come up with this logic (even though it is correct).

I imagined computers logic to be converting chars into integers then comparing them by size, which is not the case here since if it was

b > abc would be treated like 98 > 97 + 98 + 99 which is incorrect.

  • 2
    `98 > 97`, that's it. Why would you add up the ASCII codes of the letters anyway? – ForceBru Mar 25 '17 at 18:28
  • 1
    The computer doesn't need to convert characters into integers - characters are already integer values – UnholySheep Mar 25 '17 at 18:28
  • It compares the first pair of different characters. One that is larger belongs to a "larger" string. – HolyBlackCat Mar 25 '17 at 18:29
  • On a side note, your `else` branch is wrong. If s1 is not greater than s2, it may also be that s1 is equal to s2. – zett42 Mar 25 '17 at 18:31
  • When you say "a computer", this sounds like the CPU itself performed the comparison. That's a wrong picture. The lexicographical comparison performed for `std::string` is an operation on a much higher abstraction level. – Christian Hackl Mar 25 '17 at 18:31
  • Thanks for the answers guys. @zett42 I see now that my else branch is wrong. That doesn't interfere with the question though... Also @Christian Hackl my english vocabulary isn't very good. I meant more like compiler, or whatever does the comparison :) @ForceBru I just tested if I put one string as `b` and one as `bcd` and replaced the whole if with `if(s1 == s2) std::cout << "s1 = s2";` it wouldn't enter it. So it doesn't compare only first letters. – ihatestrings Mar 25 '17 at 18:43
  • [Here is a nice answer](http://stackoverflow.com/a/21106815/7571258) about how a string comparison might look like at the low level (assembly language), which is "what the computer comes up with". – zett42 Mar 25 '17 at 19:05

1 Answers1

3

The comparison logic is specified by the string's char traits, which for std::string is std::char_traits<char>::compare, which in turn specifies "lexicographic comparison". Each character is compared based on its numeric value, which is given by the encoding of the execution character set. On your platform, 'b' > 'a' is true, so s2 compares less than s1.

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • Where does the reference state anything about encoding and character sets being involved in string comparison? – zett42 Mar 25 '17 at 18:35
  • 2
    @zett42 it doesnt need to because the comparison only cares about the numerical value – 463035818_is_not_an_ai Mar 25 '17 at 18:42
  • Just checked, the `std::char_traits::compare()` implementation of MSVC++2017 does a simple `memcmp()`, so it doesn't really care about encodings and character sets. – zett42 Mar 25 '17 at 18:56
  • @zett42: I think you're misunderstanding the point about encodings. The result of `'a' < 'b'` depends on the numeric values of `'a'` and `'b'`, which is precisely what the "encoding" of the execution character set is. If `'a'` is 10 and `'b'` is 20, the comparison is true, and if `'a'` is 10 and `'b'` is 5, the comparison is false. – Kerrek SB Mar 25 '17 at 19:06
  • I just had to look up "execution character set", MSVC has a [pretty good explanation](https://msdn.microsoft.com/en-us/library/mt708818.aspx): _The execution character set is the encoding used for the text of your program that is input to the compilation phase after all preprocessing steps._ ... still need to find a character set where `'a' < 'b' == false` though ;-) – zett42 Mar 25 '17 at 19:30
  • @zett42: I mean, `'a'` and `'b'` are a red herring, of course. But sooner or later the OP will have strings with mixed case, or with things like `'-'` or `'_'`... – Kerrek SB Mar 25 '17 at 19:46