0xc2 character in std::string

Question

The following string has size 4, not 3 as I would have expected.

std::string s = "\r\n½"; 
int ss = s.size(); //ss is 4

When I loop through the string character by character, escaping each one to hex, I get:

0x0D (hex code for carriage return)
0x0A (hex code for line feed)
0xc2 (hex code, but what is this?)
0xbd (hex code for the ½ character)

Where does the 0xc2 come from? Is it some sort of encoding information? I thought std::string had one char per visible character in the string. Can someone confirm 0xc2 is a "character set modifier"?

Your file was encoded as UTF-8, and your compiler reads it as UTF-8, so ½ was converted to its UTF-8 binary representation, which is 0xC2BD. C++ does not care about the original encoding; what it cares about is what's in memory when `s` is constructed, and what was in memory was 0D0AC2BD, so `s` was constructed with 4 bytes (4 `char` in your case), which is why you get this result. You would get different results if the file had been encoded with a different encoding, or if `char` had been 2 bytes long. - Holt, Jun 14 '18 at 12:46

YSC · Accepted Answer · 2018-06-14T12:58:19.637

11

"½" has the Unicode code point U+00BD and is represented in UTF-8 by the two-byte sequence 0xC2 0xBD. This means your string contains only three characters but is four bytes long: std::string::size() counts bytes (char elements), not characters.
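
The difference between bytes and characters can be sketched as follows. The helper names `count_code_points` and `dump_bytes` are made up for this illustration and are not standard library functions; the counting helper assumes the string holds well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes `s` is well-formed UTF-8; nothing is validated.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}

// Hypothetical helper: prints every byte of `s` as two hex digits.
void dump_bytes(const std::string& s) {
    for (unsigned char c : s) {
        std::printf("0x%02X ", static_cast<unsigned>(c));
    }
    std::printf("\n");
}
```

With `std::string s = "\r\n\xC2\xBD";` (the bytes are spelled out so the result does not depend on the source file's encoding), `s.size()` is 4 while `count_code_points(s)` is 3, and `dump_bytes(s)` prints `0x0D 0x0A 0xC2 0xBD`.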

See https://www.fileformat.info/info/unicode/char/00bd/index.htm
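
To see where the 0xC2 comes from: UTF-8 stores code points from U+0080 to U+07FF in two bytes of the form 110xxxxx 10xxxxxx, with the code point's bits spread over the x positions. A minimal sketch of that rule, limited to code points below U+0800 (the function name `encode_utf8_small` is invented for this example):

```cpp
#include <string>

// Hypothetical helper: encodes a code point below U+0800 as UTF-8.
// Larger code points need three or four bytes and are not handled here.
std::string encode_utf8_small(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // ASCII range: the code point is stored as a single byte.
        out += static_cast<char>(static_cast<unsigned char>(cp));
    } else {
        // Lead byte 110xxxxx carries the upper bits of the code point.
        out += static_cast<char>(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        // Continuation byte 10xxxxxx carries the low six bits.
        out += static_cast<char>(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For U+00BD (binary 1011 1101), the upper bits are 10 and the low six bits are 111101, so the lead byte is 0xC0 | 0x02 = 0xC2 and the continuation byte is 0x80 | 0x3D = 0xBD. The 0xC2 is therefore part of the character's encoding, not a separate modifier.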

Additional reading on SO: std::wstring VS std::string.

edited Jun 14 '18 at 12:58

answered Jun 14 '18 at 12:35

YSC

