5

How to initialize a const char* and/or const std::string in C++ with a sequence of UTF-8 characters?

I'm using a regular expression API that accepts a UTF-8 string as a const char*. The initialization code should be platform independent.

Leonid
  • Available options depend on which compiler you are using. – Martin Ba Oct 07 '10 at 11:46
  • Easily. `const char* c = "ěščř";`. Just save the file in UTF-8 encoding. – nothrow Oct 07 '10 at 11:48
  • The options also depend on how readable the UTF-8 string should be in the source code. – Bart van Ingen Schenau Oct 07 '10 at 11:49
  • An arbitrary string provided at runtime, or a string that's known at compile time? If the former, how is it provided? As a special case if it's the latter, and if your string contains only ascii (7bit) characters, then UTF-8 is the same as ascii for those characters, so just use a string literal. `const char *utf8_string = "hello, world";`. Assuming your platform uses ascii as its basic encoding, of course. – Steve Jessop Oct 07 '10 at 12:22

2 Answers

8

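This should work with any compiler, because the UTF-8 bytes are spelled out as hex escapes and do not depend on the encoding of the source file: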

const char* twochars = "\xe6\x97\xa5\xd1\x88";
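
To make the byte layout concrete, here is a minimal sketch of the same literal used for both a `const char*` and a `const std::string`. The helper `count_code_points` and the name `twochars_str` are hypothetical, added only for illustration. The literal encodes U+65E5 (three bytes) followed by U+0448 (two bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// The literal from the answer: U+65E5 (E6 97 A5) followed by U+0448 (D1 88).
const char* twochars = "\xe6\x97\xa5\xd1\x88";

// The same bytes held in a const std::string (hypothetical name).
const std::string twochars_str = "\xe6\x97\xa5\xd1\x88";

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes, which always have the bit pattern 10xxxxxx.
std::size_t count_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

With these definitions, `std::strlen(twochars)` and `twochars_str.size()` both report the byte count, while `count_code_points(twochars)` reports the number of characters. One caveat: a hex escape consumes every hex digit that follows it, so if ordinary text starting with `0-9` or `a-f` comes right after an escape, split the literal into two adjacent string literals (for example `"\xd1\x88" "abc"`).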
Nemanja Trifunovic
2

Another compiler-independent option: save the source file as UTF-8 without a BOM signature.

const char* c = "ěščř"; // Just save the file as UTF-8 without a BOM signature.

(See the comment on the question.)
By the way, the Windows console must be set to UTF-8 for the text to display correctly. For more details, see the discussion on the question.
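
One way to check that a raw literal really reached the binary as UTF-8 is to compare it with the same characters written as explicit bytes. This is a sketch under stated assumptions: the file is saved as UTF-8 without a BOM, and the compiler's execution character set is UTF-8 (the default for GCC and Clang; MSVC may need its `/utf-8` option). The names `raw`, `escaped`, and `same_bytes` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Raw literal: its bytes depend on how the file is saved
// and on how the compiler interprets the source.
const char* raw = "ěščř";

// The same four characters as explicit UTF-8 bytes:
// U+011B, U+0161, U+010D, U+0159 (two bytes each).
const char* escaped = "\xc4\x9b\xc5\xa1\xc4\x8d\xc5\x99";

// Hypothetical helper: true when both strings hold identical bytes.
bool same_bytes(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}
```

If `same_bytes(raw, escaped)` returns false on some toolchain, the source encoding or the execution character set is not UTF-8 there, and the hex-escaped form is the safer choice.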

vladasimovic
  • This should be the recommended practice nowadays. Make it clear that all your source code is UTF-8 with no BOM, make no exceptions, always use UTF-8 for all your files, and then initialise constant strings the way the C/C++ standards support. – cesss Jan 05 '18 at 13:49