
When creating a string variable (`const char *`) without a Unicode prefix such as `u8`, for example:

const char *str = "Hello 日本語 سلام Ä भारतीय ไทย";

How do I specify the default encoding used for that variable on popular platforms, and what does it depend on?

Lion King
  • `const char *` only stores bytes and has no concept of encoding. [Here's](https://stackoverflow.com/q/45575863/6486738) a similar question, but it's not as easy as just setting an encoding. It's very platform dependent. It also depends on what you mean by _"read that variable in popular platforms"._ Do you mean print? Or something else? – Ted Klein Bergman Aug 17 '20 at 01:01
  • @TedKleinBergman: I mean that when I look at the hexadecimal bytes of the string in the debugger, they are unfamiliar to me and do not match the Unicode hexadecimal values. – Lion King Aug 17 '20 at 01:12

1 Answer


What is the default encoding used for string variables without a Unicode prefix?

There are two encodings to consider here. One is the encoding of the source file. This is the encoding which you have used to write the file. The compiler has to interpret the source using the same encoding that you used to write the file in order for that interpretation to be correct.

The other encoding is the execution encoding. This is the encoding that the string and character literals will have. It is often the same as the source encoding but if it isn't, then the literals will be converted to the execution encoding.

Both encodings are implementation defined.
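To see which execution encoding a given build actually produced, it can help to dump the raw bytes of a literal rather than rely on how a debugger or console renders them. The following is a minimal sketch; `hex_dump` is a hypothetical helper name, not part of any library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of a NUL-terminated string
// as space-separated, lower-case hex pairs.
std::string hex_dump(const char* text) {
    assert(text != nullptr);
    std::string out;
    for (const char* p = text; *p != '\0'; ++p) {
        char buf[4];
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(*p)));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `hex_dump("Ä")` should give `c3 84` when the execution encoding is UTF-8 and `c4` when it is Windows-1252, assuming the compiler read the source file correctly. Note that UTF-8 bytes are not the same as the Unicode code point (U+00C4 here), which may be why the values look unfamiliar in a debugger.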

How do I specify the default encoding used for that variable on popular platforms, and what does it depend on?

It depends on the compiler that you use.

For example, this is what the documentation of GCC says:

-fexec-charset=charset

Set the execution character set, used for string and character constants. The default is UTF-8. charset can be any encoding supported by the system's "iconv" library routine.

-finput-charset=charset

Set the input character set, used for translation from the character set of the input file to the source character set used by GCC. If the locale does not specify, or GCC cannot get this information from the locale, the default is UTF-8. This can be overridden by either the locale or this command line option. Currently the command line option takes precedence if there's a conflict. charset can be any encoding supported by the system's "iconv" library routine.
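One way to check the effect of `-fexec-charset` is to compare a plain narrow literal against a `u8` literal, which is always UTF-8. The sketch below is a heuristic based on a single character, and `narrow_literals_look_utf8` is a hypothetical name chosen for illustration. It writes the character as the universal character name `\u00C4` so that the result does not depend on the source file encoding.

```cpp
#include <cstring>

// Hypothetical check: true when the execution character set encodes
// U+00C4 (Ä) the same way UTF-8 does. Works whether u8 literals are
// char (before C++20) or char8_t (C++20), since memcmp takes void pointers.
bool narrow_literals_look_utf8() {
    const char narrow[] = "\u00C4";  // converted to the execution character set
    const auto& utf8 = u8"\u00C4";   // always UTF-8: C3 84 plus the terminator
    return sizeof narrow == sizeof utf8 &&
           std::memcmp(narrow, utf8, sizeof utf8) == 0;
}
```

With GCC's default settings this should return `true`; building with something like `-fexec-charset=ISO-8859-1` should make it return `false`, since the narrow literal then holds a single byte.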

eerorika
  • Firstly, I am using VS2019. In my case the source file is UTF-8. When I debugged the code above to find out which encoding the string uses, the hexadecimal values were unfamiliar and did not look like Unicode either, so I still don't know which default encoding is used. – Lion King Aug 17 '20 at 01:54
  • @LionKing I suggest consulting the documentation of your compiler to find out how to determine or specify the execution encoding. – eerorika Aug 17 '20 at 01:57
  • I found [this page](https://learn.microsoft.com/en-us/cpp/build/reference/execution-charset-set-execution-character-set?view=vs-2019) for VS2019 that covers this subject: the compiler option `/execution-charset:utf-8` changes the execution encoding to UTF-8. Unfortunately, it is not the default. – Lion King Aug 17 '20 at 02:19