
My code contains people's names. For example:

const char *translators[] = {"Jörgen Adam <adam@***.de>", NULL};

It contains ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS).

When writing the source code, which form is correct to use?

UTF-8:

Jörgen Adam

or

UTF-8 (hex escapes):

J\xc3\xb6rgen Adam

UPDATE:

The name will be displayed in a GTK About Dialog (in the translator credits).

user1935430
  • This is from a C++ question and so not a duplicate: http://stackoverflow.com/questions/5676978/unicode-identifiers-and-source-code-in-c11 but the checked answer by bames53 there is applicable to C99 too. – CodeClown42 Sep 14 '13 at 14:22
  • What toolkit are you using for displaying the about dialog? It seems to me that it is the toolkit's responsibility to interpret UTF-8 correctly (with the Win32 API, you would have to convert to UTF-16 first). – Alexandre C. Sep 14 '13 at 14:44


The answer depends a lot on whether this is in a comment or a string.

If it's in a comment, there's no question: you should use raw UTF-8, so it should appear as:

/* Jörgen Adam */

If the user reading the file has a misconfigured/legacy system that treats text as something other than UTF-8, it will appear in some other way, but this is just a comment so it won't affect code generation, and the ugliness is their problem.

If, on the other hand, the UTF-8 is in a string literal, you probably want the code to be interpreted correctly even if the compile-time character set is not UTF-8. In that case, your safest bet is probably to use:

"J\xc3\xb6rgen Adam"

It might actually be safe to use the UTF-8 literal there too; I'm not 100% clear on how C specifies the handling of narrow (non-wide) string literals with respect to the compile-time character set. Unless you can convince yourself that it's formally safe and not broken on any compiler you care to support, though, I would just stick with the hex.

R.. GitHub STOP HELPING ICE
  • The name will be displayed in an About Dialog (in the translator credits). – user1935430 Sep 14 '13 at 14:38
  • For more than you probably wanted to know about the character encoding(s) of C source code (as of C99; I have not checked, but I expect C11 didn't change much), see the long comment beginning at http://gcc.gnu.org/viewcvs/gcc/trunk/libcpp/charset.c?view=markup#l25 . It is *not* safe in principle to write raw UTF-8 in narrow string literals, although I would expect most non-Windows compilers to DTRT nowadays. – zwol Sep 14 '13 at 14:51