Gettext failed to extract non-ASCII characters

Question

In my source files I have strings containing non-ASCII characters, like

sCursorFormat = TRANSLATE("Frequency (Hz): %s\nDegree (°): %s");

But when I extract them, the non-ASCII characters vanish:

msgid ""
"Frequency (Hz): %s\n"
"Degree (): %s"
msgstr ""

I have specified the encoding when extracting as

xgettext --from-code=UTF-8

I'm running under MS Windows and the source files are C++ (not that it should matter).

You have been a member here for more than 9 years. What programming language is this? How can we reproduce the problem on our machines? This question is not well defined. — Dialecticus, Mar 22 '22 at 10:42

score 1 · Answer 1 · answered Mar 22 '22 at 12:13

The encoding of your source file is probably not UTF-8 but "ANSI", which means whatever code page Windows uses for non-Unicode applications (probably code page 1252). If you open the file in a hex editor, you will see the single byte 0xB0 standing for the degree symbol. On its own, this byte is not a valid UTF-8 sequence. In UTF-8, the degree symbol is encoded as the two bytes 0xC2 0xB0. This is why the character vanishes when you use --from-code=UTF-8.
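To confirm this without a hex editor, you can check whether the bytes of the string are well-formed UTF-8. The following is only a minimal sketch: is_valid_utf8 is a hypothetical helper written for illustration, not part of gettext, and it assumes the string has already been read into memory as raw bytes. A lone 0xB0 byte (the degree sign in Windows-1252) fails the check, while the two-byte sequence 0xC2 0xB0 passes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gettext): returns true if `bytes` is
// well-formed UTF-8. Rejects stray continuation bytes, truncated sequences,
// overlong encodings, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned long cp = 0;

        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;  // stray continuation byte (0x80..0xBF) or 0xF8..0xFF

        if (i + len > n) return false;  // sequence is cut off at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (len > 1 && cp < min_cp[len]) return false;        // overlong form
        if (cp > 0x10FFFF) return false;                      // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;       // surrogate
        i += len;
    }
    return true;
}
```

For example, is_valid_utf8("Degree (\xB0): %s") returns false, whereas is_valid_utf8("Degree (\xC2\xB0): %s") returns true. Running such a check over the raw contents of a source file would show whether it really is UTF-8 before it is passed to xgettext.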

The solution is to use --from-code=windows-1252 or, better yet, to save all source files as UTF-8 and then use --from-code=UTF-8.

--from-code=windows-1252 didn't appear to make any difference, but changing the encoding of the file to UTF-8 did. Thanks. — liftarn, Mar 22 '22 at 13:29
