
In my source files I have strings containing non-ASCII characters, like

sCursorFormat = TRANSLATE("Frequency (Hz): %s\nDegree (°): %s");

But when I extract them, the non-ASCII characters vanish:

msgid ""
"Frequency (Hz): %s\n"
"Degree (): %s"
msgstr ""

I have specified the source encoding when extracting:

xgettext --from-code=UTF-8

I'm running under MS Windows and the source files are C++ (not that it should matter).

liftarn
  • You have been a member here for more than 9 years. What coding language is this? How can we reproduce the problem on our machines? This question is not well defined. – Dialecticus Mar 22 '22 at 10:42

1 Answer


The encoding of your source file is probably not UTF-8 but "ANSI", which on Windows means whatever code page non-Unicode applications use (probably code page 1252). If you open the file in a hex editor, you will see the single byte 0xB0 standing for the degree symbol. On its own, that byte is not valid UTF-8: 0xB0 may only appear as a continuation byte inside a multi-byte sequence. In UTF-8 the degree symbol is represented by the two bytes 0xC2 0xB0. This is why the character vanishes when you use --from-code=UTF-8.
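To check which case a file falls into without a hex editor, its bytes can be run through a UTF-8 validator. The following is a minimal sketch of such a check following the RFC 3629 rules; the function name `isValidUtf8` is hypothetical and not part of xgettext or any library. It shows why the lone Windows-1252 byte 0xB0 is rejected while the UTF-8 pair 0xC2 0xB0 is accepted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8 (RFC 3629).
// Rejects stray continuation bytes (such as a Windows-1252 degree sign, 0xB0),
// truncated sequences, overlong forms, UTF-16 surrogates and values > U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (c < 0x80) { ++i; continue; }                    // plain ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // 0x80..0xBF cannot start a sequence; 0xF8..0xFF never valid
        if (i + len > n) return false;                      // sequence cut short
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;          // expected 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minForLen[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}
```

For example, `isValidUtf8("Degree (\xB0)")` returns false, while `isValidUtf8("Degree (\xC2\xB0)")` returns true. The file should be read in binary mode (`std::ios::binary`) so the bytes reach the check unchanged.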

The solution to your problem is to use --from-code=windows-1252 or, better yet, to save all source files as UTF-8 and then use --from-code=UTF-8.
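Re-saving is normally done in an editor, but for many files it could be scripted. Below is a minimal C++ sketch, assuming the input really is Windows-1252; the names `cp1252ToUtf8` and `appendUtf8` are hypothetical. Bytes 0x00..0x7F and 0xA0..0xFF map to the Unicode code point with the same value; only 0x80..0x9F need a lookup table. As an assumption, the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through as the C1 control characters with the same value.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of `cp` to `out`.
// Only handles cp <= U+FFFF, which covers every Windows-1252 character.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts Windows-1252 bytes to UTF-8.
std::string cp1252ToUtf8(const std::string& in) {
    // Code points for bytes 0x80..0x9F; 0 marks an unassigned byte.
    static const char32_t high[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        char32_t cp = b;  // ASCII and 0xA0..0xFF keep their value
        if (b >= 0x80 && b <= 0x9F && high[b - 0x80] != 0) cp = high[b - 0x80];
        appendUtf8(out, cp);
    }
    return out;
}
```

With this sketch, the degree sign 0xB0 becomes 0xC2 0xB0 and the euro sign 0x80 becomes 0xE2 0x82 0xAC. Running it on a file that is already UTF-8 would double-encode it, so checking the file first is worthwhile.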

Dialecticus
  • --from-code=windows-1252 didn't appear to make any difference, but changing the encoding of the file to UTF-8 did. Thanks. – liftarn Mar 22 '22 at 13:29