
I'm using GCC on Cygwin and trying to output my French translated content. The source string, in a file saved as UTF-8 by Eclipse and verified as UTF-8 in Notepad, is:

wchar_t **strings;
...
strings[NL_ARGUMENT_DESCRIPTION_ADD_METADATA] =
        L"Ajouter du contenu de métadonnées.";

This compiles without errors under GCC 11.2.0 with -finput-charset=UTF-8. The resulting output, however, is garbled:

Ajouter du contenu de m▒tadonn▒es.

Unless I pipe this to iconv --from-code=ISO-8859-1 --to-code=UTF-8 where the output ends up looking correct as:

Ajouter du contenu de métadonnées.

Is there any way to avoid piping through iconv? This is a GCC-built program with a CLI, and I do not want users to have to pipe output just to read the online help. I am using wprintf(L"%S")-style format strings when generating output.
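For reference, the fact that converting from ISO-8859-1 repairs the output suggests each wide character is being emitted as a single byte. The conversion the iconv pipe performs can be sketched in-process in plain C99. This is only a sketch: `utf8_encode` and `fput_utf8` are hypothetical helper names, not part of any library, and the code assumes every `wchar_t` holds one complete code point (so on platforms with a 16-bit `wchar_t`, such as Cygwin, it handles only characters without surrogate pairs, which covers French text).

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: encode one code point as UTF-8 into out[].
 * Returns the number of bytes written (1 to 4). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | ((cp >> 18) & 0x07));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: write a wide string to fp as UTF-8 bytes.
 * Assumes one wchar_t per code point (no surrogate pairs).
 * Returns 0 on success, -1 on a write error. */
int fput_utf8(const wchar_t *ws, FILE *fp)
{
    unsigned char buf[4];

    for (; *ws != L'\0'; ++ws) {
        size_t n = utf8_encode((unsigned long)*ws, buf);
        if (fwrite(buf, 1, n, fp) != n)
            return -1;
    }
    return 0;
}
```

A call such as `fput_utf8(strings[NL_ARGUMENT_DESCRIPTION_ADD_METADATA], stdout)` would replace the `wprintf` call. Byte-oriented and wide-oriented output should not be mixed on the same stream, so this would have to be used for all output to that stream.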

AmigoJack
  • Haven't you ever wondered what the `L` prefix of your literal means? And if it shouldn't be `u8` instead? See [difference between L"" and u8""](https://stackoverflow.com/q/18325501/4299358). – AmigoJack Feb 21 '22 at 05:54
  • I did not specifically add that the code has to be portable and work in more than gcc. c99 does not support u8"" literals, so L"" is all I can use. – Randall Becker Feb 21 '22 at 14:11
  • Also to point out, I am using Cygwin's cygterm - it is up-to-date as of yesterday. – Randall Becker Feb 21 '22 at 14:47
  • please provide a `Simple Test Case` that we can build and run and the exact command you are using for compilation – matzeri Feb 22 '22 at 12:39
  • Simple test case. `#include <stdio.h>` `#include <wchar.h>` `int main(int argc, char **argv) {` `wprintf(L"Ajouter du contenu de métadonnées.\n");` `return 0;` `}` This works correctly under c99 on a few platforms (NonStop x86/ia64) using c99 test.c -o test. It does not work on Cygwin with GCC 11.2.0 on Windows unless I pass this to iconv. – Randall Becker Feb 23 '22 at 15:08
  • If you are on Cygwin, you probably need to call `setlocale(LC_ALL, "fr_FR.utf8")` or similar (with a Unix-style locale name). – n. m. could be an AI Feb 28 '22 at 19:18
  • P(setLocale) < .5... probably sadly is not more than 50%. Passing through iconv still works, but setting LC_ALL or LANG appropriately does not. – Randall Becker Mar 02 '22 at 04:09
  • Does your `setlocale` succeed? – n. m. could be an AI Mar 04 '22 at 16:10
  • Sadly no. `setlocale` does not make a difference in behaviour. – Randall Becker Mar 05 '22 at 18:42
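
The `setlocale` exchange in the comments leaves open whether the call itself succeeded and which encoding ended up active. A minimal diagnostic sketch follows, assuming a POSIX environment such as Cygwin (`nl_langinfo` is POSIX, not ISO C). The helper name `locale_is_utf8` is hypothetical, and the asker reports that `setlocale` made no difference on their setup, so this only shows how one might check, not a confirmed fix.

```c
#define _POSIX_C_SOURCE 200809L  /* expose nl_langinfo under strict -std=c99 */
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: activate the locale named by the environment
 * (LC_ALL, LC_CTYPE, LANG) and report what was selected on stderr.
 * Returns 1 if the active codeset is UTF-8, 0 otherwise. */
int locale_is_utf8(void)
{
    const char *name;
    const char *codeset;

    name = setlocale(LC_ALL, "");
    if (name == NULL) {
        /* The requested locale is unknown; the program stays in "C". */
        fprintf(stderr, "setlocale failed; still in the \"C\" locale\n");
        return 0;
    }

    codeset = nl_langinfo(CODESET);
    fprintf(stderr, "locale=%s codeset=%s\n", name, codeset);
    return strcmp(codeset, "UTF-8") == 0;
}
```

Calling `locale_is_utf8()` at the start of `main`, before the first `wprintf`, would show whether the wide-to-multibyte conversion has a UTF-8 locale to work with.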

1 Answer


Following the guidance of the answer in [difference between L"" and u8""](https://stackoverflow.com/q/18325501/4299358), the following code produces the expected output:

$ cat ./prova.c
#include <stdio.h>
#include <wchar.h>
int main(int argc, char **argv)
{
    printf(u8"Ajouter du contenu de métadonnées.\n");
    return 0;
}

$ gcc -o prova prova.c

$ ./prova
Ajouter du contenu de métadonnées.

On Cygwin, the following also works:

$ cat prova.c
#include <stdio.h>
int main(int argc, char **argv)
{
    printf("Ajouter du contenu de métadonnées.\n");
    return 0;
}

$ gcc -std=c99 -Wall -o prova prova.c

$ ./prova.exe
Ajouter du contenu de métadonnées.
matzeri
  • I do fully understand that `u8""` works in gcc. It requires `--std=c11` or above, which is not portable where I have to deploy the solution (requires c99 compatibility, which I should have added to the description). I will accept the answer but cannot use it until 2027. Thanks though. – Randall Becker Feb 27 '22 at 14:39
  • @RandallBecker You do not need `u8`. Use plain jane `char`. – n. m. could be an AI Feb 28 '22 at 19:20
  • Not a portable solution. char outside gcc is almost never UTF-8. – Randall Becker Mar 02 '22 at 04:07
  • @RandallBecker If you need a portable solution, you are out of luck. There are no portable solutions. Since you are in cygwin, all compilers available in cygwin work with UTF-8. – n. m. could be an AI Mar 04 '22 at 16:10
  • Thanks. I'm at that conclusion as well. Where I am going is considering reprocessing the source to use `u8""` where I have `gcc` and `c11` but staying with `L""` where I am limited to `c99` - the primary platform I need this on is limited to `c99` but that changes in 2027 where I can move off `c99` forever (like that will be true). The NLS modules are fairly contained and in UTF-8 source format, so changing the prefix before the compiler sees it might be an option. – Randall Becker Mar 05 '22 at 18:47
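
The prefix switch described in the last comment could also be done with the preprocessor instead of rewriting the source. A minimal sketch, assuming the compiler defines `__STDC_VERSION__` accurately; `nls_char`, `NLS_TEXT`, `nls_fputs`, and `nls_print_help` are hypothetical names introduced only for illustration:

```c
#include <stdio.h>
#include <wchar.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 and later: UTF-8 narrow literals, written out byte for byte.
 * The cast keeps the type stable if u8"" literals are not plain char. */
typedef char nls_char;
#define NLS_TEXT(s) ((const nls_char *)u8##s)
#define nls_fputs(s, fp) fputs((s), (fp))
#else
/* C99 fallback: wide literals, converted by the C library on output. */
typedef wchar_t nls_char;
#define NLS_TEXT(s) (L##s)
#define nls_fputs(s, fp) fputws((s), (fp))
#endif

/* Hypothetical example: print one help string through the wrapper. */
int nls_print_help(FILE *fp)
{
    const nls_char *msg = NLS_TEXT("Ajouter du contenu de métadonnées.\n");
    return nls_fputs(msg, fp);
}
```

With this layout, the `strings` table would be declared as `const nls_char **` and each entry written as `NLS_TEXT("...")`, so the NLS modules stay unchanged when the platform moves from c99 to c11.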