
I am writing a console program in C.

I expect the Terminal that my program is running in to have its character encoding set to UTF-8. This means that I am sending UTF-8 encoded strings to the Terminal, and expecting to receive UTF-8 encoded strings from the Terminal.

But if the Terminal is set to a character encoding other than UTF-8 while my program is running, my program will stop working as expected.

So is there a way to know what character encoding the Terminal is set to from within my program (so that I can change my program behavior accordingly)? And even if there is such a way, should I even bother making my program work with multiple character encodings, or is it enough to only make it work with UTF-8?

joseph_m
  • Yes, you can, but other encodings support only a limited number of characters. So if you want to be encoding-independent, you must use the minimal common set, which is ASCII. [Really, the C trigraphs show that the minimal encoding is even smaller than ASCII, but since 1990 I think ASCII is safe.] – Giacomo Catenazzi Jun 06 '18 at 06:18
  • If you use POSIX.1 [wide character functions](http://man7.org/linux/man-pages/man0/wchar.h.0p.html), the functions handle the conversion to the character set used by the locale. – Nominal Animal Jun 06 '18 at 14:27
  • @Giacomo Catenazzi *"Yes, you can"* After reading the duplicate question, it turned out that no, you can't know the character encoding of the Terminal from within your program. – joseph_m Jun 07 '18 at 00:18
  • @Nominal Animal Using the wide character functions only handles the conversion to the character set used by the locale of my program, and not to the character set used by the Terminal. – joseph_m Jun 07 '18 at 00:25
  • @joseph_m: No, your terminal should use the character set defined by your locale -- or more properly, your terminal should be set to define the correct locale, including the character set (which is usually a suffix to the locale, e.g. `C.utf8` or `C.iso885915`). If not, your terminal is simply misconfigured. – Nominal Animal Jun 07 '18 at 03:23
  • @Nominal Animal *"your terminal should use the character set defined by your locale"* You mean the locale of my program/process? – joseph_m Jun 07 '18 at 03:44
  • @joseph_m: The locale settings (the `LANG` and `LC_*` environment variables) are set whenever you log in; either to a default (see `/etc/default/locale` in Debian/Ubuntu/Mint), or to one you set yourself in your profile or shell startup files. If you use different terminals, you can examine the `TERM` environment variable (and others) to set the proper locale in your shell startup files; for Bash, this would be `.bashrc` in your home directory. The character set your program/process uses should always be dictated by the `LANG`/`LC_*` environment variables. – Nominal Animal Jun 07 '18 at 04:23
  • @Nominal Animal I am using Konsole as the Terminal. I changed the `LANG` variable in the `/etc/default/locale` file, and restarted the computer. Now when I open Konsole and type the `locale` command, I see that the `LANG` variable for `bash` has changed to the value that I set in the `/etc/default/locale` file, but Konsole still has the same character encoding that I set in its settings (so it was not affected by the `/etc/default/locale` file). – joseph_m Jun 07 '18 at 06:33
  • @joseph_m: So? The idea is that you can use whatever you want, but you must set them both (both `LANG`/`LC_*` and the character set in your terminal). The terminal does not tell anyone what character set it is using; only the user who set or changed it knows. No, you cannot query it programmatically. – Nominal Animal Jun 07 '18 at 06:45
  • @Nominal Animal I think I know where the misunderstanding comes from. When you said *"your terminal should use the character set defined by your locale"*, you meant that I should manually set the character set of the Terminal to match the locale of my program, and not that the Terminal will automatically change its own character set, right? – joseph_m Jun 07 '18 at 07:02
  • @joseph_m: Right. It is up to the user to ensure their terminal settings agree with the locale in use, at any point in time. There is no way a program running in a terminal can check the character set used by the terminal; a program can check the window size (number of rows and columns), and can get notified (by the SIGWINCH signal) if the window size changes, but that's about it. All programs are expected to follow the locale settings, and those are defined in the `LANG` and `LC_*` environment variables. – Nominal Animal Jun 07 '18 at 13:57
