How to get terminal's Character Encoding

Question

I changed my gnome-terminal's character encoding to "GBK" (the default is UTF-8). How can I find out which character encoding is currently in effect on my Linux system?

Short writeup: [Unix Terminals: Surviving the Encoding Hell](http://benjamin-schweizer.de/unix-terminals-surviving-the-encoding-hell.html) — miku, Mar 15 '11 at 00:21

score 118 · Answer 1 · edited Oct 12 '17 at 15:54

The terminal uses environment variables to determine which character set to use, therefore you can determine it by looking at those variables:

echo $LC_CTYPE

or

echo $LANG

edited Oct 12 '17 at 15:54 by Javier Arias

answered Mar 15 '11 at 00:40 by Valdis

These environment variables are used by applications that are using the terminal for I/O. The terminal emulator itself has no knowledge of them whatsoever, and its currently effective character encoding is a setting somewhere within the emulator program (a data member inside a libvte class in the case of GNOME Terminal). – JdeBP Oct 31 '17 at 13:44

The ordering of variables suggested here is not good. A more complete solution would be something like `echo ${LC_ALL:-${LC_CTYPE:-${LANG}}}`. Then again, a variable being set is no guarantee that its value is valid, so you should stick to the `locale` program (as seen in other answers here). – Mike Frysinger Jan 19 '18 at 04:09

As @JdeBP said, the terminal does *not* use the `locale` environment variables to determine its encoding. The terminal can, however, let applications that interact with it know its encoding by setting the `locale` environment variables. For instance, on macOS you can choose the terminal encoding and optionally set the `locale` environment variables at terminal startup in `Terminal` > `Preferences` > `Profiles` > `Advanced`. – Géry Ogam Feb 21 '18 at 21:19
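
The precedence described in the comments above (LC_ALL first, then LC_CTYPE, then LANG) can be sketched in Python. This is only an illustration of the lookup order; the helper names `effective_ctype_locale` and `codeset_of` are hypothetical, and a value being set still does not prove that the locale is installed or that the terminal emulator actually uses that encoding.

```python
import os


def effective_ctype_locale(env):
    """Return the locale name that governs character handling, or None.

    Lookup order: LC_ALL, then LC_CTYPE, then LANG. An empty value counts
    as unset, which mirrors the shell's ${VAR:-fallback} expansion.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return None


def codeset_of(locale_name):
    """Extract the codeset from a name such as 'zh_CN.GBK' or 'de_DE.UTF-8@euro'."""
    if not locale_name or "." not in locale_name:
        return None  # e.g. 'C' or 'pt_BR': the name alone does not say
    codeset = locale_name.split(".", 1)[1]
    return codeset.split("@", 1)[0]  # drop an optional @modifier


if __name__ == "__main__":
    name = effective_ctype_locale(os.environ)
    print(name, codeset_of(name))
```

When the name carries no codeset (for example plain `pt_BR`), the sketch returns None instead of guessing; the `locale` program is the better source in that case.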

score 110 · Answer 2 · answered Jul 25 '13 at 04:58

The locale command with no arguments prints the values of all the relevant environment variables except LANGUAGE.

For current encoding:

locale charmap

For available locales:

locale -a

For available encodings:

locale -m

answered Jul 25 '13 at 04:58 by nyzm

This is what worked for me on a CentOS system. It showed the system encoding based upon current language settings. The terminal settings used to get to that machine are a different story and a function of the client being used. – Phil DD Apr 06 '18 at 17:15

score 48 · Answer 3 · answered Aug 15 '11 at 19:40

Check encoding and language:

$ echo $LC_CTYPE
ISO-8859-1
$ echo $LANG
pt_BR

Get all languages:

$ locale -a

Change to pt_PT.utf8:

$ export LC_ALL=pt_PT.utf8
$ export LANG="$LC_ALL"

answered Aug 15 '11 at 19:40 by Moreno

score 20 · Answer 4 · answered Aug 05 '17 at 14:40

If you have Python:

python -c "import sys; print(sys.stdout.encoding)"
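
A slightly longer sketch of the same idea, for comparison. Note that `sys.stdout.encoding` is Python's own choice, derived from the locale settings (or from overrides such as the `PYTHONIOENCODING` variable), not an answer obtained from the terminal emulator. The function name `python_view_of_encoding` is made up for this example.

```python
import locale
import sys


def python_view_of_encoding():
    """Collect the encodings Python derives from the environment."""
    try:
        # Adopt the locale named by LC_ALL / LC_CTYPE / LANG.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the variables name a locale that is not installed
    info = {
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
    }
    if hasattr(locale, "nl_langinfo"):  # not available on Windows
        # Roughly the value that `locale charmap` reports.
        info["charmap"] = locale.nl_langinfo(locale.CODESET)
    return info


if __name__ == "__main__":
    for key, value in python_view_of_encoding().items():
        print(key, value)
```

If the values disagree, something is overriding the locale on the Python side; none of them says what the emulator itself is configured to decode.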

answered Aug 05 '17 at 14:40 by Martin Thoma

score 6 · Answer 5 · answered Apr 14 '16 at 07:02

To my knowledge, no.

Circumstantial indications from $LC_CTYPE, locale and such might seem alluring, but these are completely separate from the encoding the terminal application (actually an emulator) happens to be using when displaying characters on the screen.

The only way to detect the encoding for sure is to output a character whose byte sequence is specific to that encoding, e.g. ä, take a screenshot, analyze the image, and check whether the character was rendered correctly.

So no, it's not possible, sadly.
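
To illustrate why the program cannot tell from its side: it only emits bytes, and the emulator decides how to decode them. The sketch below shows how the same character becomes different bytes under different encodings, and what a mismatched decoder would display. The helper names `bytes_in` and `misread` are invented for this example.

```python
def bytes_in(char, encodings=("utf-8", "gbk", "latin-1")):
    """Map each encoding to the bytes it uses for char (None if unencodable)."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = char.encode(enc)
        except UnicodeEncodeError:
            result[enc] = None
    return result


def misread(char, sent_as, shown_as):
    """Text displayed when bytes encoded as sent_as are decoded as shown_as."""
    return char.encode(sent_as).decode(shown_as, errors="replace")


if __name__ == "__main__":
    print(bytes_in("中"))
    # A UTF-8 'ä' shown by a Latin-1 terminal appears as two characters.
    print(misread("ä", "utf-8", "latin-1"))
```

Nothing in this exchange flows back to the program, which is the point the answer makes.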

score 5 · Answer 6 · answered Jul 31 '20 at 12:00

To see the current locale information, use the locale command. Below is an example on RHEL 7.8:

[usr@host ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
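
If the output above needs to be consumed by a script, a minimal parsing sketch in Python could look like this. The function names are hypothetical. The double quotes that `locale` prints around some values mark settings implied by another variable rather than set explicitly; this sketch simply strips them.

```python
import subprocess


def parse_locale_output(text):
    """Turn `locale` output (NAME=value lines) into a dict, dropping quotes."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            settings[name.strip()] = value.strip().strip('"')
    return settings


def current_locale_settings():
    """Run the `locale` program and parse what it prints."""
    out = subprocess.run(["locale"], capture_output=True, text=True, check=True)
    return parse_locale_output(out.stdout)
```

For example, `current_locale_settings().get("LC_CTYPE")` would return `en_US.UTF-8` on the system shown above.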

Answer 7 · answered Sep 04 '21 at 09:03 by user1254127

The xterm control sequences documentation (https://invisible-island.net/xterm/ctlseqs/ctlseqs.html) shows that xterm follows the ISO 2022 standard for character set switching. In particular, ESC % G selects UTF-8, so forcing the terminal to use UTF-8 means sending that sequence. I found no way of querying which character set is currently in use, although there are ways of discovering whether the terminal supports national replacement character sets.
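
A minimal sketch of sending that sequence from Python follows. The constant and function names are illustrative, and whether a given emulator honors the sequence is an assumption: terminals other than xterm may ignore it. ESC % @, listed in the same document, selects the default character set (ISO 8859-1).

```python
import sys

SELECT_UTF8 = "\x1b%G"     # ESC % G: select the UTF-8 character set
SELECT_DEFAULT = "\x1b%@"  # ESC % @: select the default set (ISO 8859-1)


def select_utf8(stream=None):
    """Write ESC % G to the given text stream (stdout when omitted)."""
    if stream is None:
        stream = sys.stdout
    stream.write(SELECT_UTF8)
    stream.flush()


if __name__ == "__main__":
    select_utf8()
```

This only asks the terminal to switch; there is still no reply telling the program which character set was in effect before or after.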

However, judging from charsets(7), GBK (or GB2312) does not appear to be an encoding supported by ISO 2022, and xterm doesn't support it natively. So your best bet might be to use iconv to convert to UTF-8.
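
For reference, the conversion that `iconv -f GBK -t UTF-8` performs can be sketched in Python with the built-in codecs. The function name `gbk_to_utf8` is made up for this example, and the sketch assumes the input really is valid GBK; invalid bytes raise `UnicodeDecodeError`.

```python
def gbk_to_utf8(data):
    """Re-encode GBK bytes as UTF-8 bytes."""
    return data.decode("gbk").encode("utf-8")


if __name__ == "__main__":
    # The two characters of "中文" as GBK bytes.
    print(gbk_to_utf8(b"\xd6\xd0\xce\xc4").decode("utf-8"))
```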

Further reading shows that a (significant) subset of GBK is EUC, which is an ISO 2022 code, so ISO 2022-capable terminals may be able to display GBK natively after all. However, I can't find any mention of activating this programmatically, so the terminal's user interface would be the only recourse.
