Python: Certain unicode characters do not display correctly

Question

I am trying to set the label of a GUI element to display a Greek letter with Python.

str(u'\u0054'.encode('utf8')) will correctly produce the unicode character 'T', as its unicode number is 0054.

Writing str(u'\u03B6'.encode('utf8')) will not display the Greek small letter zeta; garbled characters appear instead.

I tried writing str(u'\uceb6'.encode('utf8')) as well (ceb6 is the UTF-8 encoding of the character I want), but got a similar, strange-looking character that certainly wasn't the Greek letter zeta.

According to this site the character is available in common fonts.

Might it be that the GUI-toolkit uses a strange font? I am using the FOX toolkit.

Any help is appreciated.

EDIT: I am specifically trying to create a text label FXLabel(parent, string) where I supply the string str(u'\u03B6'.encode('utf8')). And as mentioned, supplying the string with the Unicode number of capital T will produce the expected label.

The character `T`, encoded as a UTF-8 string, looks like this: `T`. However, the character `ζ` as a UTF-8 string is this: `Î¶`, that is, the bytes `0xCE` and `0xB6` shown in an arbitrary code page (this one is Latin-1). Which is what you got, so the problem lies not in encoding but in what your library expects. – Jongware, Jul 12 '16 at 14:12
.. By the way, your quote "available in most fonts" is far from what is actually said: "Supported in all common fonts". The 'common fonts' under that remark are the so-called 'web safe fonts', "likely to be present on a wide range of computer systems" (https://en.wikipedia.org/wiki/Web_typography#Web-safe_fonts). Unless you don't have a lot of fonts, this is a **very** small subset of 'most fonts'. – Jongware, Jul 12 '16 at 14:14
@RadLexus Thanks, so I should find which numbers correspond to the character I want in Latin-1? I just find it strange that it interprets the encoding of T as one single character while in the second case it breaks it up into two parts. Oh and I edited my question :) – DjungelJarl, Jul 12 '16 at 14:25
@RadLexus I found [this](https://msdn.microsoft.com/en-us/library/cc195054.aspx) link showing the codes for the Latin-1 localization. There isn't much. Can I somehow specify to look for Unicode numbers and not numbers in an arbitrary code page? – DjungelJarl, Jul 12 '16 at 14:30
You may need to brush up on what "UTF8" actually *means*. In Python (and in other programming languages as well), a `character` cannot hold any arbitrary value but is typically restricted to a range of `0` to `255`. "UTF8" is a way to circumvent this and to store many more code points; it does necessarily do so by using more than a single character. – Jongware, Jul 12 '16 at 15:08
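
To make the comments above concrete, here is a minimal sketch of how the three code points from the question map to UTF-8 bytes. It is written for Python 3 (an assumption; the question uses Python 2 syntax), and `describe` is a hypothetical helper name, not something from the thread.

```python
import unicodedata


def describe(ch):
    """Return (code point, UTF-8 bytes as hex, Unicode name) for one character."""
    encoded = ch.encode("utf-8")
    # Iterating over bytes yields integers in Python 3.
    hex_bytes = " ".join("{:02X}".format(b) for b in encoded)
    return "U+{:04X}".format(ord(ch)), hex_bytes, unicodedata.name(ch)


# The three escapes tried in the question.
for ch in (u"\u0054", u"\u03b6", u"\uceb6"):
    print(describe(ch))
```

`T` fits in a single byte, which is why it survives any ASCII-compatible code page, while zeta needs two bytes. Note also that `u'\uceb6'` does not mean "the bytes CE B6": it names a different code point entirely (a Hangul syllable), which is why that attempt could not produce zeta either.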

tripleee · Answer 1 · 2016-07-12T14:08:28.963

0

Your output encoding is wrong. Make sure your terminal is correctly configured for UTF-8 output.

If I interpret your (rather muddy) image correctly, CE B6 is being displayed as Î¶, which is consistent with any one of a number of common Western 8-bit encodings.
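
The misreading described here can be reproduced in a few lines. This is a sketch for Python 3 (an assumption; the question uses Python 2), and the variable names are illustrative only. It encodes zeta as UTF-8, then decodes those bytes with two Western 8-bit code pages to show what a consumer expecting such an encoding would display.

```python
zeta = u"\u03b6"

# Correct UTF-8 encoding of zeta: the two bytes 0xCE 0xB6.
utf8_bytes = zeta.encode("utf-8")

# A consumer that assumes an 8-bit code page reads each byte as one character.
misread_latin1 = utf8_bytes.decode("latin-1")
misread_cp1252 = utf8_bytes.decode("cp1252")

# The damage is reversible as long as the bytes themselves were not altered.
repaired = misread_latin1.encode("latin-1").decode("utf-8")

# ascii() keeps the output printable even on a terminal that is not UTF-8.
print(ascii(misread_latin1), ascii(misread_cp1252), ascii(repaired))
```

Both code pages map 0xCE to `Î` and 0xB6 to `¶`, so the bytes are right and only their interpretation is wrong.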

edited Jul 12 '16 at 14:08

answered Jul 12 '16 at 14:01

tripleee

Yes, that is the character(s) that are outputted. What do you mean by configuring my terminal for UTF-8 output? As to the link you supplied, I am not exactly sure of what I am looking at. How do I know which localization to use, and why does Python interpret my unicode number as two characters with two hexadecimal numbers each instead of one character with four hexadecimal numbers? – DjungelJarl Jul 12 '16 at 14:21
UTF-8 encodes this character as two bytes. You need to configure your environment (basically, the program from which you run Python) to interpret this output correctly. This is a common FAQ; you should easily find instructions for your environment, but your question lacks the details to tell you anything specific. – tripleee Jul 12 '16 at 15:14
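
As a starting point for checking the environment, the sketch below (Python 3 assumed; `report_encodings` and `can_encode` are hypothetical helper names) prints the encodings Python reports for the current process and tests whether a given encoding can represent zeta at all. It says nothing about what a GUI toolkit such as FOX expects for its label strings; that would have to be checked in the toolkit's own documentation.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for the current process."""
    return {
        # May be None when stdout is replaced by an object without an encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
        "locale": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }


def can_encode(text, encoding):
    """Return True if `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(report_encodings())
for name in ("ascii", "latin-1", "utf-8"):
    print(name, can_encode(u"\u03b6", name))
```

If the reported output encoding is an 8-bit Western code page, zeta cannot be represented in it at all, which matches the comment thread's point that Latin-1 has no slot for the character.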
