
This seems like a weird problem, and it's causing me some heartburn, because I'm using a library that stashes the current locale and tries to set it back to what it stashed.

$ docker run --rm -it python:3.6 bash
root@bcee8785c2e1:/# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
root@bcee8785c2e1:/# locale -a
C
C.UTF-8
POSIX
root@bcee8785c2e1:/# python
Python 3.6.9 (default, Jul 13 2019, 14:51:44) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> curr = locale.getlocale()
>>> curr
('en_US', 'UTF-8')
>>> locale.setlocale(locale.LC_ALL, curr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/locale.py", line 598, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
>>>

I'm not sure why getlocale() is returning en_US. It's not anywhere in my environment variables, and I'm not sure where else in my shell it could come from.

In any case, I can't call setlocale() with the value returned by getlocale(), which seems weird to me.

Does anyone have any guidance here?

Much appreciated!

Hoopes
  • I don't think C.UTF-8 is a valid locale. C (synonym POSIX) is, and it is intended to be strictly a byte=char fallback [On phone so can't check] – Rusi Jan 07 '20 at 14:07
  • It's a bit more complicated; follow-up of the above comment in the answer below – Rusi Jan 10 '20 at 07:25

2 Answers


For the first part: does it matter? As far as I know, I never see differences until you call setlocale(), so on to the second part:

You should use:

import locale
curr = locale.getdefaultlocale()
locale.setlocale(locale.LC_ALL, curr)

So use getdefaultlocale() and not just getlocale(). I also do not fully understand the reason for having both. Is it possible that it is a Python bug that fails to recognize C.xxx?
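To see where the en_US comes from, it can help to compare what the C library reports with what getlocale() returns. In CPython, getlocale() parses the raw locale name after passing it through locale.normalize(), which uses Python's own alias table rather than the locales installed on the system. Presumably, on the Python 3.6 image in the question, that table maps C.UTF-8 to en_US.UTF-8; this is an inference from the output above, and newer Python versions may map it differently. A small diagnostic sketch, not tied to any particular Python version:

```python
import locale

# Query the raw LC_CTYPE name as the C library reports it.
# Passing no locale argument only reads the setting; nothing is changed.
raw = locale.setlocale(locale.LC_CTYPE)

# getlocale() does not return `raw` verbatim: CPython runs it through
# locale.normalize(), which consults Python's own alias table, and then
# splits the result into (language code, encoding).
normalized = locale.normalize(raw)
parsed = locale.getlocale(locale.LC_CTYPE)

print("raw C-library name :", raw)
print("locale.normalize() :", normalized)
print("locale.getlocale() :", parsed)

# The raw name was produced by the C library itself, so setting it back works.
locale.setlocale(locale.LC_CTYPE, raw)
```

If the normalized name differs from the raw one and is not installed (as with en_US.UTF-8 in the container above), feeding the getlocale() tuple back into setlocale() fails with "unsupported locale setting".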

Giacomo Catenazzi
  • It's crazy: same result, ('en_US', 'UTF-8')... and the same error... – Hoopes Dec 12 '19 at 15:06
  • But it doesn't crash. Note: I'm not sure if there are testable differences (with outputs), or whether `C.utf-8` just includes `en-US.utf-8`. [Python 3.7.3 here, with `LC_ALL='C.utf8' python3 /tmp/b.py`] – Giacomo Catenazzi Dec 12 '19 at 15:10
  • Well, whether the bug is in Python, Debian, Docker or libc is arguable. C.UTF-8 sure is a bug magnet; see my answer below, particularly the Haskell and Red Hat bug reports. – Rusi Jan 10 '20 at 16:27
  • @Rusi: but your answer is not an answer. `C.UTF-8` may be bad (but I looked at the standard and it should be correct, also because the two parts are different: one is about how a program should prepare things [user-dependent], the other about how to display them [terminal-dependent]). But the problem here is that the OP used the wrong function: he asked for a locale string which cannot be used to set the locale. The string is an interpretation of the locale string (useful for other purposes), but it is not a system locale, so it cannot be used safely. `C.UTF-8` is one case, but there are many more [locale semantics are complex] – Giacomo Catenazzi Jan 11 '20 at 08:16
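Following the point in the last comment, the library could avoid getlocale() altogether. Calling setlocale() with only a category argument queries the current setting and returns the C library's own name for it, which can be passed straight back to setlocale() later. A minimal sketch, assuming the save/restore can be wrapped in a context manager (`preserved_locale` is a hypothetical helper, not part of the locale module):

```python
import contextlib
import locale


@contextlib.contextmanager
def preserved_locale(category=locale.LC_ALL):
    """Save the current setting for `category` and restore it on exit.

    locale.setlocale(category) with no locale argument only queries the
    current setting; the string it returns is the C library's own name,
    so (unlike locale.getlocale()) it can be handed back to setlocale().
    """
    saved = locale.setlocale(category)
    try:
        yield saved
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    with preserved_locale() as saved:
        print("saved:", saved)
        # Code in here may change the locale freely, e.g. to the always-present "C".
        locale.setlocale(locale.LC_ALL, "C")
    print("restored:", locale.setlocale(locale.LC_ALL))
```

The C standard specifies that the string returned by setlocale() can be used with the same category to restore that part of the locale, which is what makes this round trip safe; with LC_ALL the string may be a composite of several categories.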

C.UTF-8 — A recent non-portable debianism

The intention behind C.UTF-8 is good, but the implementation is not quite there yet. For now, avoid it until it stabilizes.

Some discussion of context

A Red Hat discussion about including it, which means it's not quite there yet (at least at the time of writing). Note particularly that Nick Coghlan, a CPython core developer, suggests that Python doesn't get locales right in some contexts like this one.

A Haskell discussion showing that portable, cross-platform tooling (in this case haskell-stack, but by implication also Docker) becomes harder and less reliable with C.UTF-8 usage.

The Intention

Debian (also) initiated C.UTF-8 and the intention is correct.

Today's Linux systems are intensively localized: a slew of locales, fine-grained choice of LC_* categories, and so on. But none of this is essential by default; if it were, a broken locale system would mean a broken system. The reason a broken locale system is not as drastic in its effects as, say, a broken kernel, fstab or GRUB is...

The C locale

The C locale (synonym POSIX) is guaranteed to always be available as a fallback if other things break. So, for example, you won't see localized error messages but English ones, not mojibake, empty rectangles or question marks!

By and large you get warnings of this kind rather than errors, and otherwise things keep working.

But C = POSIX implies legacy ASCII rather than UTF-8 everywhere, an undesired side effect of legacy.

Towards making that legacy less and less necessary even as a fallback, Debian introduced the always available C.UTF-8 locale.

The catch? It's always available...

Only in Debian

That means recent Debian and recent derivatives such as Ubuntu, but not (yet) other systems.
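Since availability varies by system, one way to avoid simply assuming C.UTF-8 exists is to probe for a locale at runtime. A minimal sketch (the helpers `locale_available` and `first_available` are hypothetical names): it tries each candidate with setlocale() on LC_CTYPE, always restores the previous setting, and falls back to plain C, which POSIX guarantees.

```python
import locale


def locale_available(name, category=locale.LC_CTYPE):
    """Return True if the C library accepts `name` for `category` here."""
    saved = locale.setlocale(category)  # query only; nothing changes yet
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, saved)  # always put the old setting back


def first_available(candidates, fallback="C"):
    """Return the first usable locale name from `candidates`, else `fallback`."""
    for name in candidates:
        if locale_available(name):
            return name
    return fallback  # "C" (alias "POSIX") is always available


if __name__ == "__main__":
    for name in ("C", "POSIX", "C.UTF-8", "en_US.UTF-8"):
        print(name, locale_available(name))
    print("chosen:", first_available(["C.UTF-8", "en_US.UTF-8"]))
```

On the python:3.6 image from the question, this kind of probe would be expected to report C.UTF-8 as available and en_US.UTF-8 as missing, matching the `locale -a` output shown there.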

In short, C.UTF-8 is not universal, not portable, fragile, and therefore best avoided, at least for now and at least on client-server, virtualized (containerized) systems such as Docker. The...

Practical Upshot

You need to explicitly install old-fashioned locales like en_US.UTF-8. (People who want a reasonable international English locale but not en_US may wish to check out en_DK.UTF-8.)

Yeah, that involves some amount of

Getting your hands dirty

Here is a collection of references on Docker-oriented locale setup:

I don't approve of one anti-pattern that recurs in the above, but it would go too far afield from this question to expand on it, so, very briefly:

Setting the locale should usually involve only setting LANG. Setting LC_ALL, especially along with LANG, is a no-no.

From the Debian wiki:

⚠️ WARNING

Using LC_ALL is strongly discouraged as it overrides everything. Please use it only when testing and never set it in a startup file.
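The reason LC_ALL "overrides everything" is the POSIX lookup order for each locale category: a non-empty LC_ALL wins, then the matching LC_* variable, then LANG, then an implementation-defined default. The sketch below models that order in pure Python; the function `effective_locale` is hypothetical, and using "C" as the default is an assumption, since POSIX leaves it to the implementation. Empty values count as unset, which is why the `LC_ALL=` line in the question's `locale` output has no effect.

```python
from typing import Mapping


def effective_locale(env: Mapping[str, str], category: str, default: str = "C") -> str:
    """Model of the POSIX rule for which variable decides a locale category.

    `category` is a name such as "LC_TIME". Empty values count as unset.
    `default` stands in for the implementation-defined fallback (assumed "C").
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return default


if __name__ == "__main__":
    env = {"LANG": "en_US.UTF-8", "LC_TIME": "en_DK.UTF-8", "LC_ALL": ""}
    print(effective_locale(env, "LC_TIME"))     # en_DK.UTF-8
    print(effective_locale(env, "LC_NUMERIC"))  # en_US.UTF-8

    with_lc_all = dict(env, LC_ALL="C")
    print(effective_locale(with_lc_all, "LC_TIME"))  # C: LC_ALL masks LC_TIME too
```

Setting only LANG leaves room for a per-category override such as LC_TIME, whereas a non-empty LC_ALL silently masks every one of them.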

Rusi
  • `en_DK` is a crazy aberration, which would perhaps be defensible if *it* were available everywhere by default; but, alas, it is not. Also, IIRC it has less-than-ideal values for dates and currency. Perhaps see also https://unix.stackexchange.com/questions/62316/why-is-there-no-euro-english-locale – tripleee Jan 11 '20 at 10:14
  • @tripleee I've weakened the en_DK reference. I can remove it if you prefer (it's hardly relevant to the question or the answer!) – Rusi Jan 11 '20 at 17:25