Setting locale and string module

Question

This simple script:

from locale import LC_ALL, setlocale
print setlocale(LC_ALL,"")
from string import letters
print letters

gives me this output:

tr_TR.utf8
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

The documentation for string.letters says that its value is locale-dependent and is updated when setlocale is called. However, I am not seeing any letters from my locale. Is there any way to get the list of letters for the current locale?

martineau · Answer 1 · 2014-07-18T15:50:32.017

2

I had to explicitly set the locale to Turkish since that isn't the default on my computer, but it seems to more-or-less work:

> python
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> from locale import LC_ALL, setlocale
>>> print setlocale(LC_ALL,"Turkish")
Turkish_Turkey.1254
>>> from string import letters
>>> print letters
abcdefghijklmnopqrstuvwxyzƒsoªµºßàáâaäåæçèéêëìíîïgñòóôoöoùúûüisÿ...
  ABCDEFGHIJKLMNOPQRSTUVWXYZSOYAAAAÄÅÆÇEÉEEIIIIGÑOOOOÖOUUUÜIS
>>>

The output basically looks correct (AFAIK) except for the inclusion of Q, W, and X, which, according to this Wikipedia article, aren't part of the Turkish alphabet.
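To make the comparison with the alphabet concrete, here is a rough Python 3 sketch (not the Python 2 session above). It assumes that "letters" means every alphabetic character that fits in a single cp1254 byte, which only approximates what the C library reports for the Turkish locale. The names `TURKISH_LOWER` and `cp1254_letters` are made up for this illustration.

```python
# Python 3 sketch; TURKISH_LOWER and cp1254_letters are illustrative names,
# and str.isalpha() only approximates the C library's locale-aware isalpha().
TURKISH_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"  # the 29 lowercase letters


def cp1254_letters():
    """Return every alphabetic character that decodes from one cp1254 byte."""
    found = []
    for value in range(256):
        try:
            ch = bytes([value]).decode("cp1254")
        except UnicodeDecodeError:
            continue  # a few byte values are unassigned in cp1254
        if ch.isalpha():
            found.append(ch)
    return found


letters = cp1254_letters()
lowercase = [ch for ch in letters if ch.islower()]

# Lowercase characters in the code page that the Turkish alphabet lacks.
extras = [ch for ch in lowercase if ch not in TURKISH_LOWER]
# Turkish letters that the code page cannot represent (expected: none).
missing = [ch for ch in TURKISH_LOWER if ch not in letters]

print("not in the Turkish alphabet:", "".join(extras))
print("Turkish letters absent from cp1254:", "".join(missing) or "(none)")
```

If this approximation holds, the extra characters are simply everything else the code page classifies as a letter, which would explain why q, w, and x show up alongside the Turkish ones.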

Update:

To better replicate your environment, I first used the "Regional and Language Options" control panel and changed my region to "Turkish", which should make it the default for setlocale. Indeed it did; however, the list of letters still looks OK, so I can't reproduce your problem.

One difference this time is that before running Python I first changed the console's code page to Windows ANSI Turkish (1254) so that characters from the alphabet display correctly. This made the last two letters of the output display correctly; however, the output still includes the letters Q, W, and X, which aren't part of the alphabet (and are wrong to be there, in my opinion).

C:\>chcp 1254
Active code page: 1254

C:\>python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from locale import LC_ALL, setlocale
>>> print setlocale(LC_ALL,"")
Turkish_Turkey.1254
>>> from string import letters
>>> print letters
abcdefghijklmnopqrstuvwxyzƒšœªµºßàáâãäåæçèéêëìíîïğñòóôõöøùúûüışÿ...
  ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖØÙÚÛÜİŞ
>>>
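
The dependence on the code page can be checked outside the console. The following is a small Python 3 sketch, not part of the sessions above: it encodes the last two uppercase letters with the codecs for the code pages mentioned in this thread. It only shows which code pages contain the characters; it does not reproduce whatever substitution the Windows console performs when a character is unavailable.

```python
# Python 3 sketch: which code pages can represent the Turkish capitals İ and Ş?
text = "İŞ"  # the last two uppercase letters in the transcript above

for codec in ("cp1254", "cp857", "cp437"):
    try:
        raw = text.encode(codec)
    except UnicodeEncodeError:
        # The code page has no byte for at least one of the characters.
        print(codec, "cannot represent", text)
    else:
        print(codec, "bytes:", raw.hex(), "round trip ok:", raw.decode(codec) == text)
```

The two Turkish code pages should round-trip the text, while cp437 has no slot for either letter, so Python's strict codec raises an error there instead of picking a look-alike.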

edited Jul 18 '14 at 15:50

answered Mar 28 '12 at 19:57

martineau


+1 When I set the cmd.exe code page to 857 (IBM PC Turkish code page) or 1254 (Windows ANSI Turkish code page), I get the same output as yours, except the Turkish letters are shown correctly. For example, the last two letters in your output above ("...IS") are output properly as "...İŞ" for both 857 and 1254. If I set the code page to 437 (US English default), then I get exactly the same output as yours, with Turkish letters converted to nearest ASCII equivalents (without the accents/diacritics). For code page 65001 (Windows UTF-8), Python generates LookupError: unknown encoding: cp65001. – Sabuncu Nov 05 '13 at 22:06
PS: I am using Python 2.7.2. – Sabuncu Nov 05 '13 at 22:13
@Sabuncu: If I first set the cmd's code page to Windows ANSI Turkish with a `chcp 1254`, and then rerun the statements shown in my answer, the last two characters of the output are now "İŞ" (running Python 2.7.5). The output still includes the Q, W, and X though. Do you know why that is? – martineau Nov 06 '13 at 00:22
@Sabuncu: Note, you can use code page 65001, which is utf-8 encoding, in Python 3.3. – martineau Nov 06 '13 at 00:26
The problem is not just the inclusion of q, w and x. The set includes all sorts of other characters which are not in the Turkish alphabet: `ƒšœªµºßàáâãäåæèéêëìíîïñòóôõøùúûÿ...` – Sabuncu Nov 06 '13 at 13:59
Python documentation excels at ambiguity. Here's what it says for `string.uppercase`: On most systems this is the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'! – Sabuncu Nov 06 '13 at 14:03
@Sabuncu: They seem to have noticed at least some of the ambiguities, because in Python 3.3 `string.letters` and `string.uppercase` are gone, leaving only non-locale-dependent constants such as `string.ascii_letters` and `string.ascii_uppercase`. I think one reason there is no `string.letters` any more is that it could potentially be a _huge_ locale-specific Unicode string. In fact, all the constants remaining in `string` are ASCII-based (whether or not that is reflected in their names). – martineau Nov 06 '13 at 18:17
I see, thanks. Not sure why I am still using 2.7. Maybe I need to forgo and make the switch. – Sabuncu Nov 06 '13 at 19:50
@Sabuncu: Probably not necessary. Check out the question and accepted answer [_An equivalent to string.ascii_letters for unicode strings in python 2.x?_](http://stackoverflow.com/questions/2126551/an-equivalent-to-string-ascii-letters-for-unicode-strings-in-python-2-x). However, as I cautioned, the resulting locale-specific Unicode string can be quite long. – martineau Nov 06 '13 at 20:10
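
For reference, one way to build such a Unicode letter string is to filter code points by their Unicode general category. This is a Python 3 sketch and not necessarily the method used in the linked answer. The function name `unicode_letters` is made up here, and the result is independent of the locale, so it covers every script rather than only Turkish.

```python
# Python 3 sketch; unicode_letters is an illustrative name, not a stdlib function.
import sys
import unicodedata


def unicode_letters(categories=("Lu", "Ll")):
    """Collect every code point whose Unicode general category is listed.

    "Lu" is uppercase letter and "Ll" is lowercase letter.
    """
    return "".join(
        ch
        for ch in map(chr, range(sys.maxunicode + 1))
        if unicodedata.category(ch) in categories
    )


all_letters = unicode_letters()
print(len(all_letters))  # the count depends on the Unicode version Python ships
print("ğ" in all_letters, "İ" in all_letters)
```

The resulting string runs to thousands of characters, which fits the caution above about its length.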
