Python isalpha() and scandics

Question

Is there a way to make Python's isalpha() method understand Scandinavian letters (scandics)? I have tried the following:

>>> import locale
>>> locale.getlocale()
(None, None)
>>> 'thisistext'.isalpha()
True
>>> 'äöå'.isalpha()
False
>>> locale.setlocale(locale.LC_ALL,"")
'Finnish_Finland.1252'
>>> locale.getlocale()
('Finnish_Finland', '1252')
>>> 'äöå'.isalpha()
False

Oleksandr Kravchuk · Answer 1 · 2010-11-27T00:42:16.687

10

The simplest way is to use Unicode strings, if that is acceptable in your case. Just put a u prefix before the string literal:

>>> u'привіт'.isalpha()
True

In a source file, also put this line first, so that Python knows how the non-ASCII characters in your literals are encoded:

# -*- coding: utf-8 -*-
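For readers on Python 3 (an assumption; the session above is Python 2), every str literal is already Unicode, so the u prefix is optional. A minimal sketch of the same check, where the sample strings other than those quoted above are illustrative only:

```python
# Python 3: str is Unicode text, so isalpha() uses Unicode character properties
# and does not depend on the locale.
samples = ['thisistext', 'äöå', 'привіт', 'abc123']

results = {s: s.isalpha() for s in samples}

for s, ok in results.items():
    # ascii() keeps the output printable on any console encoding.
    print(ascii(s), ok)
```

Only the last sample, which contains digits, should report False.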

edited Nov 27 '10 at 00:42

answered Nov 26 '10 at 15:47

Oleksandr Kravchuk


For those who missed it, note the "u" which declares a unicode string. – moinudin Nov 26 '10 at 20:21

score 3 · Answer 2 · answered Nov 26 '10 at 20:01

It looks like what you have in your string constant is NOT a byte string encoded in cp1252, which is what is required to make str.isalpha work properly in your locale. You don't say in what environment you typed that. I can tell from the way that locale responds that you are on Windows; perhaps you are getting UTF-8 from some IDE or cp850 from a Command Prompt window.

What you see on your screen is often of very little help in debugging. What you see is NOT what you have got. The repr built-in function is (or wants to be) your friend. It will show unambiguously in ASCII what you actually have. [Python 3: repr is renamed ascii and there is a new repr which is not what you want]

Try typing s = "your string constant with 'accented' letters" then print repr(s) and edit your question to show the results (copy/paste, don't retype). Also say what Python version you are using.
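To illustrate why the actual bytes matter, here is a small sketch in Python 3 (an assumption; this answer otherwise uses Python 2) showing how the same three letters come out under each of the encodings mentioned above. Here ascii() plays the role that repr played in Python 2:

```python
text = 'äöå'

# Encode the same text three ways; the byte sequences all differ.
encoded = {name: text.encode(name) for name in ('cp1252', 'utf-8', 'cp850')}

for name, raw in encoded.items():
    print(name, ascii(raw))
```

Only the cp1252 result matches the '\xe4\xf6\xe5' byte string that a Finnish_Finland.1252 locale expects.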

Another would-be pal is unicodedata.name; see below.

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'Finnish')
'Finnish_Finland.1252'
>>> s = '\xe4\xf6\xe5'
>>> import unicodedata
>>> for c in s:
...     u = c.decode('1252')
...     print repr(c), repr(u), unicodedata.name(u, '<no name>')
...
'\xe4' u'\xe4' LATIN SMALL LETTER A WITH DIAERESIS
'\xf6' u'\xf6' LATIN SMALL LETTER O WITH DIAERESIS
'\xe5' u'\xe5' LATIN SMALL LETTER A WITH RING ABOVE
>>> s.isalpha()
True

You can compare the above results with this chart.
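For comparison, a rough Python 3 counterpart of the session above might look like this. It is a sketch, not a drop-in replacement: in Python 3 the byte string must be written as a bytes literal and decoded before unicodedata can inspect it.

```python
import unicodedata

raw = b'\xe4\xf6\xe5'          # the same cp1252 bytes as in the session above
text = raw.decode('cp1252')    # bytes -> str

names = [unicodedata.name(ch, '<no name>') for ch in text]

for byte, ch, name in zip(raw, text, names):
    print(hex(byte), ascii(ch), name)

print(text.isalpha())   # str.isalpha() works on the decoded text
print(raw.isalpha())    # bytes.isalpha() only recognises ASCII letters
```

Note that in Python 3 the locale no longer affects the result: the decoded str is alphabetic, while the raw bytes are not.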

score 1 · Answer 3 · answered Jan 28 '13 at 20:43

You could also try this (Python 2, assuming the byte string is UTF-8 encoded):

>>> 'äöå'.decode('utf-8').isalpha()
True
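This only works when the bytes really are UTF-8. As a sketch in Python 3 (an assumption; the line above is Python 2), a hypothetical helper named is_alpha_bytes, which is not part of any library, could decode first and treat undecodable input as non-alphabetic:

```python
def is_alpha_bytes(raw, encoding='utf-8'):
    """Decode raw bytes with the given encoding, then apply str.isalpha()."""
    try:
        return raw.decode(encoding).isalpha()
    except UnicodeDecodeError:
        # The bytes are not valid in this encoding, so they cannot be classified.
        return False


utf8_bytes = 'äöå'.encode('utf-8')
cp1252_bytes = 'äöå'.encode('cp1252')

print(is_alpha_bytes(utf8_bytes))                # decodes cleanly as UTF-8
print(is_alpha_bytes(cp1252_bytes))              # not valid UTF-8
print(is_alpha_bytes(cp1252_bytes, 'cp1252'))    # correct encoding supplied
```

Returning False on a decode error is a design choice made for this illustration; raising the exception instead may be more appropriate when a wrong encoding should not pass silently.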

pupadupa
