
Is there a way to make Python's isalpha method recognize Scandinavian letters such as ä, ö and å? I have tried the following:

>>> import locale
>>> locale.getlocale()
(None, None)
>>> 'thisistext'.isalpha()
True
>>> 'äöå'.isalpha()
False
>>> locale.setlocale(locale.LC_ALL,"")
'Finnish_Finland.1252'
>>> locale.getlocale()
('Finnish_Finland', '1252')
>>> 'äöå'.isalpha()
False
user250765

3 Answers

The simplest way is to use unicode strings, if that is acceptable in your case. Just put a u prefix before the string literal:

>>> u'привіт'.isalpha()
True

If the literal lives in a source file, also put this line first in the file so that Python knows how the file is encoded:

# -*- coding: utf-8 -*-
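
For comparison, here is a minimal sketch of the same idea under Python 3, which is an assumption on my part since the sessions in this thread are Python 2. There every str is already Unicode text, so no u prefix is needed, while a bytes object only counts ASCII letters as alphabetic. The variable names are just for illustration.

```python
# Python 3 sketch: str is Unicode text, bytes is raw data.
text = "\u00e4\u00f6\u00e5"          # the characters ä, ö, å written as escapes

print(text.isalpha())                # True: str.isalpha() uses Unicode properties

# Encoding to cp1252 gives the byte values a Finnish Windows locale would use.
raw = text.encode("cp1252")
print(raw)                           # b'\xe4\xf6\xe5'

# bytes.isalpha() only recognizes ASCII letters, regardless of locale.
print(raw.isalpha())                 # False

# Decoding back to text makes the check work again.
print(raw.decode("cp1252").isalpha())  # True
```

So in Python 3 the fix is the same in spirit: make sure you are testing text, not encoded bytes.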
Oleksandr Kravchuk

It looks like what you have in your string constant is NOT a byte string encoded in cp1252, which is what is required to make str.isalpha work properly in your locale. You don't say in what environment you typed that. I can tell from the way that locale responds that you are on Windows; perhaps you are getting UTF-8 from some IDE or cp850 from a Command Prompt window.
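
To make the point about environments concrete, here is a small sketch (Python 3 syntax, which is my assumption; the names text and encoded_forms are only illustrative) showing that the same three letters turn into different bytes depending on which encoding the environment uses:

```python
# The same text produces different byte sequences under different encodings.
text = "\u00e4\u00f6\u00e5"  # the characters ä, ö, å

# Encodings mentioned above: the Windows ANSI code page, the Command Prompt
# code page, and UTF-8 as an IDE might use.
encoded_forms = {enc: text.encode(enc) for enc in ("cp1252", "cp850", "utf-8")}

for enc, raw in encoded_forms.items():
    print(enc, raw)
# cp1252 b'\xe4\xf6\xe5'
# cp850 b'\x84\x94\x86'
# utf-8 b'\xc3\xa4\xc3\xb6\xc3\xa5'
```

Only the first of these is what a cp1252 locale expects, which would explain why isalpha fails when the bytes come from somewhere else.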

What you see on your screen is often of very little help in debugging. What you see is NOT what you have got. The repr built-in function is (or wants to be) your friend. It will show unambiguously in ASCII what you actually have. [Python 3: repr is renamed ascii and there is a new repr which is not what you want]

Try typing s = "your string constant with 'accented' letters" then print repr(s) and edit your question to show the results (copy/paste, don't retype). Also say what Python version you are using.
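
As a rough illustration of why the screen can mislead, here is a Python 3 sketch (so it uses ascii rather than the Python 2 repr). It builds two strings that normally render identically but hold different code points; the use of Unicode normalization here is my own choice of example, not something taken from the question.

```python
import unicodedata

composed = "\u00e4"                                  # ä as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # 'a' followed by a combining diaeresis

# Both usually look the same when printed, but they are not equal.
print(composed == decomposed)   # False

# ascii() shows unambiguously what each string contains.
print(ascii(composed))          # '\xe4'
print(ascii(decomposed))        # 'a\u0308'

# The combining mark is not a letter, so isalpha() differs too.
print(composed.isalpha())       # True
print(decomposed.isalpha())     # False
```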

Another would-be pal is unicodedata.name ... see below.

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'Finnish')
'Finnish_Finland.1252'
>>> s = '\xe4\xf6\xe5'
>>> import unicodedata
>>> for c in s:
...     u = c.decode('1252')
...     print repr(c), repr(u), unicodedata.name(u, '<no name>')
...
'\xe4' u'\xe4' LATIN SMALL LETTER A WITH DIAERESIS
'\xf6' u'\xf6' LATIN SMALL LETTER O WITH DIAERESIS
'\xe5' u'\xe5' LATIN SMALL LETTER A WITH RING ABOVE
>>> s.isalpha()
True

You can compare the above results with this chart.
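
If you are on Python 3 (an assumption, since the session above is Python 2), a roughly equivalent sketch looks like this. The bytes are decoded once, up front, and no locale call is needed because str.isalpha works from Unicode character properties.

```python
import unicodedata

raw = b"\xe4\xf6\xe5"           # the same byte values as in the session above
text = raw.decode("cp1252")     # decode explicitly with the code page you expect

# Show each character unambiguously together with its Unicode name.
for ch in text:
    print(ascii(ch), unicodedata.name(ch, "<no name>"))
# '\xe4' LATIN SMALL LETTER A WITH DIAERESIS
# '\xf6' LATIN SMALL LETTER O WITH DIAERESIS
# '\xe5' LATIN SMALL LETTER A WITH RING ABOVE

print(text.isalpha())           # True
```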

John Machin

You could also try decoding the byte string first (this assumes it is actually UTF-8 encoded):

>>> 'äöå'.decode('utf-8').isalpha()
True
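
Note that this only works when the bytes really are UTF-8. A minimal Python 3 sketch of what happens otherwise (decode_and_check is a hypothetical helper name of mine, not a library function):

```python
def decode_and_check(raw, encoding):
    """Hypothetical helper: decode bytes with the given encoding, then test isalpha()."""
    return raw.decode(encoding).isalpha()

text = "\u00e4\u00f6\u00e5"             # the characters ä, ö, å
raw_utf8 = text.encode("utf-8")
raw_cp1252 = text.encode("cp1252")

print(decode_and_check(raw_utf8, "utf-8"))      # True
print(decode_and_check(raw_cp1252, "cp1252"))   # True

# Decoding cp1252 bytes as UTF-8 does not silently work; it raises an error.
try:
    decode_and_check(raw_cp1252, "utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

So the encoding you pass to decode has to match whatever your console or editor actually produced.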
pupadupa