How to transliterate Unicode text to ASCII with PyICU?

Question

As I understand it, the PyICU library can be used to transliterate strings, but I can't find any documentation for it. Does anyone have a simple example that transliterates a Unicode string to ASCII with PyICU?

The C++ ICU documentation for transliteration is here, but I don't understand how to call it from Python.

@J.F.Sebastian, thanks, I actually found that and went with it. But I still thought this question had some value, so I left it up. — Prof. Falken, Jan 22 '13 at 14:06

score 4 · Accepted Answer · answered Jul 15 '19 at 20:40

There is a nice cheat sheet for PyICU here: https://gist.github.com/dpk/8325992

Here's a slightly modified example:

>>> import icu
>>> tl = icu.Transliterator.createInstance('Any-Latin; Latin-ASCII')
>>> tl.transliterate('Ψάπφω')
'Psappho'
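
For use outside the interactive prompt, the same two calls can be wrapped in a small helper. This is only a minimal sketch: the function names `to_ascii` and `_get_transliterator` are made up for illustration, and it assumes PyICU is installed and importable as `icu`. The transliterator is cached because it can be reused across calls.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _get_transliterator(transform_id: str):
    """Create (and cache) an ICU transliterator for a compound transform ID."""
    # Imported lazily so this module can still be imported where PyICU is missing;
    # the ImportError then surfaces on first use instead.
    import icu

    return icu.Transliterator.createInstance(transform_id)


def to_ascii(text: str, transform_id: str = "Any-Latin; Latin-ASCII") -> str:
    """Transliterate text to Latin script, then fold Latin characters to ASCII.

    Hypothetical helper: the default ID is the one used in the example above.
    """
    return _get_transliterator(transform_id).transliterate(text)
```

Calling `to_ascii('Ψάπφω')` should give the same `'Psappho'` as the session above. Characters that the transform has no rule for may pass through unchanged, so if strict ASCII is required it is worth checking the result with `str.isascii()`.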

Tavian Barnes


score 0 · Answer 2 · answered Jan 22 '13 at 13:53

From the first link that you gave, I am assuming that 1) you have already built PyICU and 2) you have made sure that the library is accessible (if not, see the documentation on your linked page).

I found this documentation from your link:

To convert a Python str encoded in an encoding other than utf-8 to an ICU UnicodeString use the UnicodeString(str, encodingName) constructor.

So you need to find the right encodingName. I guess yours would be ASCII, but I haven't verified that, so you should check that it is correct.

Then I suppose you would do something like this:

>>> from icu import UnicodeString
 .
 .
 .
>>> string = UnicodeString(strToConvert, 'ascii')  # the encoding name is passed as a string

That is just a quick idea, YMMV. You might want to check the website, as it gives more examples and shows how to do things the "Python way" or the "ICU way". Cheers!

As I said, I just took a guess on that; just scan the docs for what symbol is supposed to be used for ASCII. You might try something like iso-646, or iso-8859, or perhaps even ascii. — happy coder, Jan 22 '13 at 15:39
