There is the PyICU library, which I understand can be used to transliterate strings. However, there are no docs. Does anyone have a simple example that transliterates a Unicode string to ASCII with PyICU?

The C++ ICU documentation for transliteration is here, but I don't understand how to call it from Python.

Prof. Falken
  • related: [Unidecode](http://pypi.python.org/pypi/Unidecode) – jfs Jan 22 '13 at 13:59
  • @J.F.Sebastian, thanks, I actually found that and went with it. But I still thought that this question had some value, so I left it up. – Prof. Falken Jan 22 '13 at 14:06

2 Answers

There is a nice cheat sheet for PyICU here: https://gist.github.com/dpk/8325992

Here's a slightly modified example:

>>> import icu
>>> tl = icu.Transliterator.createInstance('Any-Latin; Latin-ASCII')
>>> tl.transliterate('Ψάπφω')
'Psappho'
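
For reuse, the same call can be wrapped in a small function. This is only a sketch: `to_ascii` is a hypothetical helper name, not part of PyICU, and the `unicodedata` fallback is my own assumption for machines where PyICU is not installed. The fallback only strips accents from Latin letters and drops everything else, so it is no substitute for ICU on other scripts.

```python
import unicodedata

try:
    import icu  # PyICU; optional here so the sketch still runs without it
except ImportError:
    icu = None


def to_ascii(text):
    """Convert text to ASCII (hypothetical helper, not part of PyICU)."""
    if icu is not None:
        # Same compound transform as above: any script -> Latin -> ASCII.
        tl = icu.Transliterator.createInstance('Any-Latin; Latin-ASCII')
        return tl.transliterate(text)
    # Fallback without ICU: decompose characters, drop combining marks,
    # then discard whatever is still not ASCII (non-Latin scripts are lost).
    decomposed = unicodedata.normalize('NFKD', text)
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('café naïve'))
```

Creating the transliterator inside the function keeps the sketch short; if you call it often, build the instance once and reuse it.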
Tavian Barnes

From the first link that you gave, I am assuming that 1) you have already built PyICU and 2) you have made sure that the library is accessible (see the documentation on your linked page if not).

I found this documentation from your link:

To convert a Python str encoded in a encoding other than utf-8 to an ICU UnicodeString use the UnicodeString(str, encodingName) constructor.

So you need to find the encodingName. I guess yours would be ASCII, but I haven't checked, so you should make sure that it is correct.

Then I suppose you would do something like this:

>>> from icu import UnicodeString
 . 
 .
 . 
>>> string = UnicodeString(strToConvert, 'ascii')  # encoding name is passed as a string

That is just a quick idea, ymmv. You might want to check the website as it gives more examples and how to do things the "Python way" or the "ICU way". CHEERS!

happy coder
  • As I said, I just took a guess on that; just scan the docs for what symbol is supposed to be used for ASCII. You might try something like iso-646, or iso-8859, or perhaps even ascii. – happy coder Jan 22 '13 at 15:39