I've been looking for a simple way to convert a number written in non-ASCII Unicode numerals into an ASCII string in Python. For example, the input:
input = u'\u0663\u0669\u0668\u066b\u0664\u0667'
should yield '398.47'.
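To see what the example input is made of, a quick check with the standard unicodedata module prints each code point, its Unicode name and its decimal digit value, if any. This is a small sketch; it uses `s` instead of `input` only to avoid shadowing the built-in.

```python
import unicodedata

s = u'\u0663\u0669\u0668\u066b\u0664\u0667'
for ch in s:
    # code point, official Unicode name, decimal digit value (None if it has none)
    print('U+%04X' % ord(ch), unicodedata.name(ch), unicodedata.decimal(ch, None))
```

The last column is None for U+066B (ARABIC DECIMAL SEPARATOR), which is why a lookup based only on digit values cannot handle the separator.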
I started with:
NUMERALS_TRANSLATION_TABLE = {0x660:ord("0"), 0x661:ord("1"), 0x662:ord("2"), 0x663:ord("3"), 0x664:ord("4"), 0x665:ord("5"), 0x666:ord("6"), 0x667:ord("7"), 0x668:ord("8"), 0x669:ord("9"), 0x66b:ord(".")}
input.translate(NUMERALS_TRANSLATION_TABLE)
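For this one script, the same table can be built with a comprehension instead of being spelled out by hand, because U+0660 to U+0669 are the Arabic-Indic digits zero to nine in order. A minimal Python 3 sketch, equivalent to the hand-written table:

```python
# Arabic-Indic digits U+0660..U+0669 map to ASCII '0'..'9' by offset.
NUMERALS_TRANSLATION_TABLE = {cp: ord(str(cp - 0x660)) for cp in range(0x660, 0x66A)}
# ARABIC DECIMAL SEPARATOR is not a digit, so it still needs its own entry.
NUMERALS_TRANSLATION_TABLE[0x66B] = ord(".")

s = u'\u0663\u0669\u0668\u066b\u0664\u0667'
print(s.translate(NUMERALS_TRANSLATION_TABLE))  # 398.47
```

This only tidies the table; it still covers a single script, which is the limitation described next.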
This solution works, but I want to support all number-related characters in Unicode, not just Arabic-Indic ones. I can translate the digits by iterating over the string and calling unicodedata.digit(input[i]) on each character. I don't like this solution, because it doesn't handle characters that are not digits, such as '\u066b' (ARABIC DECIMAL SEPARATOR) or '\u2013' (EN DASH); unicodedata.digit raises ValueError for them. I could handle these by using translate as a fallback, but I'm not sure whether there are other such characters I'm not aware of, so I'm looking for a better, more elegant solution.
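One way to combine the two ideas is to convert every Unicode decimal digit through the unicodedata database and keep a small fallback table for the remaining characters, raising an error on anything unexpected so unknown characters surface instead of slipping through. This is a sketch under assumptions: `to_ascii_number` and `EXTRA_CHARS` are hypothetical names, the fallback entries are illustrative rather than complete, and treating EN DASH as a minus sign is a guess about the input data. It also swaps in unicodedata.decimal for unicodedata.digit.

```python
import unicodedata

# Hypothetical fallback table for non-digit characters that can appear in
# numbers. Illustrative only: extend it as new characters turn up.
EXTRA_CHARS = {
    u'\u066b': u'.',  # ARABIC DECIMAL SEPARATOR
    u'\u066c': u',',  # ARABIC THOUSANDS SEPARATOR
    u'\u2013': u'-',  # EN DASH, treated as a minus sign (assumption)
    u'\u2212': u'-',  # MINUS SIGN
}


def to_ascii_number(text, extra=EXTRA_CHARS):
    """Replace each Unicode decimal digit with its ASCII digit and each
    character in `extra` with its mapping.

    Raises ValueError on any other non-ASCII character, so unknown
    characters are reported instead of being passed through silently.
    """
    out = []
    for ch in text:
        if ch in extra:
            out.append(extra[ch])
            continue
        value = unicodedata.decimal(ch, None)  # None if ch has no decimal value
        if value is not None:
            out.append(str(value))
        elif ord(ch) < 128:
            out.append(ch)  # ASCII punctuation such as '.', '-', '+'
        else:
            raise ValueError('unmapped character %r (%s)'
                             % (ch, unicodedata.name(ch, 'unnamed')))
    return ''.join(out)


print(to_ascii_number(u'\u0663\u0669\u0668\u066b\u0664\u0667'))  # 398.47
```

unicodedata.decimal is used rather than unicodedata.digit because digit also returns values for characters such as SUPERSCRIPT TWO ('\u00b2'), which are not positional decimal digits; with decimal, those end up in the error path instead. Note also that Python 3's int() and float() already accept any Unicode decimal digit, so only separators and signs really need the extra table.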
Any suggestions would be greatly appreciated.