Some of our clients submit timestamps like ٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣, which Google Translate renders as "03/10/2015 19:01:43".
How can I achieve the same in Python?
There is also the unidecode library from https://pypi.python.org/pypi/Unidecode.
In Python 2:
>>> from unidecode import unidecode
>>> unidecode(u"۰۱۲۳۴۵۶۷۸۹")
'0123456789'
In Python 3:
>>> from unidecode import unidecode
>>> unidecode("۰۱۲۳۴۵۶۷۸۹")
'0123456789'
My solution fails for a different timestamp: u'۲۰۱۵-۱۰-۱۸ ۰۸:۲۲:۱۱'. Go for J.F. Sebastian's or jimhark's solution.
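Use ord to get the Unicode code point of each character. The Arabic-Indic digits start at code point 1632 (U+0660, the digit 0) and end at 1641 (the digit 9).
d = u'٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣'
s = []
for c in d:
    o = ord(c)
    # Show the code point and its offset from 1632 for every character.
    print('%s -> %s, %s - 1632 = %s' % (c, o, o, o - 1632))
    if 1631 < o < 1642:
        # Arabic-Indic digit: the offset from 1632 is its numeric value.
        s.append(str(o - 1632))
        continue
    # Anything else (separators, spaces) passes through unchanged.
    s.append(c)
print(''.join(s))
# or as a one-liner:
print(''.join([str(ord(c) - 1632) if 1631 < ord(c) < 1642 else c for c in d]))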
Here is the output of the for loop:
٢ -> 1634, 1634 - 1632 = 2
٠ -> 1632, 1632 - 1632 = 0
١ -> 1633, 1633 - 1632 = 1
٥ -> 1637, 1637 - 1632 = 5
- -> 45, 45 - 1632 = -1587
١ -> 1633, 1633 - 1632 = 1
٠ -> 1632, 1632 - 1632 = 0
- -> 45, 45 - 1632 = -1587
٠ -> 1632, 1632 - 1632 = 0
٣ -> 1635, 1635 - 1632 = 3
-> 32, 32 - 1632 = -1600
١ -> 1633, 1633 - 1632 = 1
٩ -> 1641, 1641 - 1632 = 9
: -> 58, 58 - 1632 = -1574
٠ -> 1632, 1632 - 1632 = 0
١ -> 1633, 1633 - 1632 = 1
: -> 58, 58 - 1632 = -1574
٤ -> 1636, 1636 - 1632 = 4
٣ -> 1635, 1635 - 1632 = 3
2015-10-03 19:01:43
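The range check above only covers the Arabic-Indic digits (U+0660 to U+0669). The timestamp this approach fails on appears to use the Extended Arabic-Indic digits (U+06F0 to U+06F9) instead. As a rough Python 3 sketch, assuming those two ranges are the only ones you need, a translation table can handle both; DIGIT_TABLE and to_ascii are illustrative names, not part of the original answer.

```python
# Build a translation table for two digit ranges:
# Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic (U+06F0..U+06F9).
# DIGIT_TABLE and to_ascii are hypothetical names used for illustration.
DIGIT_TABLE = {}
for zero in (0x0660, 0x06F0):
    for value in range(10):
        DIGIT_TABLE[zero + value] = str(value)


def to_ascii(text):
    """Replace digits from the two ranges above; leave other characters alone."""
    return text.translate(DIGIT_TABLE)


# Both timestamps are written with escapes to avoid copy/paste issues.
arabic_indic = '\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663 \u0661\u0669:\u0660\u0661:\u0664\u0663'
extended = '\u06f2\u06f0\u06f1\u06f5-\u06f1\u06f0-\u06f1\u06f8 \u06f0\u06f8:\u06f2\u06f2:\u06f1\u06f1'

print(to_ascii(arabic_indic))
print(to_ascii(extended))
```

Digits from any other script would pass through untouched, so this is narrower than the unicodedata-based approach.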
To convert the time string to a datetime object (Python 3):
>>> import re
>>> from datetime import datetime
>>> datetime(*map(int, re.findall(r'\d+', ' ٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣')))
datetime.datetime(2015, 10, 3, 19, 1, 43)
>>> str(_)
'2015-10-03 19:01:43'
If you need only numbers:
>>> list(map(int, re.findall(r'\d+', ' ٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣')))
[2015, 10, 3, 19, 1, 43]
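If you want this as a reusable helper, the same idea can be wrapped in a function. This is only a sketch: parse_timestamp is a made-up name, and it assumes the input always contains exactly six numeric fields in year, month, day, hour, minute, second order.

```python
import re
from datetime import datetime


def parse_timestamp(text):
    """Parse 'Y-M-D h:m:s' written with any Unicode decimal digits.

    Hypothetical helper: in Python 3, \\d matches Unicode decimal digits
    in str patterns, and int() accepts them too.
    """
    fields = [int(number) for number in re.findall(r'\d+', text)]
    if len(fields) != 6:
        raise ValueError('expected 6 numeric fields, got %d' % len(fields))
    return datetime(*fields)


arabic_indic = '\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663 \u0661\u0669:\u0660\u0661:\u0664\u0663'
print(parse_timestamp(arabic_indic))
```

Because it relies on \d and int() rather than a fixed code point range, it should also accept plain ASCII input and digits from other scripts.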
While inspired by some of the other answers (thanks @kev), I took a different approach.
(Doh! I just noticed @kev also asked this question.)
You asked specifically about Arabic characters, but it simplifies things to handle all Unicode digits.
Note: I process the same date string, but specify the Unicode characters using Unicode escape sequences because that was easier on my system.
import unicodedata
unicodeDate = u'\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663 \u0661\u0669:\u0660\u0661:\u0664\u0663'
converted = u''.join([unicode(unicodedata.decimal(c, c)) for c in unicodeDate])
print converted
The second argument to unicodedata.decimal is the default value to return if the first argument doesn't map to a Unicode decimal. The effect of passing in the same character for both arguments is any Unicode decimal is converted to the equivalent ASCII decimal, and all other characters pass through unchanged.
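The snippet above is Python 2 (it uses unicode and the print statement). A Python 3 equivalent might look like the sketch below; to_ascii_digits is just an illustrative name.

```python
import unicodedata


def to_ascii_digits(text):
    """Convert every Unicode decimal digit to ASCII; pass other characters through.

    unicodedata.decimal(ch, ch) returns an int for decimal digits and the
    character itself otherwise, so str() covers both cases.
    """
    return ''.join(str(unicodedata.decimal(ch, ch)) for ch in text)


unicode_date = '\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663 \u0661\u0669:\u0660\u0661:\u0664\u0663'
print(to_ascii_digits(unicode_date))
```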
For comparison, the same conversion written with unicodedata.digit:
converted = ''.join([str(unicodedata.digit(c, c)) for c in unicodeDate])
@J.F. Sebastian provided a helpful comment pointing out that the code above doesn't properly handle superscripts, for example u'\u00b2'. The superscripts u'\u00b3' and u'\u00b9' are in the same group. I found that this also affects some other code points.
Apparently unicodedata.digit() tries to pull a digit out of a decorated number, which probably isn't desirable here. But unicodedata.decimal seems like it does exactly what's needed (assuming you don't want to convert decorated digits).
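To see the difference concretely, here is a small Python 3 check on one Arabic-Indic digit and the three superscripts mentioned above. Passing None as the default makes a missing value visible; samples and results are just illustrative names.

```python
import unicodedata

# One Arabic-Indic digit followed by the superscripts two, three and one.
samples = ['\u0662', '\u00b2', '\u00b3', '\u00b9']

# For each character, look up its digit value and its decimal value.
# None is returned when the character has no such value.
results = {
    ch: (unicodedata.digit(ch, None), unicodedata.decimal(ch, None))
    for ch in samples
}

for ch, (as_digit, as_decimal) in results.items():
    print('U+%04X digit=%r decimal=%r' % (ord(ch), as_digit, as_decimal))
```

The superscripts have a digit value but no decimal value, which is why the decimal-based version leaves them alone.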