I am parsing HTML with Python and there is a date string: [ 24-Янв-17 07:24 ]. "Янв" is Russian for "Jan". I want to convert it into a datetime object.
import datetime
import locale

# Some BeautifulSoup parsing; data is the parsed page
timeData = data.find('div', {'id': 'time'}).text

# Use Russian month abbreviations for %b
locale.setlocale(locale.LC_TIME, 'ru_RU.UTF-8')
result = datetime.datetime.strptime(timeData, u'[ %d-%b-%y %H:%M ]')
The error is:
ValueError: time data '[ 24-\xd0\xaf\xd0\xbd\xd0\xb2-17 07:24 ]' does not match format '[ %d-%b-%y %H:%M ]'
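For comparison, one way to sidestep the locale entirely is to translate the month name into a number before calling strptime. This is a minimal sketch, not something from the original post. It assumes Python 3 and that the page only uses the abbreviations in the hypothetical RU_MONTHS table; parse_ru_date and DATE_RE are illustrative names.

```python
import datetime
import re

# Hypothetical lookup table (an assumption, not taken from the page):
# lower-case Russian month abbreviations mapped to month numbers.
RU_MONTHS = {
    "янв": 1, "фев": 2, "мар": 3, "апр": 4, "май": 5, "мая": 5,
    "июн": 6, "июл": 7, "авг": 8, "сен": 9, "окт": 10, "ноя": 11,
    "дек": 12,
}

# Matches strings shaped like "[ 24-Янв-17 07:24 ]".
DATE_RE = re.compile(r"\[\s*(\d{1,2})-(\w+)-(\d{2})\s+(\d{1,2}:\d{2})\s*\]")


def parse_ru_date(text):
    """Parse '[ DD-Mon-YY HH:MM ]' where Mon is a Russian abbreviation."""
    match = DATE_RE.search(text)
    if match is None:
        raise ValueError("unrecognised date string: %r" % text)
    day, month_name, year, clock = match.groups()
    # str.lower() handles Cyrillic in Python 3, so "Янв" becomes "янв".
    month = RU_MONTHS.get(month_name.lower())
    if month is None:
        raise ValueError("unknown month abbreviation: %r" % month_name)
    # Swap the month name for its number so strptime needs no locale.
    numeric = "%s-%02d-%s %s" % (day, month, year, clock)
    return datetime.datetime.strptime(numeric, "%d-%m-%y %H:%M")


result = parse_ru_date("[ 24-Янв-17 07:24 ]")
```

Because only digits reach strptime, the result no longer depends on which locales are installed or how they spell or capitalise the month names.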
type(timeData) returns unicode. Converting timeData from UTF-8 raises UnicodeEncodeError. What's wrong?
chardet returns {'confidence': 0.87625, 'encoding': 'utf-8'}, and when I call datetime.datetime.strptime(timeData.encode('utf-8'), ...) it raises the same error as above.
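A quick check (Python 3 syntax, not part of the original post) that the escaped bytes in the ValueError are simply the UTF-8 encoding of "Янв", i.e. what timeData.encode('utf-8') produces for the month part:

```python
# Escaped bytes copied from the ValueError message above.
raw = b"\xd0\xaf\xd0\xbd\xd0\xb2"

# They decode cleanly as UTF-8 to the Cyrillic abbreviation "Янв"...
decoded = raw.decode("utf-8")

# ...and re-encoding "Янв" as UTF-8 reproduces them byte for byte, so the
# bytes in the error message are well-formed UTF-8, not corrupted data.
reencoded = "Янв".encode("utf-8")
```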
The original page is windows-1251 encoded.
print type(timeData)
print timeData
timeData = timeData.encode('cp1251')
print type(timeData)
print timeData
This prints:
<type 'unicode'>
[ 24-Янв-17 07:24 ]
<type 'str'>
[ 24-???-17 07:24 ]
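A small illustration (Python 3; the string is copied from the output above) of how the same text looks as windows-1251 bytes versus UTF-8 bytes. The ??? most likely appears because the console expects UTF-8 and cannot display the single-byte cp1251 letters; that explanation is an assumption about the terminal, not something stated in the post.

```python
text = "[ 24-Янв-17 07:24 ]"

# windows-1251 stores each Cyrillic letter in a single byte.
cp1251_bytes = text.encode("cp1251")

# UTF-8 needs two bytes for each of these Cyrillic letters.
utf8_bytes = text.encode("utf-8")

# Reading cp1251 bytes as UTF-8 fails; errors="replace" substitutes
# U+FFFD, roughly what a UTF-8 console does when it shows "?".
garbled = cp1251_bytes.decode("utf-8", errors="replace")

# Decoding with the matching codec restores the original text.
restored = cp1251_bytes.decode("cp1251")
```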