I am using Readability Parser API to extract content from a web page. It is ok when the web page is in Latin character set, but when I extract article in Cyrillic, it ends up with the following:
<div>Ввоскресень</div>...etc
The interesting thing here is that the title of a web page is extracted correctly in Cyrillic, but not the content. My attempt was to do the following as it suggested in this SO answer:
content = unicodedata.normalize('NFKD', content).encode('ascii','ignore')
but it did not work. Could you tell me if there is a way to convert this string before saving to database?
Please let me know if the title of my question explains correctly what I need. Thank you.