I'm retrieving data from the internet and I want it to convert it to ASCII. But I can't get it fixed. (Python 2.7)
When I use decode('utf-8') on the strings I get for example Yalçınkaya. I want this however converted to Yalcinkaya. Raw data was Yalçınkaya.
Anyone who can help me?
Thanks.
Edit: I've tried the suggestion that was made by the user who marked this question as duplicate (What is the best way to remove accents in a Python unicode string?) but that did not solve my problem.
That post mainly talks about removing the special characters, and that did not solve my problem of replacing the turkish characters (Yalçınkaya) to their ascii characters (Yalcinkaya).
# Printing the raw string in Python results in "Yalçınkaya".
# When applying unicode to utf8 the string changes to 'Yalçınkaya'.
# HTMLParser is used to revert special characters such as commas
# FKD normalize is used, which converts the string to 'Yalçınkaya'.
# Applying ASCII encoding results in 'Yalcnkaya', missing the original turkish 'i' which is not what I wanted.
name = unicodedata.normalize('NFKD', unicode(name, 'utf8'))
name = HTMLParser.HTMLParser().unescape(name)
name = unicodedata.normalize('NFKD', u'%s' %name).encode('ascii', 'ignore')