Python Source Unicode to ASCII

Question

I have a problem converting HTML to plain text. The pages I'm reading contain Unicode escape sequences such as \u00f3 and \u00f1. I want those converted to plain ASCII (not ó and ñ, but o and n).

I've tried several things in Python; does anyone know an easy solution?

possible duplicate of [How to implement Unicode string matching by folding in python](http://stackoverflow.com/questions/1410308/how-to-implement-unicode-string-matching-by-folding-in-python) — Martijn Pieters, Jul 14 '13 at 16:33

Accepted Answer · edited May 23 '17 at 12:29

Look at this Stack Overflow question: What is the best way to remove accents in a Python Unicode string?

Two good libraries for this:

Unidecode (may add characters) and unicodedata (truncates)
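A minimal Python 3 sketch of the unicodedata approach, using only the standard library (strip_accents is a hypothetical helper name, not part of any library). It also shows what "truncates" means: characters with no ASCII decomposition are silently dropped rather than transliterated.

```python
import unicodedata

def strip_accents(text: str) -> str:
    # NFKD splits "ó" into "o" + U+0301 COMBINING ACUTE ACCENT
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" drops every non-ASCII code point:
    # the combining marks, plus any character that has no ASCII decomposition
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_accents("canción año"))  # cancion ano
print(strip_accents("Straße"))       # Strae  (ß does not decompose, so it is dropped)
```

If losing characters like ß is unacceptable, a transliterating library is the better fit.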

answered Jul 14 '13 at 17:05 by jacquarg (edited by Community)

Found the solution: (1) .decode("unicode-escape") (2) unicodedata.normalize('NFKD', webLine).encode('ascii','ignore') – Coryza Jul 14 '13 at 18:02
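The comment above uses Python 2, where a str could be decoded directly. A rough Python 3 equivalent of the same two steps is sketched below, assuming the page line arrives as ASCII bytes containing literal escapes like \u00f3 (escapes_to_ascii is a hypothetical name; web_line mirrors webLine from the comment). Note that the unicode_escape codec reads any raw non-ASCII bytes as Latin-1, so this sketch is not suitable for input that is already UTF-8 encoded.

```python
import unicodedata

def escapes_to_ascii(raw: bytes) -> str:
    # Step 1: turn literal escape sequences such as b"\\u00f3" into characters
    text = raw.decode("unicode_escape")
    # Step 2: decompose accented letters, then drop the non-ASCII combining marks
    folded = unicodedata.normalize("NFKD", text)
    return folded.encode("ascii", "ignore").decode("ascii")

web_line = b"canci\\u00f3n de a\\u00f1o"
print(escapes_to_ascii(web_line))  # cancion de ano
```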
