
I have a problem converting HTML to plain text. The pages I'm reading include Unicode escape sequences such as \u00f3 and \u00f1. I want those converted to plain ASCII: not ó and ñ, but o and n.

I've tried a lot of things in Python. Does anyone know an easy solution?

dda
Coryza
  • possible duplicate of [How to implement Unicode string matching by folding in python](http://stackoverflow.com/questions/1410308/how-to-implement-unicode-string-matching-by-folding-in-python) – Martijn Pieters Jul 14 '13 at 16:33

1 Answer


Look at this Stack Overflow question: What is the best way to remove accents in a Python unicode string?

Two good libraries for this:

Unidecode (transliterates, so it may add characters) and unicodedata (truncates: characters it cannot map are dropped)
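A minimal sketch of the unicodedata approach, using only the standard library. The function name `to_ascii` is hypothetical. NFKD normalization splits an accented letter into its base letter plus a combining accent mark, and encoding to ASCII with `errors="ignore"` then discards the marks. This is also where the truncation shows up: a character with no decomposition, such as "ß", is simply dropped rather than transliterated.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    # NFKD turns "ó" into "o" + U+0301 (combining acute accent),
    # and "ñ" into "n" + U+0303 (combining tilde).
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" discards every code point that has no ASCII form,
    # which removes the combining marks (and any other non-ASCII character).
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("canción, año"))  # cancion, ano
print(to_ascii("Straße"))        # Strae  ("ß" has no decomposition, so it is lost)
```

If losing characters like "ß" matters, a transliteration library such as Unidecode is the better fit, at the cost of a third-party dependency.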

jacquarg
  • Found the solution: (1) .decode("unicode-escape") (2) unicodedata.normalize('NFKD', webLine).encode('ascii','ignore') – Coryza Jul 14 '13 at 18:02
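The two steps in the comment above appear to be Python 2 code (`str.decode` does not exist in Python 3). A hedged Python 3 sketch of the same pipeline follows; the function name `html_escapes_to_ascii` is hypothetical, and it assumes the scraped text is plain ASCII apart from the literal `\uXXXX` escape sequences. Note that the `unicode_escape` codec interprets every backslash escape, so a literal `\n` in the page would also become a newline.

```python
import unicodedata


def html_escapes_to_ascii(escaped: str) -> str:
    """Decode literal \\uXXXX escapes, then strip accents down to ASCII."""
    # Step 1: turn literal escape sequences such as "\\u00f3" into real
    # characters. In Python 3 the unicode_escape codec works on bytes, so the
    # (assumed ASCII-only) input is encoded to bytes first.
    text = escaped.encode("ascii").decode("unicode_escape")
    # Step 2: decompose accented letters and drop the combining marks.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


web_line = "canci\\u00f3n a\\u00f1o"  # literal backslash escapes, as scraped
print(html_escapes_to_ascii(web_line))  # cancion ano
```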