
I have a problem: characters such as ñ, ŕ, í, á and ú are discarded when I apply

text = text.encode('ascii', 'ignore')

to text before passing it to a function that needs its input to be ASCII.

Is there a way to force the text to ASCII without losing those characters, or should I change the function to accept Unicode characters?

http://dpaste.com/601417/

madprops
  • What function? Why does it "need the input to be ascii"? – Karl Knechtel Aug 23 '11 at 22:24
  • Use normalization, then throw away the diacritics: http://stackoverflow.com/questions/175240/how-do-i-convert-a-files-format-from-unicode-to-ascii-using-python/175270#175270 – Wooble Aug 23 '11 at 23:18
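
The normalization approach from the comment above could be sketched roughly as follows. This assumes Python 3 and a `str` input; the helper name `strip_diacritics` is made up for illustration and is not from the linked answer.

```python
import unicodedata

def strip_diacritics(text: str) -> bytes:
    """Hypothetical helper: return an ASCII approximation of text."""
    # NFKD splits each accented letter into a base letter followed by
    # one or more combining marks (for example, ñ becomes n + U+0303).
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are not ASCII, so 'ignore' drops them and
    # keeps the base letters.
    return decomposed.encode("ascii", "ignore")

print(strip_diacritics("ñandú ŕ í á"))  # b'nandu r i a'
```

Characters that have no decomposition (ß or ø, for instance) are still dropped entirely, so this only helps when an unaccented approximation is acceptable.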

2 Answers


The 'ascii' encoding can't represent the characters you refer to. You have to choose a different encoding, perhaps 'cp850' or 'latin_1', but then you have to be sure that your output terminal interprets 8-bit codes using the relevant code page.

On balance, life is easier if you just go Unicode all the way.
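
As a rough illustration of the trade-off (assuming Python 3; the helper `fits` is hypothetical), an 8-bit code page only covers part of the character set in the question, while a Unicode encoding such as UTF-8 covers all of it:

```python
def fits(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if text can be encoded without loss."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

sample = "ñ ŕ í á ú"

# Try the encodings named in the answer, plus two others for comparison.
for encoding in ("ascii", "latin_1", "cp850", "iso8859_2", "utf-8"):
    print(encoding, fits(sample, encoding))

# A single 8-bit byte per character in latin_1, two bytes in UTF-8.
print(len("ñ".encode("latin_1")), len("ñ".encode("utf-8")))
```

In this sketch ŕ is the character that fails under 'latin_1' and 'cp850'; it exists in ISO 8859-2, which in turn lacks ñ, so no single 8-bit code page tried here holds the whole sample.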

Marcelo Cantos

Yes, if you need those characters you should use another encoding (for example a Unicode encoding such as UTF-8). See an ASCII table for all the characters that ASCII includes.
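
Instead of checking a table by hand, one could list the characters that fall outside ASCII. A minimal sketch, assuming Python 3 (the function name is made up for illustration):

```python
def non_ascii_chars(text: str) -> list:
    """Hypothetical helper: characters in text that ASCII cannot represent."""
    # ASCII covers code points 0 through 127 only.
    return [ch for ch in text if ord(ch) > 127]

print(non_ascii_chars("señor"))  # ['ñ']
```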

sunadorer