There seem to be many questions about UnicodeEncodeError, but none of the answers helped me.

I get this error:

Traceback (most recent call last):
  File "...", line 86, in <module>

  File "...", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 255: ordinal not in range(128)

What should I do?

py.codan
  • What is your **full** traceback? What are you printing *to* (an IDE console? A terminal? Windows console? A pipe?) – Martijn Pieters Jan 14 '15 at 17:32
  • Notice that the error is an **encode** error, so it is not the *decode* that throws it, not directly. – Martijn Pieters Jan 14 '15 at 17:32
  • @MartijnPieters a `decode` will do an implicit `encode` first if the string isn't unicode; it would fit the symptoms. – Mark Ransom Jan 14 '15 at 17:34
  • @MarkRansom: yes, that's why I qualified my comment as *not directly*. – Martijn Pieters Jan 14 '15 at 17:35
  • What am I printing to? What do you mean by that? I've been stuck on this problem for a while (4 hours), and it still doesn't work. What should I do? – py.codan Jan 14 '15 at 17:36
  • Just to be clear: without showing us what is in `new_text` and the full traceback, your question is unanswerable. Either there is existing Unicode data in `new_text`, or you are using a console or terminal or pipe where the environment states it is using ASCII or there is no way to determine the environment codec. The full traceback starts with the text `Traceback (most recent call last):`. Showing the contents of `new_text` is best done with `print repr(new_text)`. – Martijn Pieters Jan 14 '15 at 17:37
  • @py.codan: right, you are trying to decode already-decoded Unicode values. I'd love to see what `new_text` actually contains, but I gave you a best-guess answer anyway. – Martijn Pieters Jan 14 '15 at 17:48

1 Answer

You have data that is already decoded in `new_text`. You either have a mix of `unicode` and byte string data, or you have only `unicode` values.

What happens is that you ask Python to decode already decoded data, a `unicode` object. To make this work, Python will first encode to bytes, using the default ASCII encoding. That fails for any value that contains non-ASCII characters, such as `u'\xf8'`.

Either don't decode (if all your data is already decoded to `unicode` objects), or differentiate between objects that need to be decoded vs. those that are already `unicode`:
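To see where the *encode* error comes from, here is a minimal sketch that performs the hidden step explicitly. The value of `text` is a made-up example, not the asker's data; on Python 2 the same failure is triggered implicitly by calling `text.decode('utf-8')` on a `unicode` object.

```python
# Hypothetical sample value standing in for one already-decoded item.
text = u'sm\xf8r'

try:
    # Python 2 runs this step behind the scenes before decoding a
    # unicode object; here it is spelled out so the failure is visible.
    raw = text.encode('ascii')
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)
```

The exception is raised by the ASCII *encode*, which is why the traceback reports a `UnicodeEncodeError` even though the code only asked for a decode.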

[x.decode('utf-8') if isinstance(x, str) else x for x in new_text]
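As a rough illustration, the same idea can be wrapped in a small helper. The name `to_text` and the sample list are assumptions made for this sketch; it tests for `bytes`, which is an alias of `str` on Python 2.6+ and the byte string type on Python 3, so the check behaves like the `isinstance(x, str)` test above on Python 2.

```python
def to_text(items, encoding='utf-8'):
    """Decode byte strings and pass already-decoded text through unchanged."""
    return [x.decode(encoding) if isinstance(x, bytes) else x for x in items]


# Hypothetical mixed input: UTF-8 bytes next to already-decoded text.
new_text = [b'\xc3\xb8', u'\xe6', b'abc']
decoded = to_text(new_text)
```

After this call every element of `decoded` is text, so joining or printing the items no longer triggers an implicit ASCII conversion.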
Martijn Pieters
  • This doesn't raise an error, but it still doesn't give me `new_text` with the letters I was looking for. This is the text I get: `ik \nan gkvfscdsennrr\n ip vainsateeyai u\xe5 l rmne vpb 0te\xe6totaktlaks s enns/ndn verseeaen\xe6o1mirglst .diofe Mnsoas ,esfs btelue vlrafsttsefne2kted ok e.n v.fnl,dkky k) entet Imdssk rsedfrg\ngt thgee\n- oKmraeo\n g l e tpsTrs a\nememv \ndtkg\xe5,nrn t pim\xf8mrgr pk\xe6 arenea r iryiksdrlh\xe6 m.` – py.codan Jan 14 '15 at 17:52
  • @py.codan: I'm not sure what you are asking here. Are you wondering why the data printed is using escape sequences? If so see [Python ascii utf unicode](http://stackoverflow.com/a/27256421) and print individual elements (`print new_text[0]`) or produce one unicode object (`print u' '.join(new_text)`). – Martijn Pieters Jan 14 '15 at 17:56
  • @py.codan: take into account that printing Unicode data to a console that doesn't support your codepoints can lead to more encoding errors. The Windows console especially is sensitive to this. – Martijn Pieters Jan 14 '15 at 17:56
  • I want a random text made from letters that I get from `input`. The text should have 55 letters, and these letters should include æø. The text (`new_text`) doesn't need to make any sense, but it should include the letters I showed. The text itself isn't wrong, it has the "right" structure, but it should be Unicode (I think). – py.codan Jan 14 '15 at 18:03
  • print u' '.join(new_text) HELPED ME ! THANKS A LOT @Martijn Pieters! – py.codan Jan 14 '15 at 18:09
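For readers landing here later, the fix from the comments can be sketched as follows. The sample list is made up, and the `safe` fallback is an optional extra (not something suggested in the thread) for consoles that cannot display the characters.

```python
# Hypothetical already-decoded items, standing in for new_text.
new_text = [u'\xe6', u'\xf8', u'\xe5']

# Join the items into one text object, as suggested in the comments.
# Printing this object shows the characters themselves, whereas printing
# the list on Python 2 shows the repr of each item, with \x escapes.
joined = u' '.join(new_text)

# Optional fallback for consoles limited to ASCII: characters that
# cannot be encoded are replaced with '?' rather than raising an error.
safe = joined.encode('ascii', 'replace').decode('ascii')
print(safe)
```

Printing `joined` directly is what the asker wanted; `safe` only matters when the terminal rejects the non-ASCII characters.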