
I believe in the unicode sandwich. I use the unicode sandwich. So why is it that when I run the following on a byte string (py 2.7)...

label = label.decode("utf-8")

I still get an error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/celery/app/trace.py", line 648, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/celery/cl/scrapers/tasks.py", line 638, in update_docket_info_iquery
    d = update_docket_metadata(d, report.metadata)
  File "/usr/local/lib/python2.7/site-packages/juriscraper/pacer/case_query.py", line 166, in metadata
    self._get_label_value_pair(bold, True, field_names)
  File "/usr/local/lib/python2.7/site-packages/juriscraper/pacer/docket_report.py", line 233, in _get_label_value_pair

    label = label.decode("utf-8") <---- Shouldn't this work?

  File "/usr/local/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)

And why is this throwing a UnicodeEncodeError when the line that crashes is doing a decode?

I'm confused. Again.

mlissner
  • What is 'label'? Could you please add it to the question? – Roy2012 May 27 '20 at 06:06
  • I would assume it's because the bytestring is not valid utf8. – Joran Beasley May 27 '20 at 06:08
  • How was `label` created? – John Gordon May 27 '20 at 06:16
  • You got a `UnicodeEncodeError`, indicating `label` was already a Unicode string. Python 2.7 implicitly encodes it back to a byte string using the default `ascii` codec before trying to decode it from UTF-8, and that implicit encode fails due to non-ASCII characters in the string. This is one of the things Python 3 fixes. – Mark Tolonen May 27 '20 at 06:28
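To make the explanation about the implicit encode concrete, here is a small Python 3 sketch that writes out the step Python 2.7 performs silently. `py2_style_decode` is a hypothetical helper (a Python 3 `str` has no `.decode` method, so the hidden ASCII encode has to be spelled out by hand), and the label value is made up, with a no-break space at index 6 so the failure lines up with the traceback above.

```python
def py2_style_decode(text, encoding="utf-8"):
    """Hypothetical helper mimicking Python 2.7's unicode.decode():
    first implicitly encode with the default 'ascii' codec, then decode."""
    implicit_bytes = text.encode("ascii")  # the hidden step; fails on non-ASCII
    return implicit_bytes.decode(encoding)  # never reached for u"...\xa0"


# Made-up value: "Filed:" is 6 characters, so the no-break space is at index 6.
label = u"Filed:\xa0"

try:
    py2_style_decode(label)
except UnicodeEncodeError as exc:
    # The error comes from the encode call, even though we asked for a decode.
    print(type(exc).__name__, exc)
```

Python 3 prints the character as `'\xa0'` rather than `u'\xa0'`, but the failing step is the same one: the encode, not the decode.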

1 Answer


Your log shows the answer:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)

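Python 2.7 cannot encode that character to ASCII. `label` is already a unicode string, so calling `.decode("utf-8")` on it makes Python first encode it implicitly with the default `ascii` codec, and `u'\xa0'` (a no-break space) is not an ASCII character. That implicit encode is why you get a UnicodeEncodeError from a decode call. The solution here is to work entirely in unicode (don't decode a value that is already unicode), or, if you really need bytes, encode it explicitly with the proper codec first (for example `label.encode("utf-8")`) and then decode those bytes.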
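A minimal sketch of the "work entirely in unicode" advice, assuming the value may arrive either as bytes or as text: decode only when you actually have bytes. `to_text` is a hypothetical helper, not part of juriscraper or the standard library. It relies on `bytes` being an alias of `str` in Python 2.7, so the same check works there and in Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as unicode text.

    Byte strings are decoded with `encoding`; values that are already
    unicode are returned unchanged instead of being decoded a second time.
    """
    if isinstance(value, bytes):  # `bytes` is `str` on Python 2.7
        return value.decode(encoding)
    return value  # already text; calling .decode() here is what failed


raw = b"Filed:\xc2\xa0"      # UTF-8 bytes for "Filed:" plus a no-break space
already_text = u"Filed:\xa0"  # the same label as unicode text

print(to_text(raw) == already_text)           # decoded once at the boundary
print(to_text(already_text) == already_text)  # passed through untouched
```

Checking the type at the boundary keeps the sandwich intact: bytes get decoded once on the way in, and text that is already unicode passes through without triggering Python 2.7's implicit ASCII encode.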

This question is a possible duplicate of: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

so5user5