101

Currently I have this dictionary, printed using pprint:

{'AlarmExTempHum': '\x00\x00\x00\x00\x00\x00\x00\x00',  
'AlarmIn': 0,  
'AlarmOut': '\x00\x00',  
'AlarmRain': 0,  
'AlarmSoilLeaf': '\x00\x00\x00\x00',  
'BarTrend': 60,  
'BatteryStatus': 0,  
'BatteryVolts': 4.751953125,  
'CRC': 55003,
'EOL': '\n\r',
'ETDay': 0,
'ETMonth': 0,
'ETYear': 0,
'ExtraHum1': None,
'ExtraHum2': None,
'ExtraHum3': None,
'ExtraHum4': None,
'ExtraHum5': None,
'ExtraHum6': None,
'ExtraHum7': None,
'ExtraTemp1': None,
'ExtraTemp2': None,
'ExtraTemp3': None,
'ExtraTemp4': None,
'ExtraTemp5': None,
'ExtraTemp6': None,
'ExtraTemp7': None,
'ForecastIcon': 2,
'ForecastRuleNo': 122,
'HumIn': 31,
'HumOut': 94,
'LOO': 'LOO',
'LeafTemps': '\xff\xff\xff\xff',
'LeafWetness': '\xff\xff\xff\x00',
'NextRec': 37,
'PacketType': 0,
'Pressure': 995.9363359295631,
'RainDay': 0.0,
'RainMonth': 0.0,
'RainRate': 0.0,
'RainStorm': 0.0,
'RainYear': 2.8,
'SoilMoist': '\xff\xff\xff\xff',
'SoilTemps': '\xff\xff\xff\xff',
'SolarRad': None,
'StormStartDate': '2127-15-31',
'SunRise': 849,
'SunSet': 1611,
'TempIn': 21.38888888888889,
'TempOut': 0.8888888888888897,
'UV': None,
'WindDir': 219,
'WindSpeed': 3.6,
'WindSpeed10Min': 3.6}

When I do this:

import json
d = (my dictionary above)
jsonarray = json.dumps(d)

I get this error: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
HyperDevil
  • 2,519
  • 9
  • 38
  • 52

4 Answers4

169

If you are fine with non-printable symbols in your json, then add ensure_ascii=False to dumps call.

>>> json.dumps(your_data, ensure_ascii=False)

If ensure_ascii is false, then the return value will be a unicode instance subject to normal Python str to unicode coercion rules instead of being escaped to an ASCII str.

kmerenkov
  • 2,859
  • 1
  • 19
  • 7
  • 1
    add `indent=n` to the options to pretty print, where `n` is the number of spaces to indent – RTF Jul 16 '17 at 08:41
18

ensure_ascii=False really only defers the issue to the decoding stage:

>>> dict2 = {'LeafTemps': '\xff\xff\xff\xff',}
>>> json1 = json.dumps(dict2, ensure_ascii=False)
>>> print(json1)
{"LeafTemps": "����"}
>>> json.loads(json1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 328, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

Ultimately you can't store raw bytes in a JSON document, so you'll want to use some means of unambiguously encoding a sequence of arbitrary bytes as an ASCII string - such as base64.

>>> import json
>>> from base64 import b64encode, b64decode
>>> my_dict = {'LeafTemps': '\xff\xff\xff\xff',} 
>>> my_dict['LeafTemps'] = b64encode(my_dict['LeafTemps'])
>>> json.dumps(my_dict)
'{"LeafTemps": "/////w=="}'
>>> json.loads(json.dumps(my_dict))
{u'LeafTemps': u'/////w=='}
>>> new_dict = json.loads(json.dumps(my_dict))
>>> new_dict['LeafTemps'] = b64decode(new_dict['LeafTemps'])
>>> print new_dict
{u'LeafTemps': '\xff\xff\xff\xff'}
rkday
  • 1,106
  • 8
  • 12
  • You could pass arbitrary binary data (inefficiently) in json [using 'latin1' encoding](http://ideone.com/VrOtxm) – jfs Feb 02 '13 at 12:21
  • 1
    You could, I suppose, but json is designed/intended to use utf-8. – Karl Knechtel Feb 02 '13 at 13:55
  • 2
    @J.F.Sebastian: Indeed, _very_ inefficient as compared to `b64encode`. For example, for the 256 character string `s = ''.join(chr(i) for i in xrange(256))`, `len(json.dumps(b64encode(s))) == 346` vs `len(json.dumps(s.decode('latin1'))) == 1045`. – martineau Feb 02 '13 at 15:45
10

If you use Python 2, don't forget to add the UTF-8 file encoding comment on the first line of your script.

# -*- coding: UTF-8 -*-

This will fix some Unicode problems and make your life easier.

KPrince36
  • 2,886
  • 2
  • 21
  • 29
justice
  • 306
  • 2
  • 12
2

One possible solution that I use is to use python3. It seems to solve many utf issues.

Sorry for the late answer, but it may help people in the future.

For example,

#!/usr/bin/env python3
import json
# your code follows
Ralph Yozzo
  • 1,086
  • 13
  • 25
  • 4
    Sure, you're damn right, Python 3 solved many encoding issues. But that's not the answer to that question. It is explicitly tagged with python-2.7. So what you are saying is something like this: There is no built-in vacuum cleaner in your old car. So please buy a new car instead of adding a vacuum cleaner in your old car. – colidyre May 01 '17 at 00:24