7

I'm having trouble encoding infinity in JSON.

json.dumps will convert this to "Infinity", but I would like it to convert it to null or another value of my choosing.

Unfortunately, setting the default argument only seems to work if dumps doesn't already understand the object; otherwise the default handler appears to be bypassed.

Is there a way I can pre-encode the object, change the default way a type/class is encoded, or convert a certain type/class into a different object prior to normal encoding?
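
To make the behaviour concrete, here is a small reproduction. It is only a sketch and assumes Python 3; `fallback` and `calls` are names made up for the illustration, not part of the json module.

```python
import json

calls = []

def fallback(obj):
    # Hypothetical handler: records every object json.dumps hands to it.
    calls.append(obj)
    return None

encoded = json.dumps([float("inf"), float("-inf"), float("nan")], default=fallback)
print(encoded)  # [Infinity, -Infinity, NaN]
print(calls)    # [] because floats are encoded natively and never reach `default`
```

The handler is never consulted because float is a type the encoder already knows how to serialize.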

cammil

4 Answers

6

Look at the source here: http://hg.python.org/cpython/file/7ec9255d4189/Lib/json/encoder.py

If you subclass JSONEncoder, you can override just the iterencode(self, o, _one_shot=False) method, which has explicit special casing for Infinity (inside an inner function).

To make this reusable, you'll also want to alter the __init__ to take some new options, and store them in the class.

Alternatively, you could pick a json library from pypi which has the appropriate extensibility you are looking for: https://pypi.python.org/pypi?%3Aaction=search&term=json&submit=search

Here's an example:

import json

class FloatEncoder(json.JSONEncoder):

    def __init__(self, nan_str = "null", **kwargs):
        super(FloatEncoder,self).__init__(**kwargs)
        self.nan_str = nan_str

    # uses code from official python json.encoder module.
    # Same licence applies.
    def iterencode(self, o, _one_shot=False):
        """Encode the given object and yield each string
        representation as available.

        For example::

            for chunk in JSONEncoder().iterencode(bigobject):
                mysocket.write(chunk)
        """
        if self.check_circular:
            markers = {}
        else:
            markers = None
        if self.ensure_ascii:
            _encoder = json.encoder.encode_basestring_ascii
        else:
            _encoder = json.encoder.encode_basestring
        if self.encoding != 'utf-8':
            def _encoder(o, _orig_encoder=_encoder,
                         _encoding=self.encoding):
                if isinstance(o, str):
                    o = o.decode(_encoding)
                return _orig_encoder(o)

        def floatstr(o, allow_nan=self.allow_nan,
                     _repr=json.encoder.FLOAT_REPR,
                     _inf=json.encoder.INFINITY,
                     _neginf=-json.encoder.INFINITY,
                     nan_str = self.nan_str):
            # Check for specials. Note that this type of test is 
            # processor and/or platform-specific, so do tests which
            # don't depend on the internals.

            if o != o:
                text = nan_str
            elif o == _inf:
                text = 'Infinity'
            elif o == _neginf:
                text = '-Infinity'
            else:
                return _repr(o)

            if not allow_nan:
                raise ValueError(
                    "Out of range float values are not JSON compliant: " +
                    repr(o))

            return text

        _iterencode = json.encoder._make_iterencode(
                markers, self.default, _encoder, self.indent, floatstr,
                self.key_separator, self.item_separator, self.sort_keys,
                self.skipkeys, _one_shot)
        return _iterencode(o, 0)


example_obj = {
    'name': 'example',
    'body': [
        1.1,
        {"3.3": 5, "1.1": float('Nan')},
        [float('inf'), 2.2]
    ]}

print json.dumps(example_obj, cls=FloatEncoder)

ideone: http://ideone.com/dFWaNj

naught101
Marcin
  • The given snippet does not work in Python3.6. Gives `AttributeError: 'FloatEncoder' object has no attribute 'encoding'` – AbdealiLoKo Oct 25 '18 at 08:29
  • @AbdealiJK Python 3.6 was released 3.5 years after this code was written. https://www.python.org/downloads/release/python-360/ – Marcin Oct 26 '18 at 16:00
  • Ah, didn't check the dates ^_^ seems like I'll have to figure something else out as I do need to maintain 3.3+ support for my app – AbdealiLoKo Oct 27 '18 at 17:16
5

No, there is no simple way to achieve this. In fact, according to the standard, NaN and Infinity floating-point values shouldn't be serialized as JSON at all; Python uses an extension of the standard. You can make Python's encoding standard-compliant by passing allow_nan=False to dumps, but this will raise a ValueError for infinities and NaNs even if you provide a default function.
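
A quick check of that last point, as a sketch assuming Python 3 (`fallback` is a hypothetical handler name):

```python
import json

def fallback(obj):
    # Hypothetical handler that would turn any unknown object into None.
    return None

try:
    json.dumps([1.0, float("inf")], allow_nan=False, default=fallback)
    outcome = "encoded"
except ValueError:
    # The float encoder rejects the value itself; `default` is never asked.
    outcome = "ValueError"

print(outcome)  # ValueError
```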

You have two ways of doing what you want:

  1. Subclass JSONEncoder and change how these values are encoded. Note that you will have to take into account cases where a sequence can contain an infinity value etc. AFAIK there is no API to redefine how objects of a specific class are encoded.

  2. Make a copy of the object to encode and replace any occurrence of infinity/nan with None or some other object that is encoded as you want.
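
The second option could look roughly like the sketch below. It assumes Python 3 and data made only of dicts, lists, tuples and scalars; `replace_nonfinite` is a name invented here, and non-finite floats used as dict keys are not handled.

```python
import json
import math

def replace_nonfinite(obj, replacement=None):
    """Return a copy of obj with inf, -inf and nan floats swapped for replacement."""
    if isinstance(obj, float):
        return obj if math.isfinite(obj) else replacement
    if isinstance(obj, dict):
        return {key: replace_nonfinite(value, replacement)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [replace_nonfinite(item, replacement) for item in obj]
    return obj  # strings, ints, bools, None are passed through untouched

data = {"name": "example", "body": [1.1, {"x": float("nan")}, (float("inf"), 2.2)]}
# allow_nan=False doubles as a check that nothing non-finite slipped through.
result = json.dumps(replace_nonfinite(data), allow_nan=False)
print(result)  # {"name": "example", "body": [1.1, {"x": null}, [null, 2.2]]}
```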

A less robust, yet much simpler solution, is to modify the encoded data, for example replacing all Infinity substrings with null:

>>> import json
>>> import re
>>> # the optional leading '-' keeps -Infinity from turning into '-null'
>>> infty_regex = re.compile(r'-?\bInfinity\b')
>>> def replace_infinities(encoded):
...     return infty_regex.sub('null', encoded)
... 
>>> replace_infinities(json.dumps([1, 2, 3, float('inf'), 4]))
'[1, 2, 3, null, 4]'

Obviously you should take into account the text Infinity inside strings etc., so even here a robust solution is not immediate, nor elegant.
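
To illustrate the pitfall, this sketch (Python 3 assumed, with made-up sample data) shows a pattern of this kind rewriting a string value along with the float:

```python
import json
import re

infinity_pattern = re.compile(r"-?\bInfinity\b")

encoded = json.dumps({"title": "Infinity War", "score": float("inf")})
patched = infinity_pattern.sub("null", encoded)
print(patched)  # {"title": "null War", "score": null}
```

The float is fixed, but the title has been corrupted too, which is why working on the encoded text is fragile.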

Bakuriu
  • The necessary alteration is actually pretty simple. – Marcin Jul 06 '13 at 16:29
  • It's probably easier to make a robust solution by replacing Infinity *before* JSON encoding. That way you can just check for equality with `float("inf")`. – jwg Aug 22 '17 at 07:34
0

Context

I ran into this issue and didn't want to bring an extra dependency into the project just to handle this case. Additionally, my project supports Python 2.6, 2.7, 3.3, and 3.4, as well as users of simplejson. Unfortunately, there are three different implementations of iterencode across these versions, so hard-coding a particular version was undesirable.

Hopefully this will help someone else with similar requirements!

Qualifiers

If the time and processing power spent in your json.dumps call are small compared to the other components of your project, you can decode and re-encode the JSON to get the desired result, using the parse_constant kwarg.
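
Stripped down to a function, the round trip might look like this. It is a sketch assuming Python 3, and `dumps_strict` is a hypothetical helper name, not something the json module provides.

```python
import json

def dumps_strict(obj, **kwargs):
    # 1. Encode normally; this may emit Infinity, -Infinity or NaN.
    extended = json.dumps(obj)
    # 2. Decode again; parse_constant is called with the name of each of
    #    those three constants, and its return value replaces them.
    cleaned = json.loads(extended, parse_constant=lambda name: None)
    # 3. Encode the cleaned data; allow_nan=False guards against leftovers.
    return json.dumps(cleaned, allow_nan=False, **kwargs)

print(dumps_strict([1.5, float("inf"), {"x": float("nan")}]))
# [1.5, null, {"x": null}]
```

Because the decoder does the substitution, a string that merely contains the text Infinity is left alone.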

Benefits

  • It doesn't matter if the end-user has Python 2.x's json, Python 3.x's json or is using simplejson (e.g, import simplejson as json)
  • It only uses public json interfaces which are unlikely to change.

Caveats

  • This will take ~3X as long to encode things
  • This implementation doesn't handle object_pairs_hook because then it wouldn't work for python 2.6
  • Invalid separators will fail

Code

import json


class StrictJSONEncoder(json.JSONEncoder):

    def default(self, o):
        """Make sure we don't instantly fail"""
        return o

    def coerce_to_strict(self, const):
        """
        This is used to ultimately *encode* into strict JSON, see `encode`

        """
        # before python 2.7, 'true', 'false', 'null', were included here.
        if const in ('Infinity', '-Infinity', 'NaN'):
            return None
        else:
            return const

    def encode(self, o):
        """
        Load and then dump the result using parse_constant kwarg

        Note that setting invalid separators will cause a failure at this step.

        """

        # this will raise errors in a normal-expected way
        encoded_o = super(StrictJSONEncoder, self).encode(o)

        # now:
        #    1. `loads` to switch Infinity, -Infinity, NaN to None
        #    2. `dumps` again so you get 'null' instead of extended JSON
        try:
            new_o = json.loads(encoded_o, parse_constant=self.coerce_to_strict)
        except ValueError:

            # invalid separators will fail here. raise a helpful exception
            raise ValueError(
                "Encoding into strict JSON failed. Did you set the separators "
                "to valid JSON separators?"
            )
        else:
            return json.dumps(new_o, sort_keys=self.sort_keys,
                              indent=self.indent,
                              separators=(self.item_separator,
                                          self.key_separator))
theengineear
  • This does not seem to work for `dump()`... This gives a `null`: `json.dumps(float('nan'), cls=CustomJSONEncoder)` but This gives a `NaN`: `json.dump(float('nan'), open('/tmp/a', 'w'), cls=CustomJSONEncoder)`. Because dump() uses `iterencode` and dumps uses `encode()` – AbdealiLoKo Oct 25 '18 at 08:25
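
For what it's worth, one way to cover both dump() and dumps() on Python 3 is to clean the data inside iterencode instead of encode. This is a sketch only: it assumes CPython's current behaviour, where encode() calls self.iterencode() for everything except top-level strings, and it passes the private `_one_shot` argument through. `_scrub` and `NullNonFiniteEncoder` are hypothetical names.

```python
import io
import json
import math

def _scrub(obj):
    # Recursively replace inf, -inf and nan with None in plain containers.
    if isinstance(obj, float):
        return obj if math.isfinite(obj) else None
    if isinstance(obj, dict):
        return {key: _scrub(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_scrub(item) for item in obj]
    return obj

class NullNonFiniteEncoder(json.JSONEncoder):
    def iterencode(self, o, _one_shot=False):
        # Clean the input first, then let the stock encoder do the work.
        return super().iterencode(_scrub(o), _one_shot)

text = json.dumps([float("nan"), 1.0], cls=NullNonFiniteEncoder)
buffer = io.StringIO()
json.dump(float("nan"), buffer, cls=NullNonFiniteEncoder)
print(text)               # [null, 1.0]
print(buffer.getvalue())  # null
```

Values returned by a custom default() are not scrubbed in this sketch, so it only helps when the non-finite floats sit directly in dicts, lists or tuples.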
-1

You could do something along these lines:

import json
import math

target=[1.1,1,2.2,float('inf'),float('nan'),'a string',int(2)]

def ffloat(f):
    if not isinstance(f,float):
        return f
    if math.isnan(f):
        return 'custom NaN'
    if math.isinf(f):
        return 'custom inf'
    return f

print 'regular json:',json.dumps(target)      
print 'customized:',json.dumps(map(ffloat,target))     

Prints:

regular json: [1.1, 1, 2.2, Infinity, NaN, "a string", 2]
customized: [1.1, 1, 2.2, "custom inf", "custom NaN", "a string", 2]

If you want to handle nested data structures, this is also not that hard:

import json
import math
from collections import Mapping, Sequence

def nested_json(o):
    if isinstance(o, float):
        if math.isnan(o):
            return 'custom NaN'
        if math.isinf(o):
            return 'custom inf'
        return o
    elif isinstance(o, basestring):
        return o
    elif isinstance(o, Sequence):
        return [nested_json(item) for item in o]
    elif isinstance(o, Mapping):
        return dict((key, nested_json(value)) for key, value in o.iteritems())
    else:
        return o    

nested_tgt=[1.1,{1.1:float('inf'),3.3:5},(float('inf'),2.2),]

print 'regular json:',json.dumps(nested_tgt)      
print 'nested json',json.dumps(nested_json(nested_tgt))

Prints:

regular json: [1.1, {"3.3": 5, "1.1": Infinity}, [Infinity, 2.2]]
nested json [1.1, {"3.3": 5, "1.1": "custom inf"}, ["custom inf", 2.2]]
dawg
  • 1
    This won't really be useful, unless dealing with a flat structure. – Marcin Jul 06 '13 at 16:28
  • Now you're writing your own json encoder. At this point, it would really make more sense to just subclass JSONEncoder, or use another library. – Marcin Jul 06 '13 at 19:05
  • @Marcin: This is hardly a full json encoder. And the code works. You would have the subclass of JSONEncoder intercept `iterencode` -- fine -- Python 2.7 does not use `iterencode` and it would not work. Show some working code! – dawg Jul 06 '13 at 19:13
  • Yes it does. Look at line 212 in this file: http://hg.python.org/cpython/file/7ec9255d4189/Lib/json/encoder.py . The problem is that you are half-writing your own encoder. You lose the robustness and convenience of an existing library in order to write a bunch of your own code. – Marcin Jul 06 '13 at 19:20
  • 1
    Still waiting to see working code. That prints single flat and nested data structures. [It is harder than you think](http://stackoverflow.com/questions/1447287/format-floats-with-standard-json-module) – dawg Jul 06 '13 at 21:00
  • I'm not here to write the code for anyone. It's really very simple, though. – Marcin Jul 06 '13 at 21:03
  • Perhaps it'll mollify you to know that I was the second person to vote your answer up. – Marcin Jul 06 '13 at 21:08
  • @Marcin: Excuse me if I was rude. Sorry. :-) Selfishly, I wanted to see how you did it because I could not make it work myself. When I create the class with `iterencode` the class is given the entire nested structure -- not each element. So you are no better off. As far as I can see, you still have to flatten the whole structure. – dawg Jul 06 '13 at 21:58
  • Just use the code which `iterencode` already uses, with only your own necessary modifications. It's open source. – Marcin Jul 06 '13 at 22:30
  • 1
    I can't see how that is any easier than just iterating over the data structure to look for the floats in question. The code you pointed to is a dense 60 lines and you would have version issues (potentially) when new version of JSON encoder are issued. Why is that easier? – dawg Jul 07 '13 at 04:32
  • Actually, I really like this solution. Appeals to my desire for simplicity [of logic]. I have no idea how to use `iterencode` so that could be the source of my bias. – cammil Jul 07 '13 at 10:10
  • I've posted some working code. You'll see it's basically just a copy and paste from the original python code. That's why it's easier. – Marcin Jul 07 '13 at 10:24