
I'm trying to parse some JSON in Python, and the data makes use of NaN. Unfortunately, my source writes NaN as follows:

{ "foo": nan }

This actually isn't so uncommon: in Python, float('nan') gives a NaN, and C++ prints a NaN double value as nan. Unfortunately, I can't figure out how to make Python parse this. I put the text above in a file called bar.txt and tried the following:

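import json

def foo(s):
    # Intended hook for float tokens, including a bare nan
    print("hello")
    if s == 'nan' or s == 'NaN':
        return float('nan')
    else:
        return float(s)

def bar(s):
    # Intended hook for constants such as NaN and Infinity
    print("blah")

with open("bar.txt") as f:
    x = json.load(f, parse_float=foo, parse_constant=bar)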

I get a traceback ending in: ValueError: No JSON object could be decoded. Neither hello nor blah is printed, which tells me that neither of my callbacks is being called for this input.

Is there any way to do this nicely?

Nir Friedman
  • What happens when you do `json.load` without your functions? – Patrick Haugh Dec 13 '16 at 19:53
  • The `parse_float` argument won't be invoked, because *that's not a float*. If you don't quote it out, that isn't sufficiently valid JSON to get as far as your parsers. What is generating that string? If you had `NaN` it would `loads` just fine. – jonrsharpe Dec 13 '16 at 19:55
  • JSON is not the same as Python syntax. You should fix whatever is creating the file so that it produces valid JSON. – Barmar Dec 13 '16 at 20:02
  • @jonrsharpe Another piece of code, written in C++, which prints doubles into json without special casing. In both python and C++, the print representation of an `NaN` number is `nan`. Which is why I'm surprised there's no easy way to do this. – Nir Friedman Dec 13 '16 at 20:02
  • The other code should use a JSON library to print it, instead of using C++'s built-in output functions. – Barmar Dec 13 '16 at 20:03
  • @Barmar Clearly if fixing that code would be easy, I would do that instead of posting here, or even trying to do this in the first place. Also, note that there is no `NaN` at all in valid JSON. – Nir Friedman Dec 13 '16 at 20:03
  • According to `help(json.loads)`, the only constants it accepts are `-Infinity`, `Infinity`, `NaN`, `null`, `true` and `false`. – jonrsharpe Dec 13 '16 at 20:05
  • Related: http://stackoverflow.com/questions/1423081/json-left-out-infinity-and-nan-json-status-in-ecmascript – Barmar Dec 13 '16 at 20:05
  • I just tried your code and got it working for NaN. `json.loads(x, parse_constant=foo)`. But lowercase 'nan' still remains invalid JSON ... – Maurice Meyer Dec 13 '16 at 20:11

1 Answer


Is there any way to do this nicely?

No, there is no way to do this using only the documented json interface. If you examine json/scanner.py, you can see that the string NaN is hardcoded into the lexical analysis and cannot be replaced.
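To see this behaviour directly, here is a small sketch (not part of the original question; the names record_constant and seen are made up for illustration). It suggests that parse_constant is consulted for the exact spelling NaN, while lowercase nan is rejected before any hook runs:

```python
import json

seen = []  # hypothetical log of what the decoder hands to the hook

def record_constant(name):
    # Called by the decoder for the constants it recognizes
    # ('NaN', 'Infinity', '-Infinity'); float() accepts all three spellings.
    seen.append(name)
    return float(name)

# The exact spelling NaN reaches the hook.
value = json.loads('{"foo": NaN}', parse_constant=record_constant)

# Lowercase nan is rejected by the scanner, so the hook is never called.
try:
    json.loads('{"foo": nan}', parse_constant=record_constant)
    lowercase_ok = True
except ValueError:
    lowercase_ok = False

print(seen, lowercase_ok)
```

The except clause catches ValueError because that is the error reported in the question; newer Python versions raise a subclass of it.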

Depending upon the precise nature of your data, you may be able to use a regular expression to solve your problem.

import json
import re

j = '{"Number": nan}'
# Rewrite each standalone nan token to the spelling the json module accepts
j = re.sub(r'\bnan\b', 'NaN', j)

print(json.loads(j))
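One caveat with the plain substitution is that it also rewrites the word nan inside string values. If that matters for your data, a possible variation (a sketch only; fix_bare_nan and _TOKEN are names invented here) is to have the pattern match whole string literals as well and leave them unchanged:

```python
import json
import re

# Matches either a complete JSON string literal (captured as group 1)
# or a bare nan token outside of any string.
_TOKEN = re.compile(r'("(?:[^"\\]|\\.)*")|\bnan\b')

def fix_bare_nan(text):
    # String literals are put back unchanged; only bare nan becomes NaN.
    return _TOKEN.sub(lambda m: m.group(1) or "NaN", text)

raw = '{"label": "nan", "value": nan}'
data = json.loads(fix_bare_nan(raw))

print(fix_bare_nan(raw))
```

This is still text munging rather than real parsing, so it assumes the rest of the input is valid JSON.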
Robᵩ