
I am playing with the pandas to_json method and cannot quite understand its behaviour.

import json
import pandas as pd

FILENAME = 'test.json'

df = pd.DataFrame({6.0: [1.0e3, 1.7e42, 2.0e-4, 1.7e-7], 
                   50.0: [1.034, 1.3e-42, 1.2e17, 0.1]}, 
                  index=[75.0, 19.0, 84.0, 12.0])

df.to_json(FILENAME, double_precision=2)

with open(FILENAME, 'r') as jsonfile:
    jsondata = json.load(jsonfile)
print(json.dumps(jsondata, indent=4))

This prints some numbers in fixed point, and some numbers in exponential notation.

{
    "6.0": {
        "75.0": 1000.0,
        "19.0": 1.7e+42,
        "84.0": 0.0,
        "12.0": 0.0
    },
    "50.0": {
        "75.0": 1.03,
        "19.0": 1.3e-42,
        "84.0": 1.2e+17,
        "12.0": 0.1
    }
}

In particular, I cannot get the value in column 6.0, row 12.0 (1.7e-7) to be written in exponential notation. It is always saved as 0.0, which really screws up the numerics.

Is there a way to enforce exponential notation for pd.to_json?
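One possible workaround, sketched here on the assumption that writing full float precision is acceptable: skip to_json and pass DataFrame.to_dict() to the standard-library json module. That module formats every float with repr, which produces the shortest string that round-trips exactly, so small values keep their exponent. frame_to_json is a hypothetical helper name, not a pandas function.

```python
import json

import pandas as pd


def frame_to_json(df: pd.DataFrame) -> str:
    """Hypothetical helper: serialize df with the standard-library encoder.

    df.to_dict() returns {column -> {index -> value}}, the same nesting as
    to_json()'s default orient="columns". json.dumps writes each float with
    repr, so 1.7e-07 stays 1.7e-07 instead of being rounded to 0.0.
    Float labels such as 6.0 become the string keys "6.0".
    """
    return json.dumps(df.to_dict())


df = pd.DataFrame({6.0: [1.0e3, 1.7e42, 2.0e-4, 1.7e-7],
                   50.0: [1.034, 1.3e-42, 1.2e17, 0.1]},
                  index=[75.0, 19.0, 84.0, 12.0])

text = frame_to_json(df)
print(text)
```

Caveats: double_precision no longer applies, and NaN is written as the non-standard token NaN rather than the null that to_json emits.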

Why is it treating 1.7e-7 differently from 1.3e-42?

The boundary seems to lie between exponents e-15 and e-16: 1.7e-15 is exported as 0.0, while 1.7e-16 is exported as 1.7e-16. This probably has something to do with the np.float64 representation.
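A possible explanation, offered only as a hypothesis: the outputs above are consistent with the encoder writing fixed-point numbers with double_precision digits after the decimal point for ordinary magnitudes, and switching to %g-style exponential notation only when the magnitude is below about 1e-15 (or above about 1e16). The sketch below is a simplified stand-alone model of that rule in plain Python, not pandas' implementation; both threshold constants are assumptions chosen because they reproduce the values reported in this question.

```python
import json

# Hypothetical model of the observed formatting rule, NOT pandas' code.
# Both thresholds are assumptions fitted to the outputs in the question.
EXP_BELOW = 1e-15  # assumed: 0 < |x| < 1e-15 switches to exponential form
EXP_ABOVE = 1e16   # assumed: |x| > ~1e16 switches to exponential form


def model_format(x: float, double_precision: int = 10) -> str:
    """Return the JSON text this model predicts for x."""
    magnitude = abs(x)
    if magnitude > EXP_ABOVE or (x != 0.0 and magnitude < EXP_BELOW):
        # %g-style: keeps double_precision significant digits plus exponent.
        return f"{x:.{double_precision}g}"
    # Fixed-point with double_precision decimals: anything smaller than
    # half a unit in the last decimal place is rounded to zero.
    return f"{x:.{double_precision}f}"


for value in [1.0e3, 1.7e42, 2.0e-4, 1.7e-7, 1.034, 1.3e-42, 1.2e17, 0.1]:
    text = model_format(value, double_precision=2)
    print(f"{value!r:>10} -> {text:>10} -> {json.loads(text)!r}")
```

Under this model 1.7e-7 lands in the fixed-point branch and rounds to 0.00, while 1.3e-42 lands in the exponential branch, which would explain why the two are treated differently.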

This is a really striking example, because it shows that to_json does not preserve monotonicity:

xf = pd.DataFrame({'a': [1.0e-15, 1.0e-16]})
xf.to_json('data.json')
with open('data.json', 'r') as jsonfile:
    jsondata = json.load(jsonfile)
print(json.dumps(jsondata, indent=4))

prints

{
    "a": {
        "0": 0.0,
        "1": 1e-16
    }
}

The value in row 1 is now greater than the value in row 0.
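To check the monotonicity problem directly, the sketch below round-trips a strictly decreasing column through to_json (default double_precision=10) and through the standard-library encoder, then tests whether the order survived. The helper names are hypothetical; going by the output above, the to_json path is expected to report an inversion for [1e-15, 1e-16].

```python
import json

import pandas as pd


def round_trip_pandas(values):
    """Values after DataFrame.to_json() and json.loads()."""
    text = pd.DataFrame({"a": values}).to_json()  # no path: returns a str
    column = json.loads(text)["a"]
    return [column[str(i)] for i in range(len(values))]


def round_trip_stdlib(values):
    """Values after to_dict() + json.dumps() (float repr) and json.loads()."""
    text = json.dumps(pd.DataFrame({"a": values}).to_dict())
    column = json.loads(text)["a"]
    return [column[str(i)] for i in range(len(values))]


def is_non_increasing(xs):
    """True if no element is larger than the one before it."""
    return all(a >= b for a, b in zip(xs, xs[1:]))


values = [1.0e-15, 1.0e-16]  # strictly decreasing, as in the question
for name, trip in [("to_json", round_trip_pandas), ("stdlib", round_trip_stdlib)]:
    out = trip(values)
    print(name, out, "order kept:", is_non_increasing(out))
```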

Dima Chubarov
You might want to look at https://stackoverflow.com/questions/1447287/format-floats-with-standard-json-module. It's got some old information, but it explains the issues. I don't think anything has really changed. – Frank Yellin Aug 15 '23 at 07:19
