
I am writing a Python object to a file with json.dump.

But I only want to write objects that would not exceed a 10KB file size.

How can I estimate the size of the object before writing it?

Exploring

2 Answers


Here's my take on it.

We start with the following sample (taken from here, with a little extra added to make it interesting):

sample_orig = """{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    },
    "a little extra" : "∫ßåøπœ®†"
}"""

Next, we define a test function to perform the encoding and output the size:

import json
from io import FileIO

def encode_sample(sample: str):
    for encoding in ('ascii', 'utf8', 'utf16'):
        filename = f'{encoding}.json'
        # Encode the str to bytes; errors='replace' substitutes characters
        # the codec can't represent (relevant for ascii here).
        encoded_sample = sample.encode(encoding=encoding, errors='replace')
        with FileIO(filename, mode='wb') as f:
            f.write(encoded_sample)
            # The file position after writing equals the bytes written.
            assert len(encoded_sample) == f.tell()
            print(f'{encoding}: {f.tell()} bytes')

The assert proves that, once we're dealing with bytes (not a str), the size reported by len() is exactly the number of bytes written to the file. If they ever differed, the call would raise AssertionError.
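This is also why encoding first matters: on a str, len() counts characters, while on the encoded bytes it counts the bytes that will actually land on disk. A quick standalone illustration (not part of the function above), using the non-ASCII "little extra" from the sample:

text = "∫ßåøπœ®†"
print(len(text))                   # 8 characters
print(len(text.encode('utf8')))    # 18 bytes
print(len(text.encode('utf16')))   # 18 bytes (16, plus a 2-byte BOM)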

We'll encode the original sample first:

encode_sample(sample_orig)

Output:

ascii: 617 bytes
utf8: 627 bytes
utf16: 1236 bytes

Next, we run it through json.loads() and json.dumps() to "optimise" the size (i.e. remove unnecessary whitespace). Passing separators=(',', ':') to json.dumps would shrink it further by dropping the spaces it inserts after commas and colons by default:

sample_reduced = json.dumps(json.loads(sample_orig))
encode_sample(sample_reduced)

Output:

ascii: 455 bytes
utf8: 455 bytes
utf16: 912 bytes

Remarks:

  • The OP asked "[…] writing a python object with json.dump", so the "optimisation" by removing whitespace doesn't really matter, but I left it in as it might benefit others.

  • Encoding matters. ascii and utf8 (the default) result in the same file size if the output contains only ASCII characters. Because I added a little extra non-ASCII at the end of the JSON, the file sizes for the two encodings differ for the original sample. They match for the reduced sample because json.dumps escapes non-ASCII characters by default (ensure_ascii=True), so its output is pure ASCII. And utf16 will of course be the largest of the three.

  • As stated before, you can use len to get the size of the object once you've encoded it; the sketch below puts this together for the 10KB limit from the question.
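For completeness, here's a minimal sketch of the whole check (write_if_small and the hard-coded limit are my own illustrative naming, not anything from the question): dump the object to a str, encode it, and write only if the encoded length fits the budget.

import json

MAX_BYTES = 10 * 1024  # the OP's 10KB budget

def write_if_small(obj, filename: str, encoding: str = 'utf8') -> bool:
    """Serialise obj and write it only if the encoded JSON fits the budget."""
    encoded = json.dumps(obj).encode(encoding)
    if len(encoded) > MAX_BYTES:
        return False          # too big: don't write
    with open(filename, 'wb') as f:
        f.write(encoded)
    return True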

DocZerø

Convert the object to a JSON string, then use sys.getsizeof(). It returns the in-memory size of the string in bytes, so you can divide by 1024 if you want to compare it to a threshold value in kilobytes.

sys.getsizeof(json.dumps(obj))

Sample usage:

import json
import sys
x = '{"name":"John", "age":30, "car":null}'
y = json.loads(x)
print(sys.getsizeof(json.dumps(y))) # 89 (includes Python's per-string overhead)

Edit:
As mentioned in this thread, sys.getsizeof() reports the in-memory size of the str object, which includes a fixed per-object overhead (49 bytes for an empty str on CPython). Subtract sys.getsizeof("") to get a better estimate of the serialised length:

print(sys.getsizeof(json.dumps(y)) - sys.getsizeof(""))
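As a sanity check (assuming CPython's compact-string layout), the subtraction lines up with len(), and len() on the encoded bytes gives the actual file size, since json.dumps escapes non-ASCII by default:

import json
import sys

y = {"name": "John", "age": 30, "car": None}
s = json.dumps(y)
print(len(s))                                # 40 characters
print(sys.getsizeof(s) - sys.getsizeof(""))  # 40 on CPython
print(len(s.encode('utf8')))                 # 40 bytes: the size on disk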
Abhinav Mathur