
How can I speed up loading JSON strings into a Python dictionary through stdout/subprocess?

For reference, let's take a script mysubprocess.py:

import json
import tempfile


JSON_SIZE = 250000
NUM_JSON = 20


d = [{'a': x, 'b': x, 'c': x, 'd': x, 'e': x, 'f': x} for x in range(JSON_SIZE)]
d = json.dumps(d)


for x in range(NUM_JSON):
    ## Method1
    # print(d)

    ## Method2 and 3
    tf = tempfile.NamedTemporaryFile(delete=False, mode="w")
    tf.write(d)
    tf.close()  # close (not seek) so the buffered write reaches disk before the parent opens the file
    print(tf.name)

and in test.py:

import json
import subprocess
import mmap


proc = subprocess.Popen(['python', 'mysubprocess.py'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)


for line in proc.stdout.readlines():
    ## Method1
    # d = json.loads(line)

    ## Method2
    # line = line.strip()
    # with open(line, "r") as f:
    #     d = json.loads(f.read())

    ## Method3
    line = line.strip()
    with open(line, "r") as f:
        with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mmap_obj:
            d = json.loads(mmap_obj.read())

Method1 prints the entire JSON string to stdout and parses it in the main process. Method2 writes the JSON string to a temporary file and reads that file back in the main process. Method3 is like Method2, but reads the file through a memory map.
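One thing worth checking before anything fancier: proc.stdout.readlines() blocks until the child closes its stdout, so no parsing happens until all NUM_JSON documents have been produced. Iterating the pipe directly lets the parent parse each line as it arrives. A minimal sketch of that variant (the inline child script here is a stand-in for mysubprocess.py, not the original code):

```python
import json
import subprocess
import sys

# Stand-in child: emits a few small JSON documents, one per line, flushing each.
child = (
    "import json, sys\n"
    "for x in range(3):\n"
    "    sys.stdout.write(json.dumps({'x': x}) + '\\n')\n"
    "    sys.stdout.flush()\n"
)

proc = subprocess.Popen([sys.executable, '-c', child],
                        stdout=subprocess.PIPE)

docs = []
# Iterating the pipe yields each line as soon as the child flushes it,
# unlike readlines(), which waits for the child's stdout to close.
for line in proc.stdout:
    docs.append(json.loads(line))  # json.loads accepts bytes directly

proc.wait()
```

This overlaps the child's serialization with the parent's parsing; whether it beats Method1 depends on how evenly the child produces output.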

Executing these in pypy3, I get:

Method1: 5.2s
Method2: 5.4s
Method3: 4.8s

Is there a more efficient way to do this? Is there a way to use mmap and share the mapping directly, instead of writing to a file and then loading it into memory?

  • If you have control over mysubprocess.py, try disabling its output buffering (at least add flush=True to that print()). This should help improve the performance of method1. Some discussions in https://stackoverflow.com/questions/107705/disable-output-buffering may be helpful. – lavin Nov 24 '21 at 10:14
  • multiprocessing.shared_memory https://docs.python.org/3/library/multiprocessing.shared_memory.html — rather than writing to a file or stdout, you write to a shared memory block, then read it from the other process. – Louis Ricci Nov 24 '21 at 22:13
  • Don't you have to specify a size? – user1179317 Nov 25 '21 at 06:21
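Following the multiprocessing.shared_memory suggestion above: yes, a size must be given when the block is created, so the producer allocates after serializing. A single-process sketch of the idea (producer and consumer side by side; in the actual setup the child would print shm.name and the payload length to stdout instead of a file path, and the parent would attach by name):

```python
import json
from multiprocessing import shared_memory

# Producer side: serialize first, then allocate a block of exactly that size.
payload = json.dumps([{'a': x} for x in range(1000)]).encode()
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[:len(payload)] = payload

# Consumer side: attach by name. The block may be rounded up to a page size,
# so the real payload length has to travel along with the name.
reader = shared_memory.SharedMemory(name=shm.name)
d = json.loads(bytes(reader.buf[:len(payload)]))

reader.close()
shm.close()
shm.unlink()  # whoever owns the block must unlink it, or it leaks
```

This avoids both the filesystem round trip of Method2/3 and the pipe copy of Method1, at the cost of managing the block's lifetime (and getting the length across) yourself.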

0 Answers