How can I speed up loading JSON strings into Python dictionaries through stdout/subprocess?
For reference, let's take a script mysubprocess.py:
import json
import tempfile

JSON_SIZE = 250000
NUM_JSON = 20

d = [{'a': x, 'b': x, 'c': x, 'd': x, 'e': x, 'f': x} for x in range(JSON_SIZE)]
d = json.dumps(d)

for x in range(NUM_JSON):
    ## Method1
    # print(d)

    ## Method2 and 3
    tf = tempfile.NamedTemporaryFile(delete=False, mode="w")
    tf.write(d)
    tf.seek(0)
    print(tf.name)
and in test.py:
import json
import subprocess
import mmap

proc = subprocess.Popen(['python', 'mysubprocess.py'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)

for line in proc.stdout.readlines():
    ## Method1
    # d = json.loads(line)

    ## Method2
    # line = line.strip()
    # with open(line, "r") as f:
    #     d = json.loads(f.read())

    ## Method3
    line = line.strip()
    with open(line, "r") as f:
        with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mmap_obj:
            d = json.loads(mmap_obj.read())
Here Method1 simply prints the entire JSON string to stdout and the main process parses it from the pipe. Method2 writes the JSON string to a temporary file and the main process reads that file. Method3 is the same as Method2, but reads the file through a memory map.
Executing these in pypy3, I get:
Method1: 5.2s
Method2: 5.4s
Method3: 4.8s
Is there a more efficient way to get this done? Is there a way to use mmap and share that address directly, instead of writing to a file and then loading it back into memory?
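To make the second part of the question concrete, what I have in mind is something along the lines of the untested sketch below, where the child puts the serialized JSON into multiprocessing.shared_memory blocks (Python 3.8+ API) and only the block name and size travel over stdout. I'm not sure whether pypy3 supports shared_memory, or whether the child's resource tracker would unlink the blocks before the parent gets to read them.

# mysubprocess.py side (untested sketch): publish the serialized JSON via shared memory
import json
from multiprocessing import shared_memory

JSON_SIZE = 250000
NUM_JSON = 20

d = json.dumps([{'a': x, 'b': x, 'c': x, 'd': x, 'e': x, 'f': x} for x in range(JSON_SIZE)])
payload = d.encode()

blocks = []
for x in range(NUM_JSON):
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload            # copy the JSON bytes into the shared block
    print(shm.name, len(payload), flush=True)   # send only the block name and size to the parent
    blocks.append(shm)                          # keep a reference so the block stays mapped

# test.py side (untested sketch): attach to each named block and parse it
import json
import subprocess
from multiprocessing import shared_memory

proc = subprocess.Popen(['python', 'mysubprocess.py'], stdout=subprocess.PIPE)
for line in proc.stdout:                        # read block names as they arrive
    name, size = line.split()
    shm = shared_memory.SharedMemory(name=name.decode())
    d = json.loads(bytes(shm.buf[:int(size)]))  # still one copy out of the shared block
    shm.close()
    shm.unlink()                                # free the block once it has been parsed

Is something along these lines workable, or is there a better mechanism for this?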