Is there a way to speed up converting streams of strings to JSON/dictionaries using pypy3? I know ujson in python3 can be faster than python3's json, but it's not really faster than pypy3's json.loads().
More info on what I have: a program reads a stream of JSON strings from a subprocess and converts (loads) them using json.loads(). If I comment out the json.loads() line (so the program just reads the stdout from the subprocess), it takes about 60% of the total execution time.
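To make that measurement concrete, here is a minimal sketch of how the read-only and read-plus-parse timings could be compared. The child process and its JSON payload (`CHILD_CODE`) and the helper `run` are hypothetical stand-ins for illustration; my real subprocess and data are different.

```python
import json
import subprocess
import sys
import time

# Hypothetical stand-in for the real subprocess: a child Python process that
# prints 20000 small JSON objects, one per line.
CHILD_CODE = (
    "import json\n"
    "for i in range(20000):\n"
    "    print(json.dumps({'id': i, 'name': 'item', 'values': [1, 2, 3]}))\n"
)


def run(parse):
    """Read every line from the child's stdout; optionally json.loads() each one."""
    start = time.perf_counter()
    count = 0
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        for line in proc.stdout:
            if parse:
                json.loads(line)
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    n_read, t_read = run(parse=False)
    n_parse, t_parse = run(parse=True)
    print(f"read only:    {n_read} lines in {t_read:.3f}s")
    print(f"read + parse: {n_parse} lines in {t_parse:.3f}s")
```

The ratio between the two timings is what I mean by "about 60%"; the exact figure will depend on the size and shape of the JSON lines.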
So I was thinking that using a pool of processes or threads to perform the conversion in parallel could improve it (maybe down to around 80% of the execution time). Unfortunately, it did not help. Using multiple threads gave the same results, and using multiple processes took longer than a single process (probably mostly due to the overhead and serialization). Is there any other change I could make to improve performance with pypy3?
For reference, here's a quick example of code (just reading from some file instead):
import json
import timeit
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def get_stdout():
    # Stand-in for the subprocess stdout: yield one JSON string per line.
    with open("input.txt", "r") as f:
        for line in f:
            yield line


def convert(line):
    return json.loads(line)


def multi_thread():
    # Parse lines in a pool of 3 threads; results are discarded.
    with ThreadPool(3) as mt_pool:
        for d in mt_pool.imap(convert, get_stdout()):
            pass


def multi_process():
    # Parse lines in a pool of 3 worker processes; results are discarded.
    with Pool(3) as mp_pool:
        for d in mp_pool.imap(convert, get_stdout()):
            pass


def regular():
    # Baseline: parse every line sequentially in a single process.
    for line in get_stdout():
        d = convert(line)


# The guard keeps worker processes from re-running the benchmark on import.
if __name__ == "__main__":
    print("regular: ", timeit.repeat("regular()", setup="from __main__ import regular", number=1, repeat=5))
    print("multi_thread: ", timeit.repeat("multi_thread()", setup="from __main__ import multi_thread", number=1, repeat=5))
    print("multi_process: ", timeit.repeat("multi_process()", setup="from __main__ import multi_process", number=1, repeat=5))
Output:
regular: [5.191860154001915, 5.045155504994909, 4.980729935996351, 5.253822096994554, 5.9532385260026786]
multi_thread: [5.08890142099699, 5.088432839998859, 5.156651658995543, 5.781010364997201, 5.082046301999071]
multi_process: [26.595598744999734, 30.841693959999247, 29.383782051001617, 27.83700947300531, 21.377069750000373]
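My guess that the multiprocess slowdown comes mostly from serialization could be checked with a rough sketch like the one below. It compares json.loads() alone against json.loads() plus the pickling that a process pool has to do for each task (the input string going to the worker, the resulting dict coming back). The sample line and the function names are made up for illustration, and the sketch ignores pipe transfer and task scheduling, so it only gives a lower bound on the per-task overhead.

```python
import json
import pickle
import timeit

# Hypothetical sample line; real lines from the subprocess will differ.
sample_line = json.dumps({"id": 1, "name": "item", "values": list(range(20))})


def parse_only():
    return json.loads(sample_line)


def parse_plus_pickle():
    # Approximates the serialization work of one Pool task: the input string
    # is pickled to the worker, and the parsed dict is pickled back.
    line = pickle.loads(pickle.dumps(sample_line))
    result = json.loads(line)
    return pickle.loads(pickle.dumps(result))


if __name__ == "__main__":
    n = 100_000
    print("parse only:       ", timeit.timeit(parse_only, number=n))
    print("parse + pickling: ", timeit.timeit(parse_plus_pickle, number=n))
```

If the second timing is much larger than the first, then sending parsed dicts back from worker processes costs more than the parsing itself, which would be consistent with the numbers above.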