The answers to "asynchronous slower than synchronous" did not cover the scenario I was working on, hence this question.
I'm using Python 3.6.0 on Windows 10 to read 11 identical JSON files, named k80.json to k90.json, 18.1 MB each.
First, I tried synchronous, sequential reading of all 11 files. It took 5.07s to complete.
from json import load
from os.path import join
from time import time


def read_config(fname):
    with open(fname) as json_fp:
        json_data = load(json_fp)
    return len(json_data)


if __name__ == '__main__':
    in_files = [join('C:\\', 'Users', 'userA', 'Documents', f'k{idx}.json') for idx in range(80, 91)]
    print('Starting sequential run.')
    start_time1 = time()
    for fname in in_files:
        print(f'Reading file: {fname}')
        print(f'The JSON file size is {read_config(fname)}')
    read_duration1 = round(time() - start_time1, 2)
    print('Ending sequential run.')
    print(f'Synchronous reading took {read_duration1}s')
    print('\n' * 3)
Result
Starting sequential run.
Reading file: C:\Users\userA\Documents\k80.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k81.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k82.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k83.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k84.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k85.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k86.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k87.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k88.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k89.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k90.json
The JSON file size is 5
Ending sequential run.
Synchronous reading took 5.07s
Next, I tried running this using a ThreadPoolExecutor with the map function call, using 12 threads. That took 5.69s.
from concurrent.futures import ThreadPoolExecutor
from json import load
from os.path import join
from time import time


def read_config(fname):
    with open(fname) as json_fp:
        json_data = load(json_fp)
    return len(json_data)


if __name__ == '__main__':
    NUM_THREADS = 12
    in_files = [join('C:\\', 'Users', 'userA', 'Documents', f'k{idx}.json') for idx in range(80, 91)]
    th_pool = ThreadPoolExecutor(max_workers=NUM_THREADS)
    print(f'Starting mapped pre-emptive threaded pool run with {NUM_THREADS} threads.')
    start_time2 = time()
    with th_pool:
        map_iter = th_pool.map(read_config, in_files, timeout=10)
    read_duration2 = round(time() - start_time2, 2)
    for map_res in map_iter:
        print(f'The JSON file size is {map_res}')
    print('Ending mapped pre-emptive threaded pool run.')
    print(f'Mapped asynchronous pre-emptive threaded pool reading took {read_duration2}s')
    print('\n' * 3)
Result
Starting mapped pre-emptive threaded pool run with 12 threads.
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
Ending mapped pre-emptive threaded pool run.
Mapped asynchronous pre-emptive threaded pool reading took 5.69s
Finally, I tried running this using a ThreadPoolExecutor with the submit function call, using 12 threads. That took 5.73s.
from concurrent.futures import ThreadPoolExecutor
from json import load
from os.path import join
from time import time


def read_config(fname):
    with open(fname) as json_fp:
        json_data = load(json_fp)
    return len(json_data)


if __name__ == '__main__':
    NUM_THREADS = 12
    in_files = [join('C:\\', 'Users', 'userA', 'Documents', f'k{idx}.json') for idx in range(80, 91)]
    th_pool = ThreadPoolExecutor(max_workers=NUM_THREADS)
    results = []
    print(f'Starting submitted pre-emptive threaded pool run with {NUM_THREADS} threads.')
    start_time3 = time()
    with th_pool:
        for fname in in_files:
            results.append(th_pool.submit(read_config, fname))
    read_duration3 = round(time() - start_time3, 2)
    for result in results:
        print(f'The JSON file size is {result.result(timeout=10)}')
    print('Ending submitted pre-emptive threaded pool run.')
    print(f'Submitted asynchronous pre-emptive threaded pool reading took {read_duration3}s')
Result
Starting submitted pre-emptive threaded pool run with 12 threads.
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
Ending submitted pre-emptive threaded pool run.
Submitted asynchronous pre-emptive threaded pool reading took 5.73s
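As an aside, with submit the futures can also be consumed in completion order rather than submission order, via concurrent.futures.as_completed. A minimal self-contained sketch of that variant (using small throwaway files, since the k*.json files aren't attached):

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed

def read_config(fname):
    with open(fname) as json_fp:
        return len(json.load(json_fp))

# Create three small throwaway JSON files to stand in for k80.json etc.
tmp_dir = tempfile.mkdtemp()
in_files = []
for i in range(3):
    path = os.path.join(tmp_dir, f'dummy{i}.json')
    with open(path, 'w') as fp:
        json.dump({'a': 1, 'b': 2, 'c': 3}, fp)
    in_files.append(path)

sizes = []
with ThreadPoolExecutor(max_workers=3) as th_pool:
    futures = [th_pool.submit(read_config, fname) for fname in in_files]
    for fut in as_completed(futures, timeout=10):  # yields each future as it finishes
        sizes.append(fut.result())
print(sizes)
```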
Questions
Why does synchronous reading perform faster than threading when reading reasonably large JSON files like these? I was expecting threading to be faster, given the file size and the number of files being read.
Do the JSON files need to be much larger than these before threading outperforms synchronous reading? If not, what other factors should I consider?
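(For anyone probing this: one rough way to see where the time goes is to split the raw byte read, which is I/O and releases the GIL, from the JSON parsing, which is CPU-bound and holds it. A self-contained sketch using a synthetic file, since the k*.json files aren't attached:)

```python
import json
import tempfile
from time import perf_counter

# Build a synthetic JSON file of a few MB to stand in for one k*.json file.
data = {f'key{i}': list(range(100)) for i in range(10_000)}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as fp:
    json.dump(data, fp)
    path = fp.name

t0 = perf_counter()
with open(path, 'rb') as fp:
    raw = fp.read()       # pure I/O: the GIL is released while blocked on the disk
io_time = perf_counter() - t0

t0 = perf_counter()
parsed = json.loads(raw)  # pure CPU: the GIL is held for the whole parse
cpu_time = perf_counter() - t0

print(f'read: {io_time:.4f}s  parse: {cpu_time:.4f}s')
```

If most of the time lands in the parse step, threads cannot overlap that work, which would explain the sequential version winning.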
I thank you in advance for your time and help.
Post-Script
Thanks to the answers below, I changed the read_config function slightly to introduce a 3s sleep delay (simulating an I/O-wait operation), and now the threaded versions really shine (38.81s vs 9.36s and 9.39s).
from time import sleep


def read_config(fname):
    with open(fname) as json_fp:
        json_data = load(json_fp)
    sleep(3)  # Simulate an activity that waits on I/O.
    return len(json_data)
Results
Starting sequential run.
Reading file: C:\Users\userA\Documents\k80.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k81.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k82.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k83.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k84.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k85.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k86.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k87.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k88.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k89.json
The JSON file size is 5
Reading file: C:\Users\userA\Documents\k90.json
The JSON file size is 5
Ending sequential run.
Synchronous reading took 38.81s
Starting mapped pre-emptive threaded pool run with 12 threads.
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
Ending mapped pre-emptive threaded pool run.
Mapped asynchronous pre-emptive threaded pool reading took 9.36s
Starting submitted pre-emptive threaded pool run with 12 threads.
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
The JSON file size is 5
Ending submitted pre-emptive threaded pool run.
Submitted asynchronous pre-emptive threaded pool reading took 9.39s
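These post-script numbers match a simple model: a blocking sleep releases the GIL, so in the pool the 11 waits overlap instead of accumulating. A minimal self-contained sketch of the same effect, with a 0.2s wait standing in for the 3s one so it runs quickly:

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep

def wait_task(_):
    sleep(0.2)  # stands in for the 3s I/O-style wait in read_config
    return 5

# Sequential: the 11 waits add up (~2.2s here, ~33s with sleep(3)).
t0 = perf_counter()
for i in range(11):
    wait_task(i)
sequential = perf_counter() - t0

# Threaded: the waits overlap, so wall time is roughly one wait.
t0 = perf_counter()
with ThreadPoolExecutor(max_workers=12) as th_pool:
    list(th_pool.map(wait_task, range(11)))
threaded = perf_counter() - t0

print(f'sequential: {sequential:.2f}s  threaded: {threaded:.2f}s')
```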