I am using ThreadPoolExecutor and giving the exact same task to each worker. The task is to run a jar file and process its output. The problem I am facing is related to timings.
Case 1: I submit one task to the pool, and the worker completes in 8 seconds.
Case 2: I submit the same task twice into the pool, and both workers complete in ~10.50 seconds.
Case 3: I submit the same task three times into the pool, and all three workers complete in ~13.38 seconds.
Case 4: I submit the same task four times into the pool, and all four workers complete in ~18.88 seconds.
If I replace the workers' task with time.sleep(8) (instead of running the jar file), then all four workers finish in ~8 seconds. Is this because the OS has to set up a Java environment (start a JVM) before executing the Java code, and it cannot do that for several workers in parallel?
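To isolate JVM startup from my actual workload, something like this sketch could be timed instead (assuming java is on the PATH; this is illustrative, not my production code). Since java -version starts a JVM and exits immediately, any slowdown with more workers would be pure startup contention:

import time
from concurrent import futures
from subprocess import run, DEVNULL

def start_jvm(_):
    # trivial JVM invocation: startup + immediate exit, no real work
    run(["java", "-version"], stdout=DEVNULL, stderr=DEVNULL)

for n in (1, 2, 3, 4):
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(start_jvm, range(n)))  # force iteration so all tasks finish
    print("{} JVM(s): {:.2f}s".format(n, time.perf_counter() - start))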
Can someone explain why the execution time increases for the same task when run in parallel? Thanks :)
Here is how I am executing the pool:
import logging
import boto3
from concurrent import futures
from subprocess import run, PIPE

s3 = boto3.resource('s3')
log = logging.getLogger(__name__)
# jar_file_path, event_name, config_dir, tmp_file_path and MAX_WORKERS
# are module-level settings defined elsewhere in my script

def transfer_files(file_name):
    raw_file_obj = s3.Object(bucket_name='foo-bucket', key=file_name)
    body = raw_file_obj.get()['Body']
    # prepare java command
    java_cmd = "java -server -ms650M -mx800M -cp {} commandline.CSVExport --sourcenode=true --event={} --mode=human_readable --configdir={}" \
        .format(jar_file_path, event_name, config_dir)
    # Run decoder_tool by piping in the encoded binary bytes
    log.info("Running java decoder tool for file {}".format(file_name))
    res = run(java_cmd, cwd=tmp_file_path, shell=True, input=body.read(), stderr=PIPE, stdout=PIPE)
    res_output = res.stderr.decode("utf-8")
    if res.returncode != 0:
        if 'Unknown event' in res_output:
            log.error("Exception occurred whilst running decoder tool")
            raise Exception("Unknown event {}".format(event_name))
    log.info("decoder tool output: \n" + res_output)
with futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    # add new task(s) into the thread pool
    pool.map(transfer_files, ['fileA_for_workerA', 'fileB_for_workerB'])
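For reference, the case timings could be reproduced with a small harness like this (hypothetical, not my real measurement code; reusing one file name is purely for illustration):

import time

def timed_batch(n):
    # time n copies of the identical task, as in Cases 1-4 above
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # list() forces iteration so any worker exception propagates here
        list(pool.map(transfer_files, ['fileA_for_workerA'] * n))
    return time.perf_counter() - start

for n in (1, 2, 3, 4):
    print("{} task(s): {:.2f}s".format(n, timed_batch(n)))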