I have a python script that performs a very simple task on a huge input file (>10M lines). The script boils down to:
import fileinput

for line in fileinput.input(remainder):
    obj = process(line)
    print_nicely(obj)
There is no interaction between the lines, but the output needs to be kept in the same order as the input lines.
My attempt to speed things up with multiprocessing looks like this:
import fileinput
import multiprocessing as mp

p = mp.Pool(processes=4)
# imap (unlike imap_unordered) yields results in input order
it = p.imap(process, fileinput.input(remainder))
for x in it:
    print_nicely(x)
p.close()
It appears to make things slower rather than faster. I assume this is due to the overhead of passing the lines/objects between processes.
Is it possible to speed things up for this use case, or is the overhead of multiprocessing in Python just too high for it?
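The only other idea I have is batching the lines with imap's chunksize argument. If I understand the docs correctly, that should amortize the per-line pickling/IPC cost by shipping many lines per round trip. A rough sketch (chunksize=1000 is just a guess at a batch size; process, print_nicely and remainder are the same as above):
import fileinput
import multiprocessing as mp

p = mp.Pool(processes=4)
# chunksize batches the work so each round trip to a worker
# carries 1000 lines instead of one, reducing per-item overhead
for x in p.imap(process, fileinput.input(remainder), chunksize=1000):
    print_nicely(x)
p.close()
p.join()
Would something along these lines make a difference?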