I have a function that reads lines from stdin, groups them with itertools.groupby,
and then does some processing on each group. My code looks like this:
import sys
from itertools import groupby
from operator import itemgetter

def func(lines):
    # group consecutive lines that share the same first field
    for key, group in groupby(lines, key=itemgetter(0)):
        lst = list(group)
        results = my_cpu_intensive_function(lst)
        # send results to stdout for further processing
        print(results)

def main():
    # a generator yielding a list of fields for each input line
    lines = (line.strip().split('\t') for line in sys.stdin)
    func(lines)

if __name__ == '__main__':
    main()
Everything works how I want it to. However, my_cpu_intensive_function()
is very CPU-intensive, so how can I parallelize it to speed up my code? I was looking at multiprocessing.Pool(),
but I couldn't figure out how to use it, or whether it's even the right tool for the job.
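Something like the following is what I had in mind, but I'm not sure it's right; the use of imap and the default Pool() size are guesses on my part, and my_cpu_intensive_function stands in for my real function:

import sys
from itertools import groupby
from operator import itemgetter
from multiprocessing import Pool

def func(lines):
    # materialize each group up front so workers receive a plain list
    # rather than a groupby iterator (which I don't think can be pickled)
    groups = (list(group) for _, group in groupby(lines, key=itemgetter(0)))
    with Pool() as pool:
        # imap should keep results in input order while the groups
        # are processed in parallel worker processes
        for results in pool.imap(my_cpu_intensive_function, groups):
            print(results)

In particular, I don't know whether imap will read ahead and buffer all of stdin, or whether there's a better Pool method for this kind of streaming workload.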