I was reading a similar thread where the OP wanted to process each line of a file in a function using multiprocessing (found here). The answer to that question that intrigued me was the following:
from multiprocessing import Pool

def process_line(line):
    return "FOO: %s" % line

if __name__ == "__main__":
    pool = Pool(4)
    with open('file.txt') as source_file:
        # chunk the work into batches of 4 lines at a time
        results = pool.map(process_line, source_file, 4)
I'm wondering if you can do the same, but instead of returning each line processed, write it into another file.
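The only version I can picture so far keeps all the writing in the parent process and only farms out the transformation, roughly like this (the output filename is just a placeholder):

from multiprocessing import Pool

def process_line(line):
    return "FOO: %s" % line

if __name__ == "__main__":
    pool = Pool(4)
    with open('file.txt') as source_file, open('out.txt', 'w') as dest_file:
        # workers transform the lines; the parent does all the writing
        for result in pool.imap(process_line, source_file, 4):
            dest_file.write(result)

That doesn't parallelize the writing at all, though, which is what I'm really after.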
Basically I want to see if there is a way to use multiprocessing to read and write a file in order to split it up by lines. Say I want 100,000 lines per output file.
from multiprocessing import Pool

def write_lines(line):
    # need a method to write lines to multiple files, perhaps a Queue?
    pass

if __name__ == "__main__":
    # all my procs
    pool = Pool()
    with open('file.txt') as source_file:
        # chunk the work into batches of 100,000 lines at a time
        results = pool.map(write_lines, source_file, 100000)
I could use a multiprocessing Queue to split the file up into separate Queue objects, then give each process the job of writing out its own batch of lines, but I still have to read through the whole file first. So will this always be completely I/O bound, with no way to benefit from multiprocessing?
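For reference, this is roughly the chunk-per-worker approach I have in mind, reading in the parent and letting the pool do the writes (the chunk size, the out_%05d.txt filename pattern, and the helper names are just placeholders):

from itertools import islice
from multiprocessing import Pool

def write_chunk(args):
    # worker: write one batch of lines to its own output file
    index, lines = args
    with open('out_%05d.txt' % index, 'w') as dest_file:
        dest_file.writelines(lines)
    return index

def read_chunks(path, chunk_size):
    # generator run in the parent: yield (index, lines) batches
    with open(path) as source_file:
        index = 0
        while True:
            lines = list(islice(source_file, chunk_size))
            if not lines:
                break
            yield index, lines
            index += 1

if __name__ == "__main__":
    pool = Pool()
    # each chunk is handed to a free worker as soon as it has been read
    for index in pool.imap_unordered(write_chunk, read_chunks('file.txt', 100000)):
        print("finished out_%05d.txt" % index)
    pool.close()
    pool.join()

Even then the parent still does all the reading, which is exactly the bottleneck I'm asking about.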