If you have a multicore machine and can use Python 3.2 (instead of Python 2), this is a good use case for the new concurrent.futures module in Python 3.2, depending on the processing you need to do with each line. If the processing must happen in file order, you will probably have to reassemble the output later on (see the sketch below). Otherwise, concurrent.futures can schedule each line to be processed in a different task with little effort. What output do you have to generate from that?
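If ordering matters, note that Executor.map already yields results in the same order as its inputs, so explicit reassembly is only needed when you use submit directly. A minimal sketch of that reassembly, assuming a hypothetical process function and input file name:

from concurrent.futures import ProcessPoolExecutor, as_completed

def process(line):
    # Hypothetical per-line work; must be a module-level function
    # so it can be pickled and sent to the worker processes.
    return line.upper()

if __name__ == "__main__":
    with ProcessPoolExecutor() as ex:
        with open("input.txt") as fl:
            # Remember each line's position so results can be re-sorted.
            futures = {ex.submit(process, line): i for i, line in enumerate(fl)}
        indexed = [(futures[f], f.result()) for f in as_completed(futures)]
    results = [line for _, line in sorted(indexed)]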
If you think you would not profit from parallelizing each line's processing, then the most obvious way is the best way: that is, what you have just done.
This example divides the processing among up to 5 worker subprocesses, each one executing Python's built-in len function. Replace len with a function that receives the line as a parameter and performs whatever processing you need on that line:
from concurrent.futures import ProcessPoolExecutor as Executor

# One task per line, distributed across the worker processes:
with Executor(max_workers=5) as ex:
    with open("poeem_5.txt") as fl:
        results = list(ex.map(len, fl))
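For instance, a minimal sketch of such a replacement, assuming a hypothetical process_line function (with ProcessPoolExecutor the function must be defined at module level, since it is pickled and sent to the worker processes):

from concurrent.futures import ProcessPoolExecutor as Executor

def process_line(line):
    # Hypothetical example: count the words on one line.
    return len(line.split())

with Executor(max_workers=5) as ex:
    with open("poeem_5.txt") as fl:
        word_counts = list(ex.map(process_line, fl))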
The "list" call is needed to force the mapping to be done within the "with" statement. If you don't need a scalar value for each line, but rather to record a result in a file you can do it in a for loop instead:
with Executor(max_workers=5) as ex:
    with open("poeem_5.txt") as fl:
        for line in fl:
            ex.submit(my_function, line)
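submit returns a Future object for each scheduled call. If you keep those futures, you can wait for everything to finish and surface any exceptions raised in the workers; a minimal sketch, with my_function again standing in for your hypothetical per-line processing:

from concurrent.futures import ProcessPoolExecutor as Executor

def my_function(line):
    # Hypothetical side effect, e.g. appending a result to a log file.
    pass

with Executor(max_workers=5) as ex:
    with open("poeem_5.txt") as fl:
        futures = [ex.submit(my_function, line) for line in fl]

# result() blocks until a task finishes and re-raises any exception
# raised in the worker process, so failures are not silent.
for future in futures:
    future.result()

Keep in mind this queues one task per line up front, so for a very large file you may want to submit in batches instead.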