
I have a set of large text files. Currently, I have a generator function which parses each file sequentially and yields a value for each line.

def parse():
    for path in files:
        with open(path) as f:
            for line in f:
                # parse line
                yield value

It takes 24 hours to iterate over all the files! I'd like to know whether it is possible to read multiple files in parallel and yield the results efficiently.

AmirHJ
    What takes so much time? Is it reading the files or processing the files? If it's processing (CPU bound) then you **might** improve your time to about `x/cpu_count` by using `multiprocessing`. Please read up on multiprocessing and threading and share some code. You will be more likely to get a response I think. *edit - [see here](http://stackoverflow.com/a/2069556/377366) you don't want to be reading multiple files at the same time. Rather you want to read each one and send it off to be processed. – KobeJohn Oct 11 '15 at 12:52
  • Yes, it is possible; it depends on what you are doing with the data – Padraic Cunningham Oct 11 '15 at 12:57
  • I'm voting to close this one as it's basically identical to the linked question. If you aren't able to get it done with the answers on that question, then please try to make some code to do what you want and ask a more specific question with the code you have made. – KobeJohn Oct 11 '15 at 12:58
  • It seems like it's the parsing that's taking long; maybe you should look at that part as well. – Leb Oct 11 '15 at 12:59
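
A minimal sketch of the approach KobeJohn's comment describes, assuming the parsing (not the disk I/O) is the bottleneck: keep a single reader that streams lines from each file in turn, and hand the CPU-bound parsing to a `multiprocessing` pool. `parse_line`, the `chunksize`, and the example file names below are placeholders, not part of the original question.

import multiprocessing

def parse_line(line):
    # hypothetical CPU-bound parsing of a single line
    return line.strip().split(",")

def read_lines(paths):
    # single reader: stream lines from each file sequentially
    for path in paths:
        with open(path) as f:
            for line in f:
                yield line

def parse(paths):
    # farm the CPU-bound parsing out to worker processes while keeping
    # the generator interface; chunksize trades latency for less IPC overhead
    with multiprocessing.Pool() as pool:
        for value in pool.imap(parse_line, read_lines(paths), chunksize=1000):
            yield value

if __name__ == "__main__":
    for value in parse(["big1.txt", "big2.txt"]):
        pass  # consume parsed values here

`pool.imap` returns results in input order and streams them back as workers finish, so the caller still just iterates over `parse()` as before. Note that `parse_line` must be a module-level function so it can be pickled and sent to the worker processes.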

0 Answers