I have a huge file with hundreds of thousands of lines, and I need to run the same process on each line. My plan was to spin up several threads to speed things up. Whenever I've multithreaded before, I used the `threading` and `Queue` modules, but here I can't figure out how to apply a queue. What I really need is to read the file line by line, since it's far too large to load all at once. I thought maybe I could add one item to the queue at a time with `.put()` and immediately hand it to a thread, but it seems like the threads could conflict if I did that. Any suggestions?
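To illustrate, this is roughly the pattern I have in mind (a minimal sketch, using Python 3 names; `process()` and the filename are placeholders for my actual work and file), though I'm not sure it avoids the conflicts I'm worried about:

```python
import threading
import queue  # named Queue in Python 2

NUM_WORKERS = 4
SENTINEL = None  # signals a worker to exit

def process(line):
    """Placeholder for the real per-line work."""
    pass

def worker(q):
    while True:
        line = q.get()
        if line is SENTINEL:
            break
        process(line)

q = queue.Queue(maxsize=1000)  # bounded, so the reader can't race ahead of the workers
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

with open("huge_file.txt") as f:
    for line in f:   # streams one line at a time; the whole file is never in memory
        q.put(line)  # blocks while the queue is full, keeping memory bounded

for _ in range(NUM_WORKERS):
    q.put(SENTINEL)  # one sentinel per worker
for t in threads:
    t.join()
```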
- Reread the docs for the `threading` and `Queue` modules. They are designed for exactly this situation. I suggest using `multiprocessing` instead of `threading`, as the task will not likely run any faster when split between multiple threads in the same process. Also, there is the distinct possibility that this task is IO bound rather than processor bound, in which case it won't matter how you split the task up, as all the reading will occur in one process. – Steven Rumbalski May 30 '12 at 21:35
- Threads probably won't speed it up because of the [GIL](http://wiki.python.org/moin/GlobalInterpreterLock); you probably want to use [multiprocessing](http://docs.python.org/library/multiprocessing.html) instead, look specifically at `pool.map` (see the sketch after these comments). – agf May 30 '12 at 21:35
- [examples that demonstrate multiprocessing, threading, concurrent.futures](http://stackoverflow.com/a/9874484/4279) – jfs May 30 '12 at 23:40
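Following the `pool.map` suggestion above, a minimal sketch; `process()` and the filename are placeholders, and `Pool.imap` with a `chunksize` is used so lines are streamed to the workers rather than the whole file being loaded:

```python
import multiprocessing

def process(line):
    """Placeholder for the real per-line work."""
    return len(line)

if __name__ == "__main__":
    pool = multiprocessing.Pool()  # defaults to one worker per CPU core
    with open("huge_file.txt") as f:
        # imap pulls lines from the file lazily and batches them
        # (chunksize) to cut inter-process communication overhead
        for result in pool.imap(process, f, chunksize=1000):
            pass  # collect or write out results here
    pool.close()
    pool.join()
```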
1 Answer
How much processing is there per line?
If not much, then multiple threads may actually slow things down by contending for the device the file sits on. You might want to split the file beforehand and put the pieces on different devices; then it's a simple matter of firing up a process per file, or per group of files.
I'd use the `split` and `xargs -P` unix commands for this.
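For instance (a sketch assuming GNU coreutils; `process_one_file.sh` stands in for whatever script handles one piece):

```sh
# split the input into 50000-line pieces named part_aa, part_ab, ...
split -l 50000 huge_file.txt part_
# run the per-piece script on up to 8 pieces in parallel
ls part_* | xargs -n 1 -P 8 ./process_one_file.sh
```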

– pixelbeat