2

I have python code which reads data as stream (sys.stdin) and then perform some action for each line. Now as volume of data is increasing, i want to split the task among threads and let them work in parallel.

Went through the docs and most of them suggest that threads need to poll (eg from Queue) to get task and work upon it. Here i need to push tasks to these threads.

Any idea/link where i can figure out how to do this ?

for line in sys.stdin:
    //perform some action, which needs to be split among threads
    //action is I/O-bound

One option is that i read from this stream, pipe it to Queue and let thread poll from there.

Mohit Verma
  • 2,019
  • 7
  • 25
  • 33
  • Maybe these help: http://stackoverflow.com/questions/13481276/threading-in-python-using-queue http://stackoverflow.com/questions/11638349/using-multiple-threads-in-python – ρss Apr 29 '14 at 11:24
  • 1
    Note: using threads will *not* give you any increase in speed for a CPU-bound task. If you want to parallelize use multiple processes. See the [`multiprocessing`](https://docs.python.org/2/library/multiprocessing.html) module, in particular the `Pool` object. – Bakuriu Apr 29 '14 at 11:24
  • @Bakuriu. Agree, here tasks are I/O bound, so having multiple thread will solve the problem. Updating question. Thanks. – Mohit Verma Apr 29 '14 at 11:29

1 Answers1

4

Use concurrent.futures (in the stdlib in 3.2, backport available for 2.5+):

from concurrent.futures import ThreadPoolExecutor
import sys

def some_action(line):
    pass # TODO: the actual task

with ThreadPoolExecutor() as executor:
    for line in sys.stdin:
        future = executor.submit(some_action, line)

Note that if the task is computationally intensive, you should use a MultiprocessingPoolExecutor instead of a ThreadPoolExecutor if your Python interpreter is limited by the GIL.

phihag
  • 278,196
  • 72
  • 453
  • 469
  • This seems to be available in version > 3.2, and i am running 2.6 :( ..any other alternative you can suggest ? – Mohit Verma Apr 29 '14 at 11:38
  • @MohitVerma [Here's a backport](https://pypi.python.org/pypi/futures). But you should really update your Python, 2.6 has been unsupported for quite some time now, i.e. it's erroneous and insecure. – phihag Apr 29 '14 at 11:48
  • Thanks. My first experience with any backport. How do we use it ? (have downloaded it) – Mohit Verma Apr 29 '14 at 13:11
  • @MohitVerma A backport is a package like any other. Just use your packaging tool, for example `pip install futures`. That's it. Since concurrent.futures is in pure Python, you can also just [download](https://pypi.python.org/pypi/futures) the package and put it somewhere into your [`PYTHONPATH`](https://docs.python.org/dev/using/cmdline.html#envvar-PYTHONPATH). – phihag Apr 29 '14 at 14:06