1

Say I have two queues, call them colors and numbers:

colors = Queue.Queue()
numbers = Queue.Queue()

and they each contain several items:

for color in ['red', 'orange', 'yellow', 'green', 'blue', 'indago', 'violet']:
    colors.put(color)
for i in xrange(20):
    numbers.put(i)

and a function to handle a combination of a number and letter:

def handle():
    while not colors.empty()
        color = colors.get()
        number = numbers.get()
        print "Foo: %s bar: %d" % (color, number)
        colors.task_done()
        numbers.task_done()

that will be threaded:

children = []
for i in xrange(num_threads):
    children.append(threading.Thread(target=handle))

but instead of just printing each color and a number, I want to print all possible combinations of colors and numbers, what is the most efficient way to do this? Here is what I would like the output to look like: http://pastebin.com/yhksKswr

The problem (well feature I suppose) is that Queue.get() removes the item it returns from the queue so that each item is only used once.

735Tesla
  • 3,162
  • 4
  • 34
  • 57
  • Use [`itertools`](https://docs.python.org/2/library/itertools.html). The problem you describe seems independent of `Queue`. – Brian Cain Jun 14 '14 at 21:44
  • @BrianCain the reason I am using `Queue` is so that I don't have to worry about thread safety. I know how to do this with lists, but not queues. – 735Tesla Jun 14 '14 at 21:46

2 Answers2

2

One approach you might take is to fill a single queue with tuples, each tuple holding a 'color' and a 'number'. In other words, generate the cartesian product of the two lists as an initial step, then hand them out to your threaded workers via the thread-safe Queue.

BTW, in my experience, parallelizing at the process level gets you more bang for your buck than threading in Python. You might try using Redis or Celery to distribute your jobs across many workers (executing on the same or different machines).

Community
  • 1
  • 1
cbare
  • 12,060
  • 8
  • 56
  • 63
1

It seems this may be too sequential and small to be threaded. The key to a problem being solved with a parallel algorithm is that there has to be a way to break it into subproblems of approximately equal size, each subproblem is reasonably computationally intensive (so as to make the overhead in creating a new thread worth doing), and that no solution to a previous subproblem is needed to solve another subproblem (because then one thread is left doing nothing waiting on another).

You're going to have to keep track of what the current color and number are and iterate over both of them, so it could look something like this:

for color in colors:
  for number in numbers:
    t = threading.Thread(target=make_combination, args=(color, number))
    t.run()

def make_combination(c, n):
  # make a combination

But since it takes so long to create a thread, you would be better off if you just called make_Combination in the loop.

If you really want to do it with threads, I would:

  1. Initialize the Queue with all the colors and numbers`.
  2. Create n threads.
  3. Let each thread get a color, copy the numbers Queue, then print the color with each number.
  4. Each thread repeats #3 until the color queue is empty.

So:

for color in ['red', 'orange', 'yellow', 'green', 'blue', 'indago', 'violet']:
    colors.put(color)
numbers = list(range(20)) # We won't be using it like a queue, so just make it a list.

for i in range(0, num_threads):
  threading.Thread(target=handle)

def handle():
  while not colors.empty():
    color = colors.get()
    for i in numbers:
      print color, i # Edit this to get it to print what you want

But it's important to not that this will almost never print in order.

And with multiprocessing.Pool:

# Initialize as lists
colors = [...]
numbers = [...]

def handle(c, n):
  # do something with c and n

p = multiprocessing.Pool(num_processes)
for c in colors:
  for n in numbers:
    p.apply_async(handle, (c, n)) # it's either this or "p.apply_async(handle, args = (c, n))". Can't remember.
    # The above basically means "call handle(c, n) in another process". There are ways to get the return value, too, if you want it. (See the docs about Pool and AsyncResult.)

p.close() # No more jobs to submit.
p.join() # Wait for jobs to finish.
Tyler
  • 1,818
  • 2
  • 13
  • 22
  • Thanks for the answer. The actual program I'm writing does actually proform a more computational intensive process with each combination, I only used the colors and numbers to simplify the question. Also, your solution provides no way to control the number of threads created. Were there 1000 numbers it would attempt to create severl thousand threads. – 735Tesla Jun 14 '14 at 21:54
  • Oh, well if you actually want concurrency, then you need multiprocessing. Threads in Python only run one at a time, because of the GIL. Multiple processes bypass the GIL and give you true SMP. It even has a `Pool` class which will handle the problem of limiting the number of processes to a given amount. Are you interested in seeing an example using `Pool`? – Tyler Jun 14 '14 at 22:01
  • Never heard of that module before, but it sounds shiny, so yes! – 735Tesla Jun 14 '14 at 22:03
  • Out of curiosity, is it possibly to do the same thing with an object that extends `threading.Thread`? – 735Tesla Jun 14 '14 at 22:17
  • 1
    AFAIK, it isn't in the library. I did a cursory Google search just out of curiosity. Regardless, if you want to actually use more than one processor core at a time, you need `multiprocessing` instead of `threading` anyway. If you use threads, you'll notice that they only use one core on your machine no matter how many threads you create. Processes, like those created with the `multiprocessing` module, will use as many cores as you want it too (bounded by how many you have, of course). – Tyler Jun 14 '14 at 22:23
  • 1
    That isn't to say you couldn't write a ThreadPool class. But I don't see the point in it. Sorry to crush your idea, but I, like most people, see threads in Python as completely useless, because of the GIL (Google it). It would be great if they would fix the GIL, but it doesn't seem like that will ever happen. – Tyler Jun 14 '14 at 22:26
  • Thanks for all of your help and suggestions. I might actually end up switching to c or c++ for the power they give me – 735Tesla Jun 14 '14 at 22:31
  • In that case, you may be able to use this simple thread pool library I wrote for C++11: https://github.com/Tyler-Hardin/thread_pool. – Tyler Jun 15 '14 at 18:45