So I have a few machines on the network running MongoDB:

  • I can easily write code to connect to one from my PC and return a result set, e.g.:
from pymongo import Connection
c = Connection("10.130.10.12")
some_data = c.MyData.MyCollection.find_one()
  • If I have, say, 100 servers to connect to and want to put this in a loop, that's easy too:
all_data = []
for server in my_list_of_servers:
    c = Connection(server)
    all_data.append(c.MyData.MyCollection.find_one())
  • However, this queries the servers one by one and could be quite slow.
  • How can I send out all the requests at once? I'm super unfamiliar with threading (is that even what I should be looking into?)
LittleBobbyTables

1 Answer

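from multiprocessing import Pool
from pymongo import Connection

def connectAndCollect(server):
    # Runs in a worker process: connect to one server and fetch one document.
    c = Connection(server)
    return c.MyData.MyCollection.find_one()

all_data = []
pool = Pool(processes=10)  # up to 10 servers are queried at the same time
res = pool.map(connectAndCollect, servers)  # blocks until every server has answered
# Eager in Python 2; in Python 3 map() is lazy, so use all_data.extend(res) there.
map(lambda x: all_data.append(x), res)
pool.close()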

The multiprocessing library is designed for this sort of task. The final map call can be replaced by a for loop if you like.

Using the multiprocessing module for Map/Reduce tasks in general is described here: http://mikecvet.wordpress.com/2010/07/02/parallel-mapreduce-in-python/
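Since the question asks about threading, here is a minimal sketch of the same fan-out pattern using threads instead of processes. It swaps `Pool` for `multiprocessing.pool.ThreadPool`, which has the same `map` interface; threads share memory, so nothing has to be pickled. `fetch_one`, `collect_all`, and the server addresses are made up for illustration, and the `time.sleep` call only stands in for the real `find_one()` round trip, so treat this as a sketch rather than a tested MongoDB client.

```python
import time
from multiprocessing.pool import ThreadPool

def fetch_one(server):
    """Hypothetical stand-in for Connection(server).MyData.MyCollection.find_one()."""
    time.sleep(0.1)  # pretend this is network latency
    return {"server": server, "ok": True}

def collect_all(servers, workers=10):
    """Query every server with up to `workers` requests in flight at once."""
    pool = ThreadPool(processes=workers)
    try:
        # map() blocks until all calls finish and keeps the input order.
        return pool.map(fetch_one, servers)
    finally:
        pool.close()
        pool.join()

servers = ["10.130.10.%d" % i for i in range(1, 21)]  # made-up addresses
start = time.time()
all_data = collect_all(servers)
elapsed = time.time() - start
print("%d results in %.2f s" % (len(all_data), elapsed))
```

With 20 simulated servers and 10 workers, the run should take roughly two rounds of the simulated delay instead of twenty. Replacing the body of `fetch_one` with the real connection code is the only change needed to try it against actual servers.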

alexplanation
  • Thanks! But pool.map can't accept this function? `PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed` – LittleBobbyTables Jan 09 '13 at 19:24
  • **EDIT** Because I am foolish, I suggested something impossible: http://stackoverflow.com/questions/4827432/how-to-let-pool-map-take-a-lambda-function Don't use lambdas in `pool.map`. This final accumulation step won't really benefit from the multiprocessing anyway. – alexplanation Jan 09 '13 at 22:28
  • woo! It works ^_^ What is `res` short for though? Also, I'm more familiar with list comprehensions than maps - could this line: `map(lambda x: all_data.append(x),res)` be replaced by this list comprehension: `all_data.append([x for x in res])`? Are they equivalent in function/memory? (sorry! so many questions) – LittleBobbyTables Jan 10 '13 at 16:41
  • `res` is just a generic result variable. The list comprehension will not work in this case, because instead of appending each item, you will append the list itself. If you want a list comprehension, you can try `all_data.extend([x for x in res])` - just make sure everything is initialized properly and check things with test cases first. – alexplanation Jan 10 '13 at 17:01
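The two points from the comments can be checked in a few lines: why `append` with a list comprehension nests the results while `extend` does not, and why a lambda cannot be handed to a process pool. This is only an illustrative sketch; `can_pickle` is a hypothetical helper, and `res` holds made-up data standing in for the value returned by `pool.map`.

```python
import pickle

def can_pickle(obj):
    """Return True if obj can be pickled, which a process pool needs for its function."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

res = [{"n": 1}, {"n": 2}, {"n": 3}]  # made-up stand-in for the pool.map result

appended = []
appended.append([x for x in res])  # one element: the whole list, nested

extended = []
extended.extend(res)               # three elements, one per result

looped = []
for x in res:                      # the plain for loop gives the same flat list
    looped.append(x)

print(len(appended), len(extended), len(looped))
print(can_pickle(lambda x: x))
```

Here `appended` ends up as a list containing one list, while `extended` and `looped` both hold the three results directly, so `all_data.extend(res)` is the shortest equivalent of the loop.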