
I have a formula that takes approximately 0.5 s to calculate. However, I need to run this calculation 1 million times with different values. An example of the formula (simplified):

y = a + b

I have 1 million combinations of a and b that all need to be calculated. These combinations are stored in a list called combinations. I'm working in Python.

My idea is to spin up one AWS instance per 100,000 calculations, so in this case I'll need 10. I would then divide the combinations list into 10 pieces (part1 = combinations[:100000], etc.) and send each AWS instance its subset of combinations.
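A minimal sketch of the split described above, assuming combinations is an in-memory list of (a, b) tuples; the list built here and the helper split_into_parts are hypothetical stand-ins. The helper spreads any remainder over the first parts, so nothing is dropped when the length is not an exact multiple of 10. The last lines do the back-of-the-envelope arithmetic, assuming perfect scaling and ignoring transfer and startup overhead: at 0.5 s each, 1,000,000 calculations take about 500,000 s (roughly 5.8 days) on one machine, versus about 50,000 s (roughly 13.9 hours) per instance when split 10 ways.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_parts(items: Sequence[T], n_parts: int) -> List[Sequence[T]]:
    """Split items into n_parts contiguous slices of (nearly) equal size.

    Any remainder is spread over the first slices, so no item is lost when
    len(items) is not an exact multiple of n_parts.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    size, remainder = divmod(len(items), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        end = start + size + (1 if i < remainder else 0)
        parts.append(items[start:end])
        start = end
    return parts


# Hypothetical stand-in for the question's list of (a, b) pairs.
combinations = [(a, a + 1) for a in range(1_000_000)]
parts = split_into_parts(combinations, 10)  # part i is sent to instance i

# Rough wall-clock estimate at 0.5 s per calculation (no overhead counted):
serial_seconds = len(combinations) * 0.5    # 500,000 s, about 5.8 days
per_instance_seconds = len(parts[0]) * 0.5  # 50,000 s, about 13.9 hours
```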

But what is the best way to do this? My idea was to use a shared volume that all instances can access, put the calculate.py script on that volume, and call it via SSH:

ssh user@instance python calculate.py
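The SSH approach can be sketched from the controlling machine with subprocess, starting one job per instance without waiting so that the runs overlap. This assumes calculate.py has been changed to take a part index as its first argument and load the matching slice itself; the host names, the script path, and the helpers build_ssh_command and dispatch are all hypothetical. The launcher is a parameter so the command construction can be checked without actually connecting anywhere.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence


def build_ssh_command(host: str, script: str, part_index: int) -> List[str]:
    """Return the argv for: ssh HOST python SCRIPT PART_INDEX."""
    # Quote the script path so the remote shell sees it as a single word.
    remote = f"python {shlex.quote(script)} {part_index}"
    return ["ssh", host, remote]


def dispatch(hosts: Sequence[str], script: str,
             launch: Callable[[List[str]], object] = subprocess.Popen) -> list:
    """Start one remote job per host without waiting and return the handles.

    Host i is told to process part i. With the default launcher each handle
    is a subprocess.Popen object, so the caller can wait() on them later.
    """
    return [launch(build_ssh_command(host, script, i))
            for i, host in enumerate(hosts)]
```

For example, `procs = dispatch([f"user@worker{i}" for i in range(10)], "/shared/calculate.py")` starts all ten jobs, and `[p.wait() for p in procs]` then collects their exit codes. This only launches the jobs; gathering the results (for instance, each worker writing its output back to the shared volume) is left to calculate.py.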

Or is Celery a better way to do this? Or is there another approach altogether?

Edit: I did some testing and Celery seems the way to go.

– Peter
  • To me, this sounds like it might be an excellent opportunity to rewrite the function in C and call it from Python. Speedups of 10, 100, or 1000 times can often result, though sometimes the speedup is minimal; it all depends on the details of your function. There are a number of ways to do this. You might also try PyPy, which can give significant performance advantages and is low-effort, since all you need is a different Python interpreter. – Gary Walker Oct 27 '14 at 15:32
  • @GaryWalker thanks for your reply. It is a linear regression (statsmodels OLS), which is not supported by PyPy, so I prefer to keep working with CPython. – Peter Oct 27 '14 at 15:47

1 Answer


You could use pathos to set up an SSH tunnel and then submit the function to several servers using the pathos fork of Parallel Python. Alternatively, use only the tunnel from pathos and connect to the different servers through it with something else, such as rpyc or zmq.

See: Python Multiprocessing with Distributed Cluster

– Mike McKerns