I have a formula that takes approximately 0.5 s to evaluate, but I need to evaluate it 1 million times with different values. A simplified example of the formula:

y = a + b

I have 1 million combinations of `a` and `b` that all need to be calculated. These combinations are stored in a list called `combinations`. I work with Python.
My idea is to spin up one AWS instance per 100,000 calculations, so in this case I'll need 10. I would then split the `combinations` list into 10 chunks (`part1 = combinations[:100000]`, etc.) and send each instance its own subset.
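The splitting step could be sketched like this (the chunk size and the `combinations` name come from the question; the helper function name and the dummy data are mine):

```python
def chunks(items, size):
    """Yield successive slices of at most `size` elements from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 1 million (a, b) pairs, standing in for the real combinations.
combinations = [(a, b) for a in range(1000) for b in range(1000)]

parts = list(chunks(combinations, 100_000))  # 10 chunks of 100,000 pairs each
```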
But what is the best way to do this? My idea was to have a shared volume accessible to all instances, put the `calculate.py` script on it, and invoke it on each instance via SSH:

ssh user@instance python calculate.py
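From a controller machine, that could be scripted roughly as follows (a sketch only: the hostname, user, and script path are placeholders, and it assumes key-based SSH authentication and that each instance can find its own chunk, e.g. on the shared volume):

```python
import subprocess

def ssh_cmd(host, user="user", script="calculate.py"):
    # The command from the question: ssh user@instance python calculate.py
    return ["ssh", f"{user}@{host}", "python", script]

def dispatch(hosts):
    # Launch the script on every instance in parallel, then wait for all
    # of them to finish; returns the exit code of each remote run.
    procs = [subprocess.Popen(ssh_cmd(host)) for host in hosts]
    return [p.wait() for p in procs]
```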
Or would Celery be a better fit? Or is there another way entirely?
Edit: I did some testing, and Celery seems to be the way to go.