So, I'm trying to write an application that uses Django as its ORM, since it needs both to do some behind-the-scenes processing and to provide an easy-to-use front end. Its core functionality is processing data that's in the database in a CPU-heavy job (basically Monte Carlo simulations), and I want to use multiprocessing, specifically Pool (which gives me 4 processes on this machine). The code runs basically like this, with about 20 children under the parent:
    # assorted import statements to get the Django environment into the script
    from multiprocessing import Pool
    from random import random
    from time import sleep

    def test(child):
        x = []
        print child.id
        for i in range(100):
            print child.id, i
            x.append(child.parent.id)  # just to hit the DB
        return x
    if __name__ == '__main__':
        parent = Parent.objects.get(id=1)
        pool = Pool()
        results = pool.map(test, parent.children.all())
        pool.close()
        pool.join()
        print results
With the code as such, I get intermittent DatabaseErrors or PicklingErrors. The former are usually of the form "malformed database" or "lost connection to MySQL server"; the latter are usually "cannot pickle model.DoesNotExist". They are random, occur in any of the processes, and of course there is nothing wrong with the DB itself. If I set pool = Pool(processes=1), it runs just fine in a single process. I have also thrown in various print statements to confirm that most of the tasks are actually running.
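If the underlying issue is that every forked worker inherits the parent's single open database connection (which would fit the "lost connection to MySQL server" errors), would closing that connection just before the pool forks, so each worker lazily opens its own, be the right direction? A rough sketch of what I mean (untested; connection is django.db's connection object, and forcing the queryset with list() before the fork is my own guess):

    # same Django-environment imports as in the script above
    from multiprocessing import Pool
    from django.db import connection

    def test(child):
        # the first query here should lazily open a fresh connection in each worker
        return [child.parent.id for i in range(100)]

    if __name__ == '__main__':
        parent = Parent.objects.get(id=1)
        children = list(parent.children.all())  # run the query while the parent still owns the connection
        connection.close()  # drop the socket so the forked workers don't share it
        pool = Pool()
        results = pool.map(test, children)
        pool.close()
        pool.join()
        print results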
Meanwhile, I have also tried changing test to:
    def test(child):
        x = []
        sleep(random())  # pause for less than a second before hitting the DB
        for i in range(100):
            x.append(child.parent.id)
        return x
That makes each task pause for less than a second before hitting the DB, and with it everything runs fine. If I shrink the random interval down to about 500 ms, it starts acting up again. So this is probably a concurrency problem, right? But with only 4 processes hitting the database? My question is: how do I solve this without making large dumps of the data ahead of time? I have tested with both SQLite and MySQL, and both have trouble with this.
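To be clear, this is the kind of up-front dump I'd like to avoid: flattening everything the workers need into plain Python values before forking, so the children never touch the ORM at all (the tuple layout here is purely illustrative):

    # the kind of pre-dump I'd rather not do
    from multiprocessing import Pool

    def test(dumped):
        child_id, parent_id = dumped
        return [parent_id for i in range(100)]

    if __name__ == '__main__':
        parent = Parent.objects.get(id=1)
        # materialize plain tuples so the workers never open a DB connection
        dumped = [(c.id, c.parent.id) for c in parent.children.all()]
        pool = Pool()
        results = pool.map(test, dumped)
        pool.close()
        pool.join()
        print results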