I'm having trouble executing asynchronous GAE NDB datastore queries. For testing purposes, suppose I execute:

    l = []
    for i in range(50):
        # .get() blocks until its RPC returns, so the queries run one after another
        qry = NDBNetLoc.query(NDBNetLoc.netloc == 'imdb.com').get()
        l.append(qry)
Here netloc is an indexed property of my model (the model has no other properties). I'm fixing the string for this test, but in general each query will use a different, unique string. As expected, this produces a "waterfall" of sequential requests:
https://i.stack.imgur.com/H7sBQ.jpg
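To make the "waterfall" concrete without App Engine, here is a toy, pure-Python sketch. `fake_rpc` and its 20 ms latency are hypothetical stand-ins for one datastore round trip (no NDB is involved). Calling it in a loop mimics calling `.get()` repeatedly: each call starts only after the previous one has returned, so the total time grows as roughly n * latency.

```python
import time


def fake_rpc(latency=0.02):
    """Hypothetical stand-in for one datastore round trip: it only sleeps."""
    start = time.perf_counter()
    time.sleep(latency)
    return start, time.perf_counter()


def run_sequential(n=10, latency=0.02):
    """Issue n fake RPCs back to back, like calling .get() in a loop."""
    return [fake_rpc(latency) for _ in range(n)]


timeline = run_sequential()
# Waterfall: every call starts only after the previous one has finished.
for (_, prev_end), (next_start, _) in zip(timeline, timeline[1:]):
    assert next_start >= prev_end
total = timeline[-1][1] - timeline[0][0]
print(f"{len(timeline)} calls took {total:.3f} s")  # roughly n * latency
```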
According to Google, this is exactly the kind of request that should be made asynchronous. So instead, I execute:
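    l = []
    futs = []
    for i in range(50):
        # get_async() returns a Future immediately instead of blocking
        qry = NDBNetLoc.query(NDBNetLoc.netloc == 'imdb.com').get_async()
        futs.append(qry)
    for fut in futs:
        l.append(fut.get_result())  # blocks until this particular result is ready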
But I don't see any improvement. The requests now fire one right after another, but each call takes much longer (with the duration decreasing as i increases):
https://i.stack.imgur.com/Cm0Ks.jpg
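For comparison, the sketch below shows the timing one would hope for when the calls genuinely overlap. It again uses a hypothetical `fake_rpc` in place of a datastore query, and it uses a thread pool purely as a stand-in for concurrency; this is not how NDB implements `get_async()`. The structure mirrors the asynchronous code above: submit every call first, then collect the results.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_rpc(latency):
    """Hypothetical stand-in for one datastore round trip."""
    time.sleep(latency)
    return "imdb.com"


def timed(fn):
    """Return fn()'s result together with its wall-clock duration."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def sequential(n, latency):
    return [fake_rpc(latency) for _ in range(n)]


def overlapped(n, latency):
    # Submit every call first (like building the list of futures),
    # then collect the results (like the get_result() loop).
    with ThreadPoolExecutor(max_workers=n) as pool:
        futs = [pool.submit(fake_rpc, latency) for _ in range(n)]
        return [f.result() for f in futs]


N, LATENCY = 10, 0.05
seq_results, seq_time = timed(lambda: sequential(N, LATENCY))
par_results, par_time = timed(lambda: overlapped(N, LATENCY))
print(f"sequential: {seq_time:.2f} s, overlapped: {par_time:.2f} s")
```

When the calls truly overlap, the total approaches the latency of a single call rather than the sum of all of them.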
Additionally, the requests don't seem to be sent until the second for loop. If I add a time.sleep(2) in front of that loop, I get something like the following:
https://i.stack.imgur.com/CaGIe.jpg
This is very confusing to me, since I thought a request fires as soon as its Future object is created. So my two questions are:
1) Why aren't the requests fired at the moment each Future object is instantiated?
2) Why does each request now take so much longer, to the point where the asynchronous and synchronous versions take about the same total time to finish?
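One model that would be consistent with the time.sleep(2) experiment is that creating a future only *queues* the work, and queued work runs only when something waits on a result. The toy sketch below implements exactly that model; `ToyEventLoop`, `ToyFuture`, and `get_async` are hypothetical names, and this is an assumption for illustration, not NDB's actual code.

```python
import time
from collections import deque


class ToyEventLoop:
    """Hypothetical loop: queued callbacks run only when a result is awaited."""

    def __init__(self):
        self.pending = deque()

    def queue(self, callback):
        self.pending.append(callback)

    def run_until(self, future):
        # Run queued work in FIFO order until the awaited future is done.
        while not future.done and self.pending:
            self.pending.popleft()()


class ToyFuture:
    def __init__(self, loop):
        self.loop = loop
        self.done = False
        self.value = None

    def set_result(self, value):
        self.value, self.done = value, True

    def get_result(self):
        # In this model, waiting on the result is what drives the loop.
        self.loop.run_until(self)
        return self.value


started_at = []  # when each simulated RPC was actually "sent"


def get_async(loop, netloc):
    """Hypothetical: return a future immediately; the RPC is only queued."""
    fut = ToyFuture(loop)

    def send_rpc():
        started_at.append(time.perf_counter())
        fut.set_result(netloc)

    loop.queue(send_rpc)
    return fut


loop = ToyEventLoop()
created = time.perf_counter()
futs = [get_async(loop, "imdb.com") for _ in range(5)]
assert started_at == []  # creating the futures sent nothing
time.sleep(0.1)          # analogous to the time.sleep(2) experiment
results = [f.get_result() for f in futs]
print(f"first RPC sent {min(started_at) - created:.2f} s after the futures were created")
```

In this model nothing is sent until the second loop starts calling get_result(), which is the behaviour described above.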
EDIT:
I should add that the reason I'm not doing one simple IN query over a list of unique netlocs is that I eventually run more complicated queries (e.g. get all Model2 entities that have propertyA = foo and whose ancestor is a particular NDBNetLoc returned by the first query, for many NDBNetLocs and many values of foo).
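The two-stage pattern in the edit (look up each NDBNetLoc, then run one ancestor query per (netloc, foo) pair) can be sketched so that independent lookups overlap. This sketch uses Python's asyncio as a stand-in and in-memory dictionaries in place of the datastore; `lookup_netloc`, `query_children`, `NETLOCS`, and `CHILDREN` are hypothetical. In NDB the analogous tool would be tasklets (`@ndb.tasklet`), which this sketch does not use.

```python
import asyncio

# Hypothetical in-memory stand-ins for the two kinds of datastore query.
NETLOCS = {"imdb.com": "key-imdb", "example.org": "key-example"}
CHILDREN = {
    ("key-imdb", "foo"): ["m1", "m2"],
    ("key-imdb", "bar"): ["m3"],
    ("key-example", "foo"): [],
    ("key-example", "bar"): ["m4"],
}


async def lookup_netloc(netloc):
    """Stage 1 (hypothetical): like NDBNetLoc.query(NDBNetLoc.netloc == netloc).get()."""
    await asyncio.sleep(0.01)  # simulated RPC latency
    return NETLOCS.get(netloc)


async def query_children(parent_key, foo):
    """Stage 2 (hypothetical): like an ancestor query filtered on propertyA == foo."""
    await asyncio.sleep(0.01)  # simulated RPC latency
    return CHILDREN.get((parent_key, foo), [])


async def children_for(netloc, foos):
    parent = await lookup_netloc(netloc)
    if parent is None:
        return {}
    # Fan out all stage-2 queries for this parent at once.
    results = await asyncio.gather(*(query_children(parent, f) for f in foos))
    return dict(zip(foos, results))


async def main(netlocs, foos):
    # Each netloc's lookup-then-query chain runs concurrently with the others.
    per_netloc = await asyncio.gather(*(children_for(n, foos) for n in netlocs))
    return dict(zip(netlocs, per_netloc))


result = asyncio.run(main(["imdb.com", "example.org"], ["foo", "bar"]))
print(result)
```

Stage 2 for a given netloc has to wait for that netloc's stage-1 lookup, but it does not have to wait for the other netlocs' lookups.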