
I'm having trouble executing asynchronous GAE NDB datastore queries. For testing purposes, suppose I execute

l = []
for i in range(50):
    result = NDBNetLoc.query(NDBNetLoc.netloc == 'imdb.com').get()
    l.append(result)

Where netloc is an indexed property of my model (no other properties; I'm fixing the string here but in general the strings queried will be unique). This of course makes a "waterfall" in requests:

https://i.stack.imgur.com/H7sBQ.jpg

This is the canonical type of request that should be async'd (according to Google). So instead, I execute

l = []
futs = []
for i in range(50):
    fut = NDBNetLoc.query(NDBNetLoc.netloc == 'imdb.com').get_async()
    futs.append(fut)

for fut in futs:
    l.append(fut.get_result())

But I don't see an improvement. The requests fire one after another, but each call takes much longer (decreasing in duration as i increases):

https://i.stack.imgur.com/Cm0Ks.jpg

Additionally, it seems like the requests aren't firing until the second for loop. If I add a time.sleep(2) in front of that loop, I get something like

https://i.stack.imgur.com/CaGIe.jpg

which is very confusing to me, as I thought the requests fire as soon as the Future object is created. So my two questions are: 1) why aren't the requests firing at the moment each future object is instantiated, and 2) why does each request now take much longer (to the point where doing this asynchronously or synchronously is equivalent in terms of time to finish)?

EDIT: I should add that the reason I'm not doing one simple IN query for a list of unique netlocs is that I eventually run more complicated queries (i.e. get all Model2 entities that have propertyA = foo and an ancestor that is a particular NDBNetLoc returned from the first query, for many NDBNetLocs and many values of foo).

thegeebe
  • **Suggestion that might or might not be applicable to your case**: if, as you're saying, your strings are unique, you could use them as entity keys and do a `get_multi`, which would return all 50 results in ONE RPC request rather than 50. – Mihail Russu Oct 29 '14 at 16:54
  • This is very applicable to the first type of query I'm doing. Thank you thank you! For some of my more complicated queries (those querying on properties that are not unique) I will still have to use the normal way. I think I will eventually move over to Java, but because of time constraints I will use `get_multi` and eat the inefficiency in my other query. – thegeebe Oct 29 '14 at 18:20
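
To make the `get_multi` suggestion above concrete, here is a minimal sketch. It assumes each entity was stored with its netloc string as the key id (for example `NDBNetLoc(id='imdb.com').put()`), which the original model does not do. The helper names `unique_netlocs` and `fetch_netlocs_by_key` are hypothetical, and the NDB import is done lazily so the pure-Python part can be tried outside App Engine.

```python
def unique_netlocs(netlocs):
    """Drop duplicate netloc strings while keeping first-seen order."""
    seen = set()
    ordered = []
    for netloc in netlocs:
        if netloc not in seen:
            seen.add(netloc)
            ordered.append(netloc)
    return ordered


def fetch_netlocs_by_key(model_cls, netlocs):
    """Fetch one entity per netloc with a single batched get.

    Assumption: entities use the netloc string as their key id.
    Returns a dict mapping each netloc to its entity, or to None
    when no entity is stored under that key.
    """
    # Imported lazily; this module is only available on App Engine.
    from google.appengine.ext import ndb

    names = unique_netlocs(netlocs)
    keys = [ndb.Key(model_cls, name) for name in names]
    entities = ndb.get_multi(keys)  # results come back in key order
    return dict(zip(names, entities))
```

On App Engine this would be called as `fetch_netlocs_by_key(NDBNetLoc, ['imdb.com', ...])`. It only covers the lookup by unique netloc; the ancestor queries on non-unique properties mentioned in the question still need regular queries.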

1 Answer


I found a summary of this problem on SO. I'm not really familiar with the Global Interpreter Lock and how it applies here, but I suppose the code that decodes my RPCs is CPython bytecode, and the interpreter is not thread-safe, so each call locks it. Guess I'll move over to Java.

AppEngine Query.fetch_async not very asynchronous?

thegeebe