I need to update a number of instances of class Foo on Google App Engine Datastore using ndb.

Here's what I have so far:

from google.appengine.ext import ndb

more, more_cursor = True, None  # start at the first page
while more:
    foo_instances, more_cursor, more = Foo.query().fetch_page(
        20, start_cursor=more_cursor)
    for foo in foo_instances:
        bar = foo.bar.get()  # foo.bar is a Key to a Bar instance.
        bar.updated = True

    ndb.put_multi(foo_instances)

and (tasklet friendly):

@ndb.tasklet
def update_bars():  # wrapper name is illustrative; yield needs a tasklet
    foo_iterator = Foo.query().iter()
    while (yield foo_iterator.has_next_async()):
        foo = foo_iterator.next()
        bar = yield foo.bar.get_async()  # foo.bar is a Key to a Bar instance.
        bar.updated = True

        yield bar.put_async()

I'm planning to execute this code in a Push Queue task, which I believe has a 10-minute window before timing out.

Which one is the correct approach (if any) to execute the task and avoid timeout or memory issues? There are a few thousand instances of type Foo.

martincho
  • side note: in the 1st solution you probably want to track and `put_multi` the `bar` instances, not the `foo_instances`, right? – Dan Cornilescu Feb 17 '17 at 23:04

1 Answer

If you plan to use the push queue, why not split the work into smaller pieces (one page of your cursor per piece, for example) and have each piece handled by a different task? That way no single task comes near the timeout or memory limits, and you are free to pick whichever solution you prefer.

Something along the lines of the solution discussed in "Google appengine: Task queue performance", but with the push queue in place of the deferred library.
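A minimal sketch of that chaining idea, under stated assumptions: each task processes one page and, if more results remain, enqueues a follow-up task that carries the cursor. The names `handle_page`, `fake_fetch_page` and `run_all` are hypothetical, and the datastore and queue are replaced by in-memory stand-ins so the flow can be run without App Engine.

```python
def handle_page(fetch_page, update_batch, enqueue_next, cursor=None, page_size=20):
    """Process one page, then chain the next task if more results exist.

    fetch_page(page_size, cursor) -> (items, next_cursor, more)
    update_batch(items)           -> applies and saves the changes
    enqueue_next(next_cursor)     -> schedules a task for the next page
    """
    items, next_cursor, more = fetch_page(page_size, cursor)
    update_batch(items)
    if more and next_cursor is not None:
        enqueue_next(next_cursor)
    return len(items)


# In-memory stand-ins (assumptions, not App Engine APIs).
DATA = list(range(45))   # plays the role of the Foo entities
updated = []             # records what each task "saved"
pending = []             # plays the role of the push queue


def fake_fetch_page(page_size, cursor):
    # The "cursor" here is just an integer offset into DATA.
    start = cursor or 0
    end = start + page_size
    return DATA[start:end], end, end < len(DATA)


def run_all():
    # Seed the queue with the first task (no cursor), then drain it.
    pending.append(None)
    tasks_run = 0
    while pending:
        cursor = pending.pop(0)
        handle_page(fake_fetch_page, updated.extend, pending.append, cursor)
        tasks_run += 1
    return tasks_run


TASKS_RUN = run_all()
```

On App Engine, `fetch_page` would presumably wrap `Foo.query().fetch_page(page_size, start_cursor=cursor)`, and `enqueue_next` would add a push queue task (for example via `taskqueue.add`) that passes `next_cursor.urlsafe()` as a parameter to a handler URL of your choosing. Because each task only ever holds one page in memory, the page size bounds both the memory use and the run time of a single task.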

Dan Cornilescu