
I have some code that iterates over DB entities and runs in a task; see below.

On App Engine I'm getting an "Exceeded soft private memory limit" error, and checking memory_usage().current() confirms the problem (see the logging output below). It seems that every time a batch of foos is fetched, memory goes up.

My question is: why is the memory not being garbage collected? I would expect that in each iteration of the loops (the while loop and the for loop, respectively) the re-use of the names foos and foo would leave the objects they used to point to unreferenced (i.e. inaccessible), making them eligible for garbage collection, and that they would then be collected as memory gets tight. But evidently that is not happening.
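In plain CPython that expectation does hold for name rebinding; here is a minimal sketch, independent of App Engine, of the behaviour I'd expect (the Thing class is just a stand-in, not my model):

import weakref

class Thing(object):
  pass

thing = Thing()
probe = weakref.ref(thing)   # watch the first object without keeping it alive
thing = Thing()              # rebind the name, as the loops do with foos/foo
assert probe() is None       # the old object was freed immediately (refcounting)

So if memory keeps climbing, something must still be holding references to the old batches. The task code: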

import logging

from google.appengine.api.runtime import memory_usage

# (models is the application's own module defining the Foo entity)

batch_size = 10
dict_of_results = {}
results = 0
cursor = None

while True:
  foos = models.Foo.all().filter('status =', 6)
  if cursor:
    foos.with_cursor(cursor)

  for foo in foos.run(batch_size=batch_size):

    logging.debug('on result #{} used memory of {}'.format(results, memory_usage().current()))
    results += 1

    bar = some_module.get_bar(foo)

    if bar:
      try:
        dict_of_results[bar.baz] += 1
      except KeyError:
        dict_of_results[bar.baz] = 1

    if results >= batch_size:
      cursor = foos.cursor()
      break

  else:
    break  # the for loop finished without breaking: no more results

and in some_module.py

def get_bar(foo):
  for bar in foo.bars:
    if bar.status == 10:
      return bar
  return None

Output of logging.debug (shortened)

on result #1 used memory of 43
on result #2 used memory of 43
.....
on result #20 used memory of 43
on result #21 used memory of 49
.....
on result #32 used memory of 49
on result #33 used memory of 54
.....
on result #44 used memory of 54
on result #45 used memory of 59
.....
on result #55 used memory of 59
.....
.....
.....

on result #597 used memory of 284.3
Exceeded soft private memory limit of 256 MB with 313 MB after servicing 1 requests total
tom
  • I believe this is a Python thing and not GAE-specific. I think in the past I have explicitly called del in the loop and this prevented the memory usage from growing, so you may want to try that. – new name Oct 01 '15 at 02:08

2 Answers


It looks like your batch solution is conflicting with db's batching, resulting in a lot of extra batches hanging around.

When you call query.run(batch_size=batch_size), db runs the query to completion, fetching batch_size entities at a time. When you reach the end of a batch, db grabs the next one. However, right after db does this, you exit the loop and start again. What this means is that batches 1 -> n will all exist in memory twice: once for the last query's fetch and once for your next query's fetch.
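A rough pure-Python analogy of that double-buffering (this is not the real db client, just a sketch of the shape of the problem):

def prefetching_reader(data, chunk_size):
  # stand-in for db's batch iterator: the next chunk is fetched
  # while the current one is still being consumed
  i = 0
  chunk = data[:chunk_size]
  while chunk:
    next_chunk = data[i + chunk_size:i + 2 * chunk_size]  # the prefetch
    for item in chunk:
      yield item  # near the end of a chunk, both chunks are alive
    i += chunk_size
    chunk = next_chunk

A consumer that breaks out after exactly chunk_size items (as the question's while/for combination does) pays the memory cost of the prefetched chunk without ever consuming it, and then starts a fresh reader.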

If you want to loop over all your entities, just let db handle the batching:

foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size=batch_size):
  results += 1
  bar = some_module.get_bar(foo)
  if bar:
    try:
      dict_of_results[bar.baz] += 1
    except KeyError:
      dict_of_results[bar.baz] = 1

Or, if you want to handle batching yourself, make sure db doesn't do any batching:

while True:
  foo_query = models.Foo.all().filter('status =', 6)
  if cursor:
    foo_query.with_cursor(cursor)
  foos = foo_query.fetch(limit=batch_size)
  if not foos:
    break

  # ... process the entities in foos here ...

  cursor = foo_query.cursor()  # note: the cursor comes from the query object, not the result list
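For completeness, one way to wire the per-entity processing into that loop (just a sketch, reusing the names from the question and the get_bar helper above, with collections.defaultdict standing in for the try/except tally):

from collections import defaultdict

batch_size = 10
dict_of_results = defaultdict(int)
cursor = None

while True:
  foo_query = models.Foo.all().filter('status =', 6)
  if cursor:
    foo_query.with_cursor(cursor)
  foos = foo_query.fetch(limit=batch_size)
  if not foos:
    break

  for foo in foos:
    bar = some_module.get_bar(foo)
    if bar:
      dict_of_results[bar.baz] += 1

  cursor = foo_query.cursor()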
Patrick Costello
  • Thanks Patrick. For those looking for more context, please read comments #8 onwards here: https://code.google.com/p/googleappengine/issues/detail?id=12243 tl;dr: for queries that will take more than 2.5 mins, the first method will probably result in a deadline exceeded error, but the second method will not. The second method also shows escalating memory usage, but so far it has not exceeded the memory limit (more testing needed to see what happens with more data) – tom Oct 01 '15 at 21:20

You might be looking in the wrong direction.

Take a look at this Q&A for approaches to check on garbage collection and for potential alternate explanations: Google App Engine DB Query Memory Usage
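For instance (a hypothetical check, not taken from the linked answer): force a collection periodically inside the loop and log whether memory actually drops. gc.collect() returns the number of unreachable objects it found; if memory stays flat afterwards, the entities are still reachable (e.g. held by the db library's internal batch buffers) rather than merely waiting for the collector.

import gc
import logging
from google.appengine.api.runtime import memory_usage

# inside the processing loop, e.g. every 100 results:
if results % 100 == 0:
  n = gc.collect()
  logging.debug('gc.collect() found {} unreachable objects; memory now {}'.format(
      n, memory_usage().current()))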

Dan Cornilescu