I have a GAE project written in Python. I made a cron job to execute a batch operation, but it hits the soft private memory limit of an F1 instance (124MB) after a few iterations. Could anyone help me write this code more efficiently, hopefully staying within 124MB? len(people) should be less than 500.

from google.appengine.api import memcache

def cron():
    q = Account.all().filter('role =', 1)
    people = [e for e in q]  # materializes every matching Account at once
    for p in people:
        s = Schedule.available(p)
        m = ScheduleMapper(s).as_dict()
        memcache.set('key_for_%s' % p.key(), m)

This is on the dev server, and I don't want to upgrade my instance class. I also want to avoid third-party libraries such as numpy and pandas.

I added garbage collection at the end of the for loop body, but it doesn't seem to be working:

del s         # drop the reference to the schedule
m.clear()     # empty the dict in place
import gc
gc.collect()  # force a collection pass
steve
  • You haven't mentioned how many entities you are retrieving. Also, I would move the looping to a function and the gc outside. When you say iterations, do you mean invocations of the cron handler, or the outer or inner loop? – Tim Hoffman Oct 02 '17 at 10:40
  • Thank you for the comment! The number of entities should be less than 500. As for iterations, I meant the inner loop. – steve Oct 02 '17 at 10:57
  • How many on the inner loop? Without seeing your models I suspect you're holding some references somewhere. – Tim Hoffman Oct 02 '17 at 14:06
  • I didn't count precisely, but it ended after a few. Yeah, I was suspecting a lingering reference as well. But how do I release all the memory? – steve Oct 02 '17 at 14:13
  • From what I've seen, del list or dict.clear() won't release all the memory. – steve Oct 02 '17 at 14:16
  • I think we need to see some details in available() and look at streamlining what you are doing without all the intermediate objects. Look at using map to apply functions to entities rather than resolving the whole list and then iterating over it (see the sketch after this list). I have processes whose queries work through thousands of entities, and my F1 instances last for days and never fail with out-of-memory errors. – Tim Hoffman Oct 03 '17 at 10:04
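
For reference, here is a minimal sketch of the restructuring Tim describes: the per-entity work moves into its own function (so its temporaries die when it returns), and the query is iterated directly instead of being resolved into a list first. It reuses the models from the question; the batch_size value is an illustrative guess, not something from the original post.

from google.appengine.api import memcache

def process(person):
    # s and the mapper dict are locals here, so they become garbage
    # as soon as this function returns.
    s = Schedule.available(person)
    memcache.set('key_for_%s' % person.key(), ScheduleMapper(s).as_dict())

def cron():
    q = Account.all().filter('role =', 1)
    # Iterating the query streams entities in small batches instead of
    # building the full ~500-element list up front.
    for person in q.run(batch_size=50):
        process(person)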

1 Answer


To see if it's even possible to fit within the memory footprint you want, modify your query to retrieve a single entity and check if you can execute the for loop body successfully for that one entity. Or just add a break at the end of the for loop :)
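
A minimal sketch of that experiment, reusing the names from the question (q.get() returns just the first matching entity, or None):

def cron():
    q = Account.all().filter('role =', 1)
    p = q.get()  # fetch a single entity instead of the whole result set
    if p is not None:
        s = Schedule.available(p)
        memcache.set('key_for_%s' % p.key(), ScheduleMapper(s).as_dict())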

If that doesn't work, you need to upgrade your instance class.

If the experiment works, you can split the work using Query Cursors into multiple push queue tasks, each processing only one entity or just a few of them.
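
A rough sketch of that pattern, assuming webapp2 and the db API from the question; the /tasks/schedules URL, the ScheduleWorker name, and the batch size are made up for illustration:

from google.appengine.api import memcache, taskqueue
import webapp2

BATCH_SIZE = 10  # entities per task; tune to stay under the memory limit

class ScheduleWorker(webapp2.RequestHandler):
    def post(self):
        q = Account.all().filter('role =', 1)
        cursor = self.request.get('cursor')
        if cursor:
            # Resume the query where the previous task stopped.
            q.with_cursor(start_cursor=cursor)
        people = q.fetch(BATCH_SIZE)
        for p in people:
            s = Schedule.available(p)
            memcache.set('key_for_%s' % p.key(), ScheduleMapper(s).as_dict())
        if len(people) == BATCH_SIZE:
            # A full batch suggests more entities remain; chain the next task.
            taskqueue.add(url='/tasks/schedules', params={'cursor': q.cursor()})

The cron handler then only enqueues the first task, so its own memory footprint stays tiny:

def cron():
    taskqueue.add(url='/tasks/schedules')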

Maybe take a look at Google appengine: Task queue performance for a discussion about splitting the work into multiple tasks (though the reason for splitting in that case was exceeding the request deadline, not the memory limit).

Note that even when using multiple tasks it's still possible to hit the memory limit (see App Engine Deferred: Tracking Down Memory Leaks), but at least the work would get done even if a particular instance is restarted (tasks are retried by default).

Dan Cornilescu