
I'm experiencing occasional `Exceeded soft private memory limit` errors in a wide variety of request handlers in App Engine. I understand that this error means that the RAM used by the instance has exceeded the amount allocated, and that it causes the instance to shut down.

I'd like to understand what might be the possible causes of the error, and to start, I'd like to understand how app engine python instances are expected to manage memory. My rudimentary assumptions were:

  1. An F2 instance starts with 256 MB
  2. When it starts up, it loads my application code - let's say 30 MB
  3. When it handles a request it has 226 MB available
    • so long as that request does not exceed 226 MB (+ margin of error) the request completes w/o error
    • if it does exceed 226 MB + margin, the instance completes the request, logs the 'Exceeded soft private memory limit' error, then terminates - now go back to step 1
  4. When that request returns, any memory used by it is freed up - i.e. the unused RAM goes back to 226 MB
  5. Steps 3-4 are repeated for each request passed to the instance, indefinitely

That's how I presumed it would work, but given that I'm occasionally seeing this error across a fairly wide set of request handlers, I'm now not so sure. My questions are:

a) Does step #4 happen?

b) What could cause it not to happen? or not to fully happen? e.g. how could memory leak between requests?

c) Could storing data in module-level variables cause memory usage to leak? (I'm not knowingly using module-level variables in that way)

d) What tools/techniques can I use to get more data, e.g. to measure memory usage on entry to a request handler? (A rough sketch of the kind of thing I mean is below.)
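For concreteness, here's roughly the instrumentation I have in mind for (d). The handler and names are just illustrative, and I'm assuming `runtime.memory_usage().current()` is the right call to read the instance's memory in MB:

```python
# Rough sketch only: log instance memory at entry and exit of a handler.
import logging

import webapp2
from google.appengine.api import runtime


class InstrumentedHandler(webapp2.RequestHandler):
    """Illustrative handler that logs memory use before and after the work."""

    def get(self):
        before = runtime.memory_usage().current()  # MB used by this instance
        # ... the handler's normal work would go here ...
        after = runtime.memory_usage().current()
        logging.info('memory: before=%s MB, after=%s MB, delta=%s MB',
                     before, after, after - before)
        self.response.write('ok')
```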

In answers/comments, where possible, please link to the gae documentation.

[edit] Extra info: my app is configured as `threadsafe: false`. If this has a bearing on the answer, please state what it is. I plan to change to `threadsafe: true` soon.

[edit] Clarification: This question is about the expected behavior of gae for memory management. So while suggestions like 'call gc.collect()' might well be partial solutions to related problems, they don't fully answer this question. Up until the point that I understand how gae is expected to behave, using gc.collect() would feel like voodoo programming to me.

Finally: If I've got this all backwards then I apologize in advance - I really can't find much useful info on this, so I'm mostly guessing...

tom
  • Could you have any cyclical references that might be improved by a Weakref being used? – JL Peyret Oct 06 '15 at 23:13
  • First thing I would do is put a gc.collect call at the end of each request handler and then monitor. – Tim Hoffman Oct 10 '15 at 09:49
  • @TimHoffman thanks, but that doesn't actually help me understand how it's *supposed* to work. Maybe there's a memory leak that is making some object ineligible for garbage collection – tom Oct 12 '15 at 20:41
  • Unfortunately without code it's difficult to give you tips. I have been using appengine since 2008 and rarely have I run into memory problems and use ndb and caching extensively. I would follow alex's recommendation around trying ndb with caching turned off. – Tim Hoffman Oct 13 '15 at 10:02
  • @TimHoffman thanks, but unfortunately I'm seeing these errors seemingly randomly distributed across a wide variety of request handlers, so there really isn't any practical way to do that. Hence I'm taking a first-principles approach of understanding _expected behavior_, and working from there. I've been using gae since 2009, and haven't previously seen this error much. But recently I've been getting approximately 1 per hour, with instances (F2s) serving somewhere between 200 and 1500 requests before running out of memory. Any tools you recommend for memory profiling? – tom Oct 13 '15 at 18:52
  • Remember, memory exhaustion could ultimately affect any request, though that request may not be the cause. – Tim Hoffman Oct 14 '15 at 11:11
  • As to memory profiling, this post suggests the most current working profiler: http://stackoverflow.com/questions/30742104/profiling-memory-usage-on-app-engine – Tim Hoffman Oct 14 '15 at 11:13

3 Answers


App Engine's Python interpreter does nothing special, in terms of memory management, compared to any other standard Python interpreter. So, in particular, there is nothing special that happens "per request", such as your hypothetical step 4. Rather, as soon as any object's reference count decreases to zero, the Python interpreter reclaims that memory (module gc is only there to deal with garbage cycles -- when a bunch of objects never get their reference counts down to zero because they refer to each other even though there is no accessible external reference to them).
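A tiny, plain-Python illustration of that distinction (nothing App Engine specific about it):

```python
import gc


class Node(object):
    pass


def make_plain():
    n = Node()
    # When this function returns, n's reference count drops to zero and the
    # interpreter reclaims the memory immediately -- no gc pass needed.


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a
    # a and b keep each other's reference counts above zero, so only a
    # garbage-collection pass (automatic, or an explicit gc.collect()) can
    # reclaim them once the function returns.


make_plain()
make_cycle()
print gc.collect()  # reports how many unreachable (cyclic) objects were freed
```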

So, memory could easily "leak" (in practice, though technically it's not a leak) "between requests" if you use any global variable -- said variables will survive the instance of the handler class and its (e.g.) `get` method -- i.e., your point (c), though you say you are not doing that.
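A hypothetical example of the pattern I mean (the handler and the dict are made up, not from your code):

```python
import webapp2

# Module-level: this dict outlives every request this instance serves.
_request_cache = {}


class LeakyHandler(webapp2.RequestHandler):
    def get(self):
        # Each request adds an entry and nothing ever evicts it, so the
        # instance's RAM creeps upward across requests even though no
        # single request uses much memory by itself.
        _request_cache[self.request.url] = self.request.headers.items()
        self.response.write('cached %d entries' % len(_request_cache))
```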

Once you declare your module to be threadsafe, an instance may happen to serve multiple requests concurrently (up to what you've set as max_concurrent_requests in the automatic_scaling section of your module's .yaml configuration file; the default value is 8). So, your instance's RAM will need to be a multiple of what each request needs.
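For reference, those knobs live in the module's `.yaml`; a rough sketch (values purely illustrative):

```yaml
# module .yaml -- illustrative values only
instance_class: F2
threadsafe: true
automatic_scaling:
  max_concurrent_requests: 4  # defaults to 8 if omitted
```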

As for (d), to "get more data" (I imagine you actually mean, get more RAM), the only thing you can do is configure a larger instance_class for your memory-hungry module.

To use less RAM, there are many techniques -- which have nothing to do with App Engine, everything to do with Python, and in particular, everything to do with your very specific code and its very specific needs.

The one GAE-specific issue I can think of is that ndb's caching has been reported to leak -- see https://code.google.com/p/googleappengine/issues/detail?id=9610 ; that thread also suggests workarounds, such as turning off ndb caching or moving to old db (which does no caching and has no leak). If you're using ndb and have not turned off its caching, that might be the root cause of "memory leak" problems you're observing.
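If you want to experiment with that workaround, turning off ndb's caches takes only a few lines -- roughly as follows (the model name is made up for illustration):

```python
from google.appengine.ext import ndb


class MyModel(ndb.Model):  # made-up model, for illustration only
    name = ndb.StringProperty()

    # Per-model switches: skip both the in-context cache and memcache.
    _use_cache = False
    _use_memcache = False


def disable_ndb_caching_for_request():
    # Or turn caching off for the whole current context instead.
    ctx = ndb.get_context()
    ctx.set_cache_policy(False)
    ctx.set_memcache_policy(False)
```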

Alex Martelli
  • thanks, that's very useful. So when the request handler returns (so long as it didn't store anything in global scope), the amount of memory available should be the same as just before that request - i.e. there is no accumulating memory usage from one request to the next. If that's the theory, then I'm going to log `memory_usage().current()` to test whether I'm actually seeing that. Is that the right way to check memory usage of the instance? Also, might some of the gae code be storing on global scope? e.g. the RPC for the DB? I think I saw mention of that elsewhere. – tom Oct 13 '15 at 18:36
  • By "get more data" I actually meant "collect more information on what's going on", which I guess I do with `memory_usage().current()`, but if there are other tools please tell me. Can I assume that when a `Exceeded soft private memory limit` exception is raised, this means that the instance has attempted to do a `gc.collect()`, and therefore that the memory usage of the referencable objects exceeds the memory limit? Or might it be that I need to call gc.collect() at some points to help free up some circular reference memory? – tom Oct 13 '15 at 18:38
  • @tom, if nothing is saving anything globally (including in caches, as I mentioned for ndb), then when all pending requests are done memory use should be back to where it was when they all started. memory_usage is deprecated but I don't know of any alternatives. The `exceeded` is diagnosed from a watcher outside your Python job, so it knows nothing about gc.collect calls; the latter might help only if you do have garbage cycles (objects mutually referencing each other). – Alex Martelli Oct 13 '15 at 22:36
  • I've collected some more data on my app's memory usage, and I'm still seeing memory usage gradually increase from one request to the next. My next step is to hunt for possible causes in my code (global variables, etc). Before I do that, I'd like to know if there are any app engine services/code which cache data in a way that cannot be GC'd, besides ndb. Do you know of any? Would it be accurate to say that such behavior (if it existed) would be a bug? – tom Mar 26 '16 at 01:27

Point 4 is an invalid assumption. Python's garbage collector doesn't return memory that easily: the program still holds on to that memory, and it isn't reused until the garbage collector has a pass. In the meantime, if some other request requires more memory, new memory might be allocated on top of the memory from the first request. If you want to force Python to garbage collect, you can use gc.collect() as mentioned here.
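Roughly like this (the handler name is only an example):

```python
import gc
import logging

import webapp2


class SomeHandler(webapp2.RequestHandler):  # example handler
    def get(self):
        try:
            self.response.write('hello')
        finally:
            # Force a collection pass so cyclic garbage created by this
            # request is reclaimed before the next request arrives.
            freed = gc.collect()
            logging.info('gc.collect() reclaimed %d objects', freed)
```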

Ritave
  • Thanks - that's good to know. My app is currently configured as `threadsafe: false`. Given that, I think your scenario of "In the meantime if some other request requires more memory.." wouldn't happen, right? Given that, is point 4 still invalid? – tom Oct 12 '15 at 20:51
  • It still is invalid. For example: you finish one request and stop using its objects, but they still take up memory until a GC pass, so that memory is unusable. Another request comes that you need to handle, **before the GC pass**. Now you have the memory of two requests taken, even with `threadsafe: false`. The GC will free the memory of both later on. – Ritave Oct 13 '15 at 09:49

Take a look at this Q&A for approaches to check on garbage collection and for potential alternate explanations: Google App Engine DB Query Memory Usage

Dan Cornilescu
  • thanks, but that question actually pertains to memory usage within a single request handler, not between request handlers. Also, see my comment to the accepted answer, the link to code.google, and the discussion there, which does adequately explain that particular issue. – tom Oct 12 '15 at 20:45