
Google App Engine offers services like Task Queues and Backends (now Modules) to parallelize request handling and do "concurrent work". Typical fan-out/fan-in (fork-join) techniques can easily be implemented with the Pipelines API, Fantasm, etc.
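As a rough illustration, fanning work out via the Task Queue API might look like the sketch below (the `/worker` handler path and the payload key are hypothetical placeholders):

```python
# Fan-out sketch on GAE Python 2.7: enqueue one task per work item,
# so the task queue dispatches them to handlers in parallel.
# The /worker URL and 'item_key' parameter are hypothetical.
from google.appengine.api import taskqueue

def fan_out(work_items):
    for item in work_items:
        taskqueue.add(url='/worker', params={'item_key': item})
```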

When configuring the hardware of Backends/Modules you choose between the B1, B2, B4 and B8 classes, but the documentation says nothing about the number of CPU cores in each configuration. Maybe the number of cores is not relevant here. Backends support spawning "Background Threads" for each incoming request, but Python cannot run bytecode on multiple cores in parallel because of the famous GIL (Global Interpreter Lock).
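A minimal sketch of starting such a thread, assuming a manual-scaling backend where the background threads API is available (`poll_work` is a hypothetical worker loop):

```python
# Sketch for a manual-scaling backend/module on GAE Python 2.7.
# Background threads are only available on backends / manual-scaling
# modules, not on automatic-scaling frontends.
import time
from google.appengine.api import background_thread

def poll_work():
    # Hypothetical long-running loop; real code would pull work
    # from a queue or the datastore here.
    while True:
        time.sleep(1)

background_thread.start_new_background_thread(poll_work, [])
```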

One frontend instance will handle up to 8 concurrent requests by default (configurable up to a maximum of 30) before a new instance is fired up.

Python 2.7 with the Threadsafe directive is said to handle incoming requests in parallel on one isolated instance. Is this correct, or is real concurrency only achieved by spreading incoming requests across independent instances?
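For reference, this per-instance concurrency is driven by configuration roughly like the following sketch (the module name is a placeholder, and `max_concurrent_requests` raises the default cap of 8 mentioned above):

```yaml
# Sketch of a module config for GAE Python 2.7; the module name is
# hypothetical. threadsafe enables concurrent requests per instance,
# and max_concurrent_requests raises the default of 8 (up to 30).
module: default
runtime: python27
api_version: 1
threadsafe: true

automatic_scaling:
  max_concurrent_requests: 30
```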

On Google App Engine, what is actually executed with real concurrency, and conversely, what is the recommended design pattern for achieving the most real concurrency and scalability?

Should you make a "manual scaling" Backend/Module with 10-20 resident B8 instances, each spawning 10 long-lived background threads and keeping 10 concurrent async URL fetches in flight at all times for I/O work, or should the work be fanned out with dynamic instance creation?
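For the I/O side, the async URL Fetch API lets a single thread keep several fetches in flight at once; a minimal sketch (the list of URLs is hypothetical):

```python
# Sketch: N concurrent async URL fetches from one thread on GAE Python 2.7.
# The urls argument is a hypothetical list of target URLs.
from google.appengine.api import urlfetch

def fetch_all(urls):
    rpcs = []
    for url in urls:
        rpc = urlfetch.create_rpc(deadline=10)
        urlfetch.make_fetch_call(rpc, url)
        rpcs.append(rpc)
    # All fetches are now in flight; block for each result in turn.
    return [rpc.get_result() for rpc in rpcs]
```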

  • Concurrent processing can be performed in Python if you use the async versions of all the APIs. Most requests end up using a lot of the various APIs, so there is plenty of opportunity for concurrency; however, if you are just performing CPU-bound tasks you will get very little concurrency from a single backend or frontend. – Tim Hoffman Feb 03 '14 at 00:27
  • What specific approach you take will depend heavily on your processing requirements, and you need to profile to get a better understanding of the most appropriate choice. – Tim Hoffman Feb 03 '14 at 00:28
  • A combination of several instances and code utilizing all available async versions of the App Engine APIs will give best real life concurrency across incoming requests. – Fredrik Bertin Fjeld Mar 12 '14 at 11:56
  • @TimHoffman, just FYI, the difference between async vs sync APIs is mainly userland. App Engine's request scheduler doesn't care which one you use. [It only looks at current overall CPU load.](http://stackoverflow.com/a/11882719/186123) If current requests are blocked on API calls, regardless of whether they're using the sync or async APIs, the scheduler will happily run a new request on that instance. – ryan Jul 28 '14 at 18:13

1 Answer


Python 2.7 with the Threadsafe directive is said to handle incoming requests in parallel on one isolated instance, is this correct?

Yes, that's correct. It actually does run multiple simultaneous requests on each instance, as opposed to just spreading them across instances. Same with Java and Go (but it sounds like not PHP). It's generally considered a best practice to allow this, since it improves the efficiency of most workloads substantially.

This SO answer has the best details I've seen on how GAE determines whether and when to run requests concurrently.

You're right that Python has a GIL, which limits concurrency across cores to a degree, and for workloads that truly are CPU bound, more than one thread per core doesn't help you much. However, the vast majority of workloads are not CPU bound, especially webapps on platforms like GAE. They're usually I/O bound instead, i.e., they spend most of their time waiting on the datastore, HTTP fetches to other services, etc. App Engine uses that blocked time to efficiently run other concurrent requests on the same instance.
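For example, issuing datastore reads asynchronously lets an instance overlap that wait time with other work; a minimal ndb sketch (the `Account` model and the list of ids are hypothetical):

```python
# Sketch: overlap datastore I/O with ndb's async API on GAE Python 2.7.
# The Account model is a hypothetical example entity.
from google.appengine.ext import ndb

class Account(ndb.Model):
    balance = ndb.IntegerProperty()

def load_accounts(ids):
    # Kick off all the gets at once; the RPCs run concurrently.
    futures = [ndb.Key(Account, i).get_async() for i in ids]
    # Block for the results only after everything is in flight.
    return [f.get_result() for f in futures]
```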
