Sure, if the heavy queries only need to run once for the life of the app, then run them all on startup and cache the results. Do this before you begin serving requests and they will never impact your users. Checking the cache at request time and computing the value only on a miss is a 'lazy' or 'deferred' cache, and the first user to hit each miss still pays the full cost. Even using Popen you will still need a way to defer responding to the client and yield to other threads.
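The eager approach can be as simple as filling a dict before the server starts accepting connections; `run_heavy_query` and the query names below are placeholders for your actual work:

```python
def run_heavy_query(name):
    # Stands in for your slow database query or computation.
    return f"result-for-{name}"

# Run every heavy query once, at startup, before serving any requests.
CACHE = {name: run_heavy_query(name) for name in ("report_a", "report_b")}

# Request handlers then read from CACHE instead of recomputing:
print(CACHE["report_a"])
```

No user ever waits on the query because it finished before the first request arrived.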
It sounds like you're writing a raw HTTP server based on BaseHTTPServer or similar? If so, take a look at WSGI and choose one of the WSGI-compliant servers such as Gunicorn. Combine this with a WSGI framework such as Flask and you will solve your scaling issues without having to resort to Popen and reinventing the wheel.
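For context, a WSGI application is just a callable with a fixed signature, which is the interface any WSGI server (Gunicorn included) speaks. This standard-library-only sketch exercises the callable directly, the same way a server would:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    # environ: dict of request data; start_response: server-supplied callback.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

# Drive the callable by hand, as a WSGI server would:
environ = {}
setup_testing_defaults(environ)  # fills in a minimal fake request
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

body = b"".join(application(environ, start_response))
print(captured["status"], body)
```

Because every WSGI framework ultimately produces such a callable, you can swap servers (Gunicorn, uWSGI, wsgiref's dev server) without touching application code.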
Gunicorn manages a pool of workers to handle concurrent client connections, and Flask handles your per-request state. You may still need to do some work to handle long-running requests, but the process will be much easier.
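A minimal Flask app looks like this (the module name `app` and endpoint are illustrative):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/status")
def status():
    # Flask provides a request context per request; Gunicorn's workers
    # take care of serving many clients concurrently.
    return jsonify(ok=True)
```

Saved as `app.py`, you would serve it with something like `gunicorn -w 4 app:app` (4 workers here is just an example; tune it for your hardware).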
Typically you want to keep your response times short so you don't have to worry about client timeouts or tied-up workers. To this end, if you have a long-running process initiated by the user, you may want to split it into three steps.
- start_task: The user initiates the request; it is submitted to a task queue (check out Celery or Python RQ) and a tracking ID is returned.
- check_task: The user provides a tracking ID and the API returns the task's status.
- get_result: Once the task is complete, the user retrieves the result.
In your web app UI you can then provide the user with feedback at each stage, and potentially drive a progress indicator via the 'check_task' call.
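The three steps above can be sketched with just the standard library; in production the dict and thread would be replaced by Celery or RQ, and the function names here are illustrative:

```python
import threading
import time
import uuid

TASKS = {}  # tracking_id -> {"status": ..., "result": ...}

def long_running(x):
    # Stands in for the slow work you'd hand to Celery/RQ.
    return x * 2

def start_task(x):
    tid = str(uuid.uuid4())
    TASKS[tid] = {"status": "pending", "result": None}

    def worker():
        TASKS[tid]["result"] = long_running(x)
        TASKS[tid]["status"] = "done"

    threading.Thread(target=worker).start()
    return tid  # the tracking ID handed back to the client

def check_task(tid):
    return TASKS[tid]["status"]

def get_result(tid):
    return TASKS[tid]["result"]

tid = start_task(21)
# The client polls check_task until the task finishes, then fetches the result.
while check_task(tid) != "done":
    time.sleep(0.01)
print(get_result(tid))  # → 42
```

Each of the three functions maps naturally onto a short, fast HTTP endpoint, so no single request ever blocks for the duration of the heavy work.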