
I have to implement a Python web application which operates on data available through web services (an API with GET requests and JSON responses).

The API server is implemented in Java. Preliminary tests show significant overhead when API calls are made through urllib2 (a connection is opened and closed for each request).

If I enable AJP in the API server, which library should I use to perform requests over the AJP protocol from Python? I googled Plup, but I can't find a clear way to request and consume data in Python, rather than just proxying it elsewhere.

Is using AJP a good solution? Obviously I would have to maintain a connection pool to perform AJP requests, but I can't find anything related in Plup.

Thank you.

ilya b.

1 Answer


I have no idea what AJP is. Also, you did not explain what the "significant overhead" consists of, so I might be a poor person to answer this question.

But if I were you I would first try a few tricks:

Enable HTTP 1.1 keep-alive on urllib2

(here is an example using another library: Python urllib2 with keep alive)

HTTP 1.1 keep-alive connections do not close the TCP/IP pipe for subsequent requests.
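To illustrate the point (not the original urllib2 recipe from the linked question), here is a minimal sketch using the standard library's `http.client` (`httplib` in Python 2): a single `HTTPConnection` reuses one TCP connection across several GET requests, which is exactly the connect/teardown cost the question is trying to avoid. The local test server and JSON payload are just scaffolding for the demo.

```python
# Sketch: HTTP 1.1 keep-alive with the stdlib. One HTTPConnection
# object reuses a single TCP connection for every request, instead
# of opening and closing a socket per call as plain urllib2 does.
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # required for keep-alive
    def do_GET(self):
        body = json.dumps({"path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # silence per-request logging

# throwaway local server standing in for the API backend
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
results = []
for path in ("/a", "/b", "/c"):  # three requests, one TCP connection
    conn.request("GET", path)
    resp = conn.getresponse()
    results.append(json.loads(resp.read())["path"])
conn.close()
server.shutdown()
print(results)
```

The urllib3 library the asker found later does the same thing with an added connection pool, which is the more convenient option for real code.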

Use the Spawning / eventlet web server, which applies a non-blocking IO patch to urllib / Python sockets.

http://pypi.python.org/pypi/Spawning/

This will make parallelization in Python much more robust when the overhead in the application is input/output, not CPU time spent processing requests. JSON decoding is rarely CPU bound.
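Eventlet itself isn't shown here; as a rough stdlib sketch of the same idea (overlapping IO-bound API calls so wall-clock time is dominated by the slowest request rather than the sum), plain threads work too, since the GIL is released while waiting on sockets. Eventlet achieves this with green threads and monkey-patched sockets instead. The local server and paths below are demo scaffolding.

```python
# Sketch: fan out IO-bound GET requests concurrently. eventlet would
# use green threads; a ThreadPoolExecutor gives a similar effect for
# IO-bound work in the standard library.
import http.client
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"
    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

# throwaway local server standing in for the API backend
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch(path):
    # one connection per call for simplicity; a real client would
    # combine this with keep-alive / a connection pool
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode()
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, ["/a", "/b", "/c", "/d"]))
server.shutdown()
print(results)
```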

With these two tricks we were able to consume 1,000 requests/sec in our Python web application from a Microsoft IIS backed API server (farm).

Mikko Ohtamaa
  • Thank you for the answer. I will try enabling keep-alive first. I found a lib which does exactly what I need: http://code.google.com/p/urllib3/ – ilya b. Aug 28 '11 at 22:42
  • AJP is a protocol implemented by most Java web servers to enable request proxying from a frontend server such as Apache or lighttpd. – ilya b. Aug 28 '11 at 22:44
  • By "overhead" I mean the difference between the time needed to access a resource from an HTTP server nearby in the network and the time needed to get the same resource from the same location over a persistent connection (as Mongo's driver uses). On my setup, up to 50 ms is wasted establishing and closing each connection in single-threaded mode. – ilya b. Aug 28 '11 at 22:52
  • I believe TCP/IP latency is the issue; HTTP keep-alive should cut it. HTTP connections are more universal than any protocol which has the word "Java" in it, so steer away from it unless you really need it :) – Mikko Ohtamaa Aug 29 '11 at 01:03
  • If you really need a fast binary protocol it would be Google Protocol Buffers, but let's hope you don't need to go there yet, as it will significantly increase the complexity of the implementation. – Mikko Ohtamaa Aug 29 '11 at 01:04