
I need to write a proxy-like program in Python; the workflow is very similar to a web proxy. The program sits between the client and the server, intercepts requests sent by the client to the server, processes each request, and then sends it on to the original server. The protocol involved is a private protocol over TCP.

To minimize the effort, I want to use Python Twisted to handle receiving requests (the part that acts as a server) and resending them (the part that acts as a client).

To maximize performance, I want to use Python multiprocessing (threading is limited by the GIL) to split the program into three parts (processes). The first process runs Twisted to receive requests, puts each request in a queue, and immediately returns success to the original client. The second process takes requests from the queue, processes them further, and puts them into another queue. The third process takes requests from the second queue and sends them to the original server.

I'm a newcomer to Python Twisted. I know it is event-driven, and I have also heard it's better not to mix Twisted with threading or multiprocessing. So I don't know whether this approach is appropriate, or whether there is a more elegant way using only Twisted.

benhengx
  • I suspect using 3 processes instead of 1 is redundant. The 1st and the last process don't do any CPU-bound processing anyway; they are just I/O-bound. By putting everything into a single Twisted reactor process with no threads and only async/event-based logic, you will probably get a more maintainable and better-performing result. – Erik Kaplun Mar 22 '12 at 13:43

3 Answers


Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.

If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.

Glyph
  • This seems like the answer I expected, thank you Glyph. However, I'm still learning Twisted and don't quite understand what the terms you mentioned, like 'spawnProcess' or 'callFromThread', mean. I'll come back and let you know when I begin to code. – benhengx Apr 20 '11 at 09:12
  • `spawnProcess` was always a link, but now `callFromThread` is too. They will take you to the Twisted API documentation, and hopefully that will make it clear what they're about :). – Glyph May 16 '11 at 13:14
  • More approachable documentation than the API docs: [Using Processes](https://twistedmatrix.com/documents/current/core/howto/process.html) – jfs May 09 '14 at 02:01

ampoule is the first thing I thought of when reading your question.

It is a simple process pool implementation that uses the AMP protocol to communicate. You can use the deferToAMPProcess function; it's very easy to use.

nosklo

You can try something like the Cooperative Multitasking technique described here: http://us.pycon.org/2010/conference/schedule/event/73/ . It's similar to the technique Glyph mentioned, and it's worth a try.

You can try to use ZeroMQ with Twisted, but it's really hard and experimental for now :)

kkszysiu