
I am setting up an HTTP proxy in Python to filter web content. I found a good example on StackOverflow which does exactly this using Twisted. However, I can only reach the web through another proxy, so my proxy needs to forward its requests to that proxy. What is the best way to do this using twisted.web.proxy?

I found a related question which needs something similar, but from a reverse proxy.

My best guess is that it should be possible to build a chained proxy by modifying or subclassing twisted.web.proxy.ProxyClient to connect to the next proxy instead of connecting to the web directly. Unfortunately I didn't find any clues in the documentation on how to do this.

The code I have so far (cite):

from twisted.python import log
from twisted.web import http, proxy

class ProxyClient(proxy.ProxyClient):
    def handleResponsePart(self, buffer):
        proxy.ProxyClient.handleResponsePart(self, buffer)

class ProxyClientFactory(proxy.ProxyClientFactory):
    protocol = ProxyClient

class ProxyRequest(proxy.ProxyRequest):
    # Bytes key: on Python 3 the parsed URL scheme is b'http' (works on Python 2 too).
    protocols = {b'http': ProxyClientFactory}

class Proxy(proxy.Proxy):
    requestFactory = ProxyRequest

class ProxyFactory(http.HTTPFactory):
    protocol = Proxy

portstr = "tcp:8080:interface=localhost"  # serve on localhost:8080

if __name__ == '__main__':
    import sys
    from twisted.internet import endpoints, reactor

    log.startLogging(sys.stdout)
    endpoint = endpoints.serverFromString(reactor, portstr)
    d = endpoint.listen(ProxyFactory())
    reactor.run()
wau

1 Answer


This is actually not hard to implement using Twisted. Let me give you a simple example.

Suppose the first proxy is proxy1.py, like the code you pasted in your question; the second proxy is proxy2.py.

For proxy1.py, you just need to override the process method of the ProxyRequest class, like this:

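from urllib import parse as urllib_parse

from twisted.web import proxy

# Placeholders, not part of Twisted: set these for your own setup.
NeedGoToSecondProxy = True
your_second_proxy_server_ip = "127.0.0.1"
your_second_proxy_port = 8081

class ProxyRequest(proxy.ProxyRequest):
    def process(self):
        # Based on the stock proxy.ProxyRequest.process; only the final
        # connectTCP call differs.
        parsed = urllib_parse.urlparse(self.uri)
        protocol = parsed[0]
        host = parsed[1].decode('ascii')
        port = self.ports[protocol]  # default port for the scheme
        if ':' in host:
            host, port = host.split(':')
            port = int(port)
        # Path, params, query and fragment only (no scheme or host).
        rest = urllib_parse.urlunparse((b'', b'') + parsed[2:])
        if not rest:
            rest = rest + b'/'
        class_ = self.protocols[protocol]
        headers = self.getAllHeaders().copy()
        if b'host' not in headers:
            headers[b'host'] = host.encode('ascii')
        self.content.seek(0, 0)
        s = self.content.read()
        clientFactory = class_(self.method, rest, self.clientproto, headers, s, self)
        if NeedGoToSecondProxy:
            # Connect to the next proxy in the chain instead of the origin server.
            self.reactor.connectTCP(your_second_proxy_server_ip, your_second_proxy_port, clientFactory)
        else:
            self.reactor.connectTCP(host, port, clientFactory)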
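
NeedGoToSecondProxy is left undefined in the code above. One possible way to compute it, as a minimal sketch: decide per host from a list of destinations that should be reached directly. The names need_go_to_second_proxy, DIRECT_HOSTS and DIRECT_SUFFIXES and their values are hypothetical, not part of Twisted.

```python
# Hypothetical configuration: hosts reached directly, bypassing the second proxy.
DIRECT_HOSTS = {"localhost", "127.0.0.1"}
DIRECT_SUFFIXES = (".internal.example",)


def need_go_to_second_proxy(host: str) -> bool:
    """Return True unless the host is configured for direct access."""
    name = host.lower().rstrip(".")
    if name in DIRECT_HOSTS:
        return False
    # str.endswith accepts a tuple of suffixes.
    return not name.endswith(DIRECT_SUFFIXES)
```

Inside process, the condition could then read `if need_go_to_second_proxy(host):`, using the host value already parsed from self.uri.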

For proxy2.py, you just need to set up another simple proxy. There is one problem to watch for, though: you may need to override the process method in proxy2.py as well, because self.uri may no longer be a complete URL after the first proxy has forwarded the request.

For example, the original self.uri is http://www.google.com/something?para1=xxx, but at the second proxy you may see only /something?para1=xxx. In that case, extract the host from the request headers (the Host header) and prepend it to self.uri, so that the second proxy can deliver the request to the correct destination.
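
The URI repair described above could look roughly like this. It is a sketch using only the standard library; the function name absolute_uri and the default scheme of http are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def absolute_uri(uri: bytes, host_header: Optional[bytes],
                 scheme: bytes = b"http") -> bytes:
    """Rebuild an absolute URI when only a path reached the second proxy."""
    if urlsplit(uri).netloc:
        return uri  # already absolute, nothing to repair
    if not host_header:
        raise ValueError("cannot rebuild the URI without a Host header")
    if not uri.startswith(b"/"):
        uri = b"/" + uri
    return scheme + b"://" + host_header + uri
```

In proxy2.py, an overridden process could set `self.uri = absolute_uri(self.uri, self.getHeader(b'host'))` and then call `proxy.ProxyRequest.process(self)`, so the stock parsing logic sees a complete URL again.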

yi-ji