Recently I have been playing around with the HTTP proxy in Twisted. After much trial and error I think I finally have something working. What I want to know, though, is whether it is possible to expand this proxy so that it can also handle HTTPS pages, and if so, how. Here is what I've got so far:

from twisted.internet import reactor
from twisted.web import http
from twisted.web.proxy import Proxy, ProxyRequest, ProxyClientFactory, ProxyClient



class HTTPProxyClient(ProxyClient):
    def handleHeader(self, key, value):
        print "%s : %s" % (key, value)
        ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        print buffer
        ProxyClient.handleResponsePart(self, buffer)

class HTTPProxyFactory(ProxyClientFactory):
    protocol = HTTPProxyClient

class HTTPProxyRequest(ProxyRequest):
    protocols = {'http' : HTTPProxyFactory}

    def process(self):
        print self.method
        for k,v in self.requestHeaders.getAllRawHeaders():
            print "%s : %s" % (k,v)
        print "\n \n"

        ProxyRequest.process(self)

class HTTPProxy(Proxy):

    requestFactory = HTTPProxyRequest


factory = http.HTTPFactory()
factory.protocol = HTTPProxy

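# The proxy speaks plain HTTP to its clients, so listen on plain TCP.
# (listenSSL would also need a context factory as a third argument.)
reactor.listenTCP(8001, factory)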
reactor.run()

As this code shows, for now I am just printing out whatever goes through the connection. Is it possible to handle HTTPS with the same classes? If not, how should I go about implementing such a thing?

themaestro
  • How do you intend to handle the issue of server certificate trust? – MattH Jun 25 '10 at 14:10
  • MattH, I am collaborating with another programmer on this project, and according to him he has already figured out how to get the SSL cert into Twisted. Apparently you can run reactor.listenSSL(port, factory, cert) and listen for HTTPS connections. If you are knowledgeable about the topic, though, I'd definitely appreciate any input! – themaestro Jun 25 '10 at 14:53
  • 2
    As Marcus Adams points out. You have the issue of certificate trust. When a webbrowser is configured to use a proxy for HTTPS, it sends a "connect host:port" and expects to be passed-through to the server. The proxy involved will only see the encrypted SSL traffic that it is brokering between client and server. If you wanted to magically masquerade as the server to the client, then you'd have to get the client to trust your certificate for the purposes of accessing the website the client is trying to visit. – MattH Jun 25 '10 at 15:36

2 Answers

If you want to connect to an HTTPS website via an HTTP proxy, you need to use the CONNECT HTTP verb (because that's how a proxy works for HTTPS). In this case, the proxy server simply connects to the target server and relays whatever is sent by the server back to the client's socket (and vice versa). There's no caching involved in this case (but you might be able to log the hosts you're connecting to).

The exchange will look like this (client to proxy):

C->P: CONNECT target.host:443 HTTP/1.0
C->P:

P->C: HTTP/1.0 200 OK
P->C: 

To serve this request, the proxy simply opens a plain socket to the target server (no HTTP or SSL/TLS of its own), replies with the 200 status once that connection is up, and then relays everything between the initial client and the target server (including the TLS handshake that the client initiates). The client upgrades the existing socket it has to the proxy to use TLS/SSL (by starting the SSL/TLS handshake). Once the client has read the '200' status line, as far as the client is concerned, it's as if it had made the connection to the target server directly.
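
The relay described above can be sketched in a few lines. This is only an illustration, not Twisted code: it assumes Python 3 and the standard-library asyncio module instead of the Twisted classes from the question, and the names parse_connect, pipe, handle_client and main are made up for this example. It has no logging, timeouts or access control.

```python
import asyncio


def parse_connect(request_line):
    """Return (host, port) for b'CONNECT host:port HTTP/1.x', else None."""
    parts = request_line.decode("latin-1").split()
    if len(parts) != 3 or parts[0].upper() != "CONNECT":
        return None
    host, _, port = parts[1].rpartition(":")
    if not host or not port.isdigit():
        return None
    return host, int(port)


async def pipe(reader, writer):
    """Copy bytes in one direction until EOF, then close the writing side."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def handle_client(client_reader, client_writer):
    """Handle one client: read CONNECT, open the upstream socket, relay."""
    request_line = await client_reader.readline()
    # Discard the remaining request headers up to the blank line.
    while (await client_reader.readline()) not in (b"\r\n", b"\n", b""):
        pass
    target = parse_connect(request_line)
    if target is None:
        client_writer.write(b"HTTP/1.0 405 Method Not Allowed\r\n\r\n")
        await client_writer.drain()
        client_writer.close()
        return
    try:
        # Plain TCP only: the proxy never speaks TLS itself.
        upstream_reader, upstream_writer = await asyncio.open_connection(*target)
    except OSError:
        client_writer.write(b"HTTP/1.0 502 Bad Gateway\r\n\r\n")
        await client_writer.drain()
        client_writer.close()
        return
    client_writer.write(b"HTTP/1.0 200 Connection established\r\n\r\n")
    await client_writer.drain()
    # From here on the proxy sees only opaque bytes (the TLS handshake and
    # the encrypted traffic) and copies them in both directions.
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
    )


async def main(port=8001):
    """Listen on localhost; 8001 is the port used in the question."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", port)
    async with server:
        await server.serve_forever()
```

Calling asyncio.run(main()) starts the proxy. Each handle_client call holds both the client stream and the upstream stream it opened, so with many simultaneous clients the two ends stay paired without any lookup table.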

Bruno
  • I have a hard time understanding how exactly you do this. I get the overall process, but when it really comes down to it, I do not understand how the "proxy simply opens a plain socket to the target server". The client sends CONNECT, then the proxy answers 200 and opens a socket to the server. That part is easy. Now whatever the client sends is forwarded via that socket, but I do not know how to do this. How do I track which socket a client is going to send to when there are multiple connections? How do I "assign" a client connection, which pumps data to the proxy, to the already opened socket to the target server? – stewenson Jun 16 '15 at 15:08
  • @stewenson the proxy server already has the client connection from when the client sent the CONNECT request, so after the server replies 200 OK, it can do anything a raw socket can do with that TCP connection. – schemacs Nov 30 '15 at 15:34
I'm not sure about Twisted, but I want to warn you that if you implement an HTTPS proxy, a web browser will expect the server's SSL certificate to match the domain name in the URL (address bar). The web browser will issue security warnings otherwise.

There are ways around this, such as generating certificates on the fly, but you'd need the root certificate to be trusted on the browser.
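
As a small illustration of the checks involved, here is what a Python 3 client built on the standard-library ssl module enforces by default. Browsers use their own TLS stacks, so this is an analogy rather than a description of any particular browser, and the file name in the comment is hypothetical.

```python
import ssl

# Default client-side settings: the certificate chain must lead to a
# trusted root, and the certificate must match the requested host name.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
print(ctx.check_hostname)

# An intercepting proxy that presents its own certificates would need its
# root to be trusted by the client, for example (hypothetical file name):
#   ctx.load_verify_locations(cafile="proxy-root-ca.pem")
```

A proxy that only relays CONNECT traffic never presents a certificate of its own, so these checks run against the real server's certificate and pass as usual.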

Marcus Adams
  • This would be true for a reverse application-layer proxy, or a transparent proxy. The question doesn't specify what kind of proxy he wants for what purpose. – MattH Jun 25 '10 at 14:29
  • To clarify: to start, I would just like to write an HTTPS proxy that can merely listen to all traffic going over the connection and print/log it. Example: Client --> makes request to SSL encrypted site --> proxy intercepts --> sends on to destination SSL server --> response --> proxy intercepts and reads --> client – themaestro Jun 25 '10 at 14:54
  • @MattH, the example clearly shows an application layer proxy and not a reverse one. You can call this a transparent proxy or not, depending on how the OP uses it. – Marcus Adams Jun 25 '10 at 14:54
  • Sure, the example shows an application-layer HTTP proxy, but as we both seem to know, that has little to do with proxying HTTPS. – MattH Jun 25 '10 at 15:42
  • Then might I ask how to proxy using HTTPS? – themaestro Jun 25 '10 at 17:18
  • @MarcusAdams, Your answer is quite unclear... might want to add an example. – Pacerier Oct 25 '17 at 10:37