I am trying to build a reverse proxy that talks to certain APIs (like Twitter, GitHub, Instagram), so that any client application I want can call those APIs through my proxy (think of it as an API manager).

Also, I am using an LXC-container to do this.

For example, here is the simplest code I could hack together from the examples in the Twisted docs:

from twisted.internet import reactor
from twisted.web import proxy, server
from twisted.python.log import startLogging
from sys import stdout
startLogging(stdout)

site = server.Site(proxy.ReverseProxyResource('https://api.github.com/users/defunkt', 443, b''))
reactor.listenTCP(8080, site)
reactor.run()

When I run curl inside the container, I get a valid response (meaning I get the appropriate JSON).

Here is the curl command I used:

curl https://api.github.com/users/defunkt

And here is the output I get:

{
  "login": "defunkt",
  "id": 2,
  "avatar_url": "https://avatars.githubusercontent.com/u/2?v=3",
  "gravatar_id": "",
  "url": "https://api.github.com/users/defunkt",
  "html_url": "https://github.com/defunkt",
  "followers_url": "https://api.github.com/users/defunkt/followers",
  "following_url": "https://api.github.com/users/defunkt/following{/other_user}",
  "gists_url": "https://api.github.com/users/defunkt/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/defunkt/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/defunkt/subscriptions",
  "organizations_url": "https://api.github.com/users/defunkt/orgs",
  "repos_url": "https://api.github.com/users/defunkt/repos",
  "events_url": "https://api.github.com/users/defunkt/events{/privacy}",
  "received_events_url": "https://api.github.com/users/defunkt/received_events",
  "type": "User",
  "site_admin": true,
  "name": "Chris Wanstrath",
  "company": "GitHub",
  "blog": "http://chriswanstrath.com/",
  "location": "San Francisco",
  "email": "chris@github.com",
  "hireable": true,
  "bio": null,
  "public_repos": 107,
  "public_gists": 280,
  "followers": 15153,
  "following": 208,
  "created_at": "2007-10-20T05:24:19Z",
  "updated_at": "2016-02-26T22:34:27Z"
}

However, when I attempt fetching the proxy via Firefox using:

http://10.5.5.225:8080/

I get: "Could not connect"

This is what my Twisted log looks like:

2016-02-27 [-] Log opened.
2016-02-27 [-] Site starting on 8080
2016-02-27 [-] Starting factory
2016-02-27 [-] Starting factory
2016-02-27 [-] "10.5.5.225" - - [27/Feb/2016: +0000] "GET / HTTP/1.1" 501 26 "-" "Mozilla/5.0 (X11; Debian; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0"
2016-02-27 [-] Stopping factory

How can I use Twisted to make an API call (most APIs are HTTPS nowadays anyway) and get the required response (basically, what the "200" response/JSON should be)?

I tried looking at this question: Convert HTTP Proxy to HTTPS Proxy in Twisted

But it didn't make much sense to me from a coding point of view, and it doesn't mention reverse proxying.

Edit: I also tried switching out the HTTPS API call for a regular HTTP call using:

curl http://openlibrary.org/authors/OL1A.json

However, I still get the same error in my browser (as mentioned above).

Edit 2: I have tried running your code, but I get this error (shown in a screenshot in the original post):

builtins.AttributeError: 'str' object has no attribute 'decode'

coolpy
  • When I run this example, `curl` and Firefox both say "could not connect", so I'm not sure what you're doing to get the correct JSON response. Are you running the code sample exactly as written? – Glyph Feb 27 '16 at 06:58
  • Wow, you are the founder of Twisted, nice to meet you sir! I am running the code with: `python3 file.py`. As for the output, I am editing my question to show how I used curl and the output I got. It may be that you are being rate-limited by the GitHub API (there is some limit on public calls without an API key), but I successfully managed to get the JSON response. – coolpy Feb 27 '16 at 10:43
  • Pleasure to meet you as well. Thanks for using Twisted :). Now that you've made it clear how you're running your command I can answer it... – Glyph Feb 29 '16 at 04:13

1 Answer

If you read the API documentation for ReverseProxyResource, you will see that the signature of __init__ is:

def __init__(self, host, port, path, reactor=reactor):

and "host" is documented as "the host of the web server to proxy".

So you are passing a URI where Twisted expects a host.
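
As a rough illustration of the difference, a URL can be split into the separate host, port, and path pieces that the constructor takes. This is only a sketch using the Python 3 standard library; the helper name `split_for_reverse_proxy` is made up for this example and is not part of Twisted.

```python
from urllib.parse import urlsplit


def split_for_reverse_proxy(url):
    """Split a URL into (host, port, path) pieces (hypothetical helper)."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # The path is returned as bytes, matching how the question passes b''.
    return parts.hostname, port, parts.path.encode("ascii")


host, port, path = split_for_reverse_proxy("https://api.github.com/users/defunkt")
print(host, port, path)  # api.github.com 443 b'/users/defunkt'
```

Note that this only separates the pieces; it does not by itself make the proxy speak HTTPS to the upstream server.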

Worse yet, ReverseProxyResource is designed for local use on a web server, and doesn't quite support https:// URLs out of the box.

It does have a (very limited) extensibility hook though - proxyClientFactoryClass - and to apologize for ReverseProxyResource not having what you need out of the box, I will show you how to use that to extend ReverseProxyResource to add https:// support so you can use the GitHub API :).

import sys

from twisted.web import proxy, server
from twisted.logger import globalLogBeginner, textFileLogObserver
from twisted.protocols.tls import TLSMemoryBIOFactory
from twisted.internet import ssl, defer, task, endpoints

globalLogBeginner.beginLoggingTo([textFileLogObserver(sys.stdout)])

class HTTPSReverseProxyResource(proxy.ReverseProxyResource, object):
    def proxyClientFactoryClass(self, *args, **kwargs):
        """
        Make all connections using HTTPS.
        """
        # optionsForClientTLS needs a text hostname. self.host is already
        # text on Python 3, so decode only when it is bytes (Python 2).
        hostname = self.host
        if isinstance(hostname, bytes):
            hostname = hostname.decode("ascii")
        return TLSMemoryBIOFactory(
            ssl.optionsForClientTLS(hostname), True,
            super(HTTPSReverseProxyResource, self)
            .proxyClientFactoryClass(*args, **kwargs))

    def getChild(self, path, request):
        """
        Ensure that implementation of C{proxyClientFactoryClass} is honored
        down the resource chain.
        """
        child = super(HTTPSReverseProxyResource, self).getChild(path, request)
        return HTTPSReverseProxyResource(child.host, child.port, child.path,
                                         child.reactor)

@task.react
def main(reactor):
    # Never fired, so the reactor keeps running until the process is stopped.
    forever = defer.Deferred()
    myProxy = HTTPSReverseProxyResource('api.github.com', 443,
                                        b'/users/defunkt')
    # Serve "/" from the proxied path itself; child names are bytes.
    myProxy.putChild(b"", myProxy)
    site = server.Site(myProxy)
    # Listen on the endpoint given as the first argument, else localhost:8080.
    endpoint = endpoints.serverFromString(
        reactor,
        dict(enumerate(sys.argv)).get(1, "tcp:8080:interface=127.0.0.1")
    )
    endpoint.listen(site)
    return forever

If you run this, curl http://localhost:8080/ should do what you expect.

I've taken the liberty of modernizing your Twisted code somewhat; endpoints instead of listenTCP, logger instead of twisted.python.log, and react instead of starting the reactor yourself.

The weird little putChild piece at the end there is because when we pass b"/users/defunkt" as the path, that means a request for / will result in the client requesting /users/defunkt/ (note the trailing slash), which is a 404 in GitHub's API. If we explicitly proxy the empty-child-segment path as if it did not have the trailing segment, I believe it will do what you expect.
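
To make the trailing-slash effect concrete, here is a simplified model of how a child resource's upstream path gets built. It assumes, as described above, that the requested segment is appended to the proxied path after a "/"; the function `child_path` is a hypothetical stand-in for illustration, not Twisted's actual code.

```python
from urllib.parse import quote


def child_path(base_path, segment):
    """Simplified model: append one URL-quoted segment to the proxied path."""
    return base_path + b"/" + quote(segment, safe="").encode("ascii")


# A request for "/" has a single empty segment, which yields a trailing slash.
print(child_path(b"/users/defunkt", b""))       # b'/users/defunkt/'
# A request for "/repos" appends a normal segment.
print(child_path(b"/users/defunkt", b"repos"))  # b'/users/defunkt/repos'
```

Registering the proxy as its own empty-named child sidesteps the first case, so a request for / is forwarded as /users/defunkt without the extra slash.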

PLEASE NOTE: proxying from plain-text HTTP to encrypted HTTPS can be extremely dangerous, so I've added a default listening interface here of localhost-only. If your bytes transit over an actual network, you should ensure that they are properly encrypted with TLS.

Glyph
  • Thank you for the detailed answer, Sir. I would first like to say that there is no need to apologize for the software not working out of the box; it is not anybody's fault that my specific use case wasn't addressed :) I am able to run the code, but I get an error that I will add as an edit/image to my question above. I have tried to address it, as I think it is a bytes/string issue with Python 3, but no luck so far. – coolpy Mar 11 '16 at 05:02
  • I can't upvote your answer due to my low score, but I marked it as the chosen answer. – coolpy Jul 01 '16 at 14:03