77

I want to proxy requests made to my Flask app to another web service running locally on the machine. I'd rather use Flask for this than our higher-level nginx instance so that we can reuse our existing authentication system built into our app. The more we can keep this "single sign on" the better.

Is there an existing module or other code to do this? Trying to bridge the Flask app through to something like httplib or urllib is proving to be a pain.

Joe Shaw
  • Also this question is relevant when doing AJAX services for old browsers like IE7 which do not support cross-domain security. – Mikko Ohtamaa Jul 11 '11 at 22:06
  • What specific problem are you having with httplib? – jd. Jul 12 '11 at 05:34
  • @jd: Given that flask is on the app side of WSGI, I am not sure I get all of the data to effectively forward. For example, the Flask request object doesn't seem to include the raw request (or even the request headers) that I'd want to pass into httplib. It's not that it's impossible, it's just a pain and I was hoping for an existing module which did it already. – Joe Shaw Jul 12 '11 at 13:58

3 Answers

132

I spent a good deal of time working on this same thing and eventually found a solution using the requests library that seems to work well. It even handles setting multiple cookies in one response, which took a bit of investigation to figure out. Here's the flask view function:

from dotenv import load_dotenv  # pip package python-dotenv
import os
#
from flask import request, Response
import requests  # pip package requests


load_dotenv()
API_HOST = os.environ.get('API_HOST'); assert API_HOST, 'Envvar API_HOST is required'

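# `api` is assumed to be a Flask app or Blueprint created elsewhere
PROXIED_METHODS = ['GET', 'POST', 'PUT', 'PATCH', 'DELETE']  # without methods=, Flask routes only accept GET (plus HEAD and OPTIONS)

@api.route('/', defaults={'path': ''}, methods=PROXIED_METHODS)  # ref. https://medium.com/@zwork101/making-a-flask-proxy-server-online-in-10-lines-of-code-44b8721bca6
@api.route('/<path:path>', methods=PROXIED_METHODS)  # the path converter also matches nested paths such as a/b/c
def redirect_to_API_HOST(path):  #NOTE :path is unused; the full URL is read from flask's :request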
    res = requests.request(  # ref. https://stackoverflow.com/a/36601467/248616
        method          = request.method,
        url             = request.url.replace(request.host_url, f'{API_HOST}/'),
        headers         = {k:v for k,v in request.headers if k.lower() != 'host'}, # exclude 'host' header
        data            = request.get_data(),
        cookies         = request.cookies,
        allow_redirects = False,
    )

    #region exclude some keys in :res response
    excluded_headers = ['content-encoding', 'content-length', 'transfer-encoding', 'connection']  #NOTE content-encoding and content-length are dropped because requests has already decoded the body; transfer-encoding and connection are two of the "hop-by-hop headers" defined by RFC 2616 section 13.5.1 ref. https://www.rfc-editor.org/rfc/rfc2616#section-13.5.1
    headers          = [
        (k,v) for k,v in res.raw.headers.items()
        if k.lower() not in excluded_headers
    ]
    #endregion exclude some keys in :res response

    response = Response(res.content, res.status_code, headers)
    return response

Update April 2021: excluded_headers should probably include all "hop-by-hop headers" defined by RFC 2616 section 13.5.1.
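
As a rough sketch of what excluding every hop-by-hop header could look like, the helper below drops the eight headers listed in RFC 2616 section 13.5.1, plus content-encoding and content-length, which stop matching the body once requests has decoded it. The name `filter_response_headers` and the two constants are made up for this illustration; they are not part of Flask or requests.

```python
# Hop-by-hop headers listed in RFC 2616 section 13.5.1 (lower-cased).
HOP_BY_HOP = {
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailers', 'transfer-encoding', 'upgrade',
}
# Not hop-by-hop, but they describe the upstream body's encoding and length,
# which no longer apply once the body has been decoded and re-sent.
BODY_SPECIFIC = {'content-encoding', 'content-length'}


def filter_response_headers(pairs):
    """Return the (name, value) pairs that are safe to relay to the client.

    `pairs` is any iterable of (name, value) tuples,
    e.g. res.raw.headers.items() in the view function above.
    """
    excluded = HOP_BY_HOP | BODY_SPECIFIC
    return [(name, value) for name, value in pairs if name.lower() not in excluded]


if __name__ == '__main__':
    upstream = [
        ('Content-Type', 'text/html'),
        ('Connection', 'keep-alive'),
        ('Set-Cookie', 'a=1'),
        ('Set-Cookie', 'b=2'),
        ('Content-Length', '42'),
    ]
    print(filter_response_headers(upstream))
```

Because the result stays a list of pairs rather than a dict, repeated headers such as Set-Cookie are passed through one by one instead of overwriting each other.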

arychj
Evan
  • 1
    Edited just now to remove 'Host' from the request headers and remove a few items from the response headers. – Evan Jun 29 '16 at 22:06
  • 1
    `headers` seems to be unused? – luckydonald Aug 28 '16 at 04:17
  • 1
    @luckydonald I think it is fixed now. Thanks for pointing that out. – Evan Aug 29 '16 at 13:19
  • That header loop is brilliant – M Leonard Dec 13 '17 at 20:27
  • @MLeonard though it ignores the fact that `werkzeug.Headers` may store multiple values for each key. Then the header will be overridden twice or even more. – ddnomad Mar 06 '18 at 14:20
  • This bit saved my day, better, it saved my week! Fantastic! Now I only need to understand @ddnomad comment, which headers have multiple values? And how to fix that? – Kuzeko Jul 12 '18 at 15:32
  • @Evan do we need to add a new HOST header with proxy name since the original HOST header has been removed? – Caxton Oct 03 '18 at 03:20
  • @Caxton `requests.request()` sets the HOST header for you, I believe. – Evan Oct 03 '18 at 19:54
  • 2
    @Evan nice solution. It doesn't handle 3xx redirections, however, since the redirection url might point to the proxied host – Ire Dec 18 '18 at 16:46
  • 2
    Could someone add how you call this in a MWE app? – user1717828 Dec 20 '18 at 01:28
  • 5
    This is great, many thanks! (This is what one needs to get ngrok to work with both front and back ends.) But, for me `request.host_url` includes `http://` and also a trailing slash so the replace line for me was: `request.url.replace(request.host_url, 'http://new-domain.com/')` – jbasko Feb 08 '19 at 14:11
  • 2
    @Ire I encountered this problem and have added an edit to fix. All I did was replaced the header filter line with `headers = [(name, value) if (name.lower() != 'location') else (name, value.replace('http://new-domain.com/', request.host_url)) for (name, value) in resp.raw.headers.items() if name.lower() not in excluded_headers]`. This just fixes the URL in the Location header. (thanks @jbasko for pointing out the issue with the trailing slash) – Eric Reed Feb 15 '19 at 03:29
  • 6
    You can also stream the response content, instead of reading it entirely on the server. For this replace `resp.content` above with `resp.iter_content(chunk_size=10*1024)` and add `content_type=r.headers['Content-Type']` argument to `Response` constructor. – Tim Mar 06 '19 at 20:31
  • 1
    I decorated this method with the following 3 attributes to ensure it handles requests with any paths (at least 3 levels deep, add more to handle deeper). Not sure if there is a better way... @app.route('/api/v1/', methods=['GET', 'POST', 'DELETE', 'PUT', 'PATCH']) @app.route('/api/v1//', methods=['GET', 'POST', 'DELETE', 'PUT', 'PATCH']) @app.route('/api/v1///', methods=['GET', 'POST', 'DELETE', 'PUT', 'PATCH']) – Mr. Bungle Oct 20 '19 at 23:41
  • @Evan why do we need to remove these headers from response?`excluded_headers = ['content-encoding', 'content-length', 'transfer-encoding', 'connection']` – Vivek Kumar Nov 29 '21 at 12:55
  • Why do we only send the "HOST" header to the server behind the reverse proxy? I see an edit to this answer that reversed the logic from including everything BUT the "HOST" header to ONLY including the "HOST" header. Shouldn't the code be `!=`? – John Carrell Mar 09 '23 at 00:02
  • I seem to have an issue when uploading file in form data: ```requests\models.py", line 149, in _encode_files raise ValueError("Data must not be a string.") ValueError: Data must not be a string.``` – Ian A McElhenny Apr 06 '23 at 16:05
13

I have an implementation of a proxy using httplib in a Werkzeug-based app (as in your case, I needed to use the webapp's authentication and authorization).

Although the Flask docs don't state how to access the HTTP headers, you can use request.headers (see Werkzeug documentation). If you don't need to modify the response, and the headers used by the proxied app are predictable, proxying is straightforward.

Note that if you don't need to modify the response, you should use werkzeug.wsgi.wrap_file to wrap httplib's response stream. That allows passing the open OS-level file descriptor to the HTTP server for optimal performance.
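
This is not the implementation described above, just a minimal sketch of the forwarding step, assuming Python 3 (where httplib is named http.client). The helper names `prepare_headers` and `forward` are hypothetical. It reads the whole upstream body into memory, so it does not show the wrap_file streaming approach.

```python
import http.client


def prepare_headers(headers):
    """Build the header dict to send upstream.

    `headers` is an iterable of (name, value) pairs, such as Flask's
    request.headers. Host is dropped so http.client fills in the upstream
    host itself; Content-Length is dropped so it is recomputed from the body.
    Note: a dict keeps only the last value of a repeated header.
    """
    skip = ('host', 'content-length')
    return {name: value for name, value in headers if name.lower() not in skip}


def forward(host, port, method, path, headers, body=None):
    """Send one request upstream and return (status, header_pairs, body_bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(method, path, body=body, headers=prepare_headers(headers))
        resp = conn.getresponse()
        return resp.status, resp.getheaders(), resp.read()
    finally:
        conn.close()
```

Inside a Flask view this could be called as `forward('127.0.0.1', 8080, request.method, request.full_path, request.headers, request.get_data())`, where the host and port are placeholders for wherever the local service listens.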

jd.
  • Thanks, I hacked something up this afternoon. Having all sorts of problems with cookies, though, since httplib doesn't handle them particularly well. Unfortunately I think I will need to modify the response to do some simple URL rewriting (ie, to – Joe Shaw Jul 12 '11 at 19:57
  • In my case there was just one cookie to catch, so a regex did the job to parse it, it's a lot easier to set up than Python's cookie libs. – jd. Jul 12 '11 at 20:57
  • 1
    Could you provide a link to your implementation, or the code itself in the body of the answer? – Carlos Pinzón Apr 26 '19 at 14:21
  • Here is a [related SO answer](https://stackoverflow.com/a/50231825/399573) with actual implementation. – gt6989b Aug 17 '20 at 12:26
9

My original plan was for the public-facing URL to be something like http://www.example.com/admin/myapp proxying to http://myapp.internal.example.com/. Down that path leads madness.

Most webapps, particularly self-hosted ones, assume that they're going to be running at the root of an HTTP server and do things like reference other files by absolute path. To work around this, you have to rewrite URLs all over the place: Location headers and HTML, JavaScript, and CSS files.
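
To give a feel for the kind of rewriting involved, here is a small illustrative sketch (not the blueprint mentioned below; the function names and the regular expression are invented for this example). It handles only root-relative URLs in a few HTML attributes and in the Location header.

```python
import re

# Matches href="/...", src='/...' or action="/...", but skips
# protocol-relative URLs such as src="//cdn.example.com/x.js".
ABSOLUTE_ATTR = re.compile(r'''(\b(?:href|src|action)=)(["'])/(?!/)''')


def prefix_html_paths(html, prefix):
    """Prepend `prefix` (e.g. '/admin/myapp') to root-relative attribute URLs."""
    return ABSOLUTE_ATTR.sub(lambda m: f'{m.group(1)}{m.group(2)}{prefix}/', html)


def prefix_location(value, prefix):
    """Rewrite a root-relative Location header value the same way."""
    if value.startswith('/') and not value.startswith('//'):
        return prefix + value
    return value


if __name__ == '__main__':
    page = '<a href="/login">Log in</a> <img src="/static/logo.png">'
    print(prefix_html_paths(page, '/admin/myapp'))
    print(prefix_location('/login', '/admin/myapp'))
```

Even this leaves out URLs built in JavaScript strings, CSS `url(...)` references, and unquoted attributes, each of which would need its own pattern.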

I did write a Flask proxy blueprint which did this, and while it worked well enough for the one webapp I really wanted to proxy, it was not sustainable. It was a big mess of regular expressions.

In the end, I set up a new virtual host in nginx and used its own proxying. Since both were at the root of the host, URL rewriting was mostly unnecessary. (And what little was necessary, nginx's proxy module handled.) The webapp being proxied to does its own authentication which is good enough for now.

Joe Shaw
  • 3
    Some illustration to "I set up a new virtual host" would be nice. – PascalVKooten Mar 26 '18 at 08:35
  • 1
    Definitely your last paragraph. Flask's strength is not as a proxy, so when possible, it's preferable to avoid using as one. The only valid reason I can think of is that some application logic like authentication or authorization is necessary and not supported by the other application. – jpmc26 Jan 03 '19 at 22:32