Here's a Flask app that can be run either from the command-line or via Apache/WSGI:

import flask
app = flask.Flask(__name__)

LENGTH = 1000000                # one million

@app.route('/', methods=['HEAD'])
def head():
    return 'x' * LENGTH         # response body isn't actually sent

@app.route('/', methods=['GET'])
def get():
    import random
    return ''.join(str(random.randint(0,9)) for x in range(LENGTH))

if __name__ == '__main__':
    app.run()                   # from command-line
else:
    application = app           # via Apache and WSGI

I.e., this app returns a million random digits. The GET request takes a non-trivial amount of time, but a HEAD request should be able to return almost immediately. This is of course an illustrative example; the real application would involve large responses that are slow to generate for a GET request, but that also have pre-determined size that could be quickly queried by a HEAD request. (Another scenario: I'm trying to redirect requests to pre-signed Amazon S3 URLs, which must be signed differently for HEAD and GET methods.)

Question #1) When I run the Flask app from the command-line, a HEAD request activates the head function, as expected; but when I run this via Apache/WSGI, it activates the get function. Why is this, and how can I work around it to get my desired behavior?

Question #2) Instead of creating a dummy response (allocating a bunch of memory) for the HEAD request, why can't I return app.make_response('', 200, {'Content-Length':LENGTH})?

My guess is that these are caused by a well-intentioned attempt to ensure that a HEAD request should always be consistent with the corresponding GET. So:

Guess #1) Either Apache or WSGI is internally rewriting HEAD to GET.

Guess #2) Flask doesn't trust me to set the Content-Length header manually, and rewrites it with the actual length of the response body... even for a HEAD request where this is in fact supposed to be empty.

Am I misunderstanding something? Any advice on how I can enable faster handling of HEAD requests, ideally without having to slowly generate a large response body that is only used to set the Content-Length header?

2 Answers

As already noted, the reason mod_wsgi remaps HEAD to GET is well described in the blog post referenced in the other answer.

In particular, as explained in that blog post, mod_wsgi cannot know whether your WSGI application handles HEAD correctly. If an Apache output filter is configured, that filter may need to see the same output from your application for a HEAD as for a GET against the same URL, so mod_wsgi remaps HEAD to GET to ensure that the output filter works properly.

If you don't care that a HEAD request returns different response headers from the corresponding GET request, which breaks the requirement for HEAD specified by the HTTP RFC, then simply ensure that no Apache output filters are configured. mod_wsgi will then not remap the request method, and you can break things as much as you like.
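To make the effect of the remapping concrete, here is a small framework-free sketch. It is not mod_wsgi's actual code: `remap_head_to_get`, `toy_app`, `call`, and the `X-Seen-Method` header are hypothetical names invented for illustration. It only imitates the observable behavior described above, where the application sees GET even though the client sent HEAD.

```python
def toy_app(environ, start_response):
    """Toy WSGI app that reports which request method it was given."""
    method = environ["REQUEST_METHOD"]
    body = b"" if method == "HEAD" else b"x" * 10
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", "10"),
        ("X-Seen-Method", method),   # hypothetical header, for illustration only
    ])
    return [body]


def remap_head_to_get(wrapped):
    """Imitate a server layer that rewrites HEAD to GET before the app runs."""
    def middleware(environ, start_response):
        is_head = environ["REQUEST_METHOD"] == "HEAD"
        if is_head:
            # Copy the environ so the caller's dict is left untouched.
            environ = dict(environ, REQUEST_METHOD="GET")
        result = wrapped(environ, start_response)
        if not is_head:
            return result
        # The app did the full GET work; the body is then thrown away.
        try:
            for _chunk in result:
                pass
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()
        return [b""]
    return middleware


def call(wsgi_app, method):
    """Invoke a WSGI app directly and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": method, "PATH_INFO": "/"}
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call(toy_app, "HEAD")` reaches the HEAD branch, whereas `call(remap_head_to_get(toy_app), "HEAD")` makes the app take its GET branch, which is the situation the question describes under Apache.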

Graham Dumpleton
  • Graham, thank you so much for the helpful blog post and this answer. I looked into my Apache settings and found that disabling mod_filter and mod_deflate did not solve the problem ... but then I discovered that turning off SSL did. So I guess mod_wsgi also considers SSL encryption as an output filter and does not trust the application to match HEAD and GET... is that right, or is there some other way I might be able to get at the HEAD requests without disabling SSL? – A Real Live Operator Mar 17 '14 at 17:25
  • The mod_wsgi code will apply the remapping if it detects any registered content-level Apache output filter, that is, any filter which could use or modify content. So it doesn't distinguish by name of filter. What I possibly should do is provide a configuration directive to allow the automatic remapping to be disabled if need be. – Graham Dumpleton Mar 17 '14 at 21:41
  • Yes, I think that would be a very useful configuration option, thanks! – A Real Live Operator Mar 17 '14 at 23:02
To create a complete response from Flask, you want to do something like this:

from flask import Response

@app.route('/', methods=['HEAD'])
def head():
    response = Response()       # empty body; only the headers matter for HEAD
    response.headers.add('content-length', LENGTH)
    return response

That will then result in something like this:

Connected to localhost.
Escape character is '^]'.
HEAD / HTTP/1.1
Host: localhost

HTTP/1.0 200 OK
Content-Type: text/html; charset=utf-8
content-length: 1000000
Server: Werkzeug/0.9.4 Python/2.7.6
Date: Sun, 16 Mar 2014 22:59:16 GMT

This was tested with just the standard runner and not going through wsgi, but it shouldn't make a difference.
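The same idea can be sketched without Flask at the plain WSGI level, which makes it easy to check by calling the application directly. This is only an illustration under assumptions: `make_app` and `fetch` are hypothetical helper names, and the length is a parameter so that the check runs quickly with a small value.

```python
import random


def make_app(length):
    """Build a WSGI app whose GET body is `length` random digits."""
    def application(environ, start_response):
        # The size is known up front, so both methods advertise the same headers.
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(length)),
        ]
        method = environ.get("REQUEST_METHOD", "GET")
        if method == "HEAD":
            # Fast path: headers only, no body is generated.
            start_response("200 OK", headers)
            return [b""]
        if method == "GET":
            # Slow path: actually generate the digits.
            digits = "".join(random.choice("0123456789") for _ in range(length))
            start_response("200 OK", headers)
            return [digits.encode("ascii")]
        start_response("405 Method Not Allowed",
                       [("Allow", "GET, HEAD"), ("Content-Length", "0")])
        return [b""]
    return application


def fetch(wsgi_app, method):
    """Invoke a WSGI app directly and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": method, "PATH_INFO": "/"}
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Because HEAD and GET report the same Content-Length here, the two responses stay consistent. Note that if the server rewrites HEAD to GET before the application is called, as described for mod_wsgi with output filters, the fast HEAD branch is never reached.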

As for Apache/WSGI forcing the usage of the get handler, this blog entry has some hints as to why this is happening.

See: Flask/Werkzeug how to attach HTTP content-length header to file download

metatoaster
  • This is perfect, thank you for pointing me to resources that I probably should have looked up myself :-). The top-voted answer on the SO link you reference suggests that `make_response` is considered more "canonical" as opposed to using `Response`... but I suppose in this case, it's the only suitable work-around, right? – A Real Live Operator Mar 17 '14 at 17:21
  • Maybe you might want to try creating the response with `make_response` and then set the header afterwards, since the constructor may have defined values there that override the provided one. – metatoaster Mar 18 '14 at 01:37