For our web service, I wrote some logic to reject multipart/form-data POSTs larger than, say, 4 MB.

It boils down to the following (I've stripped away all WebOb usage and just reduced it to plain vanilla WSGI code):

import paste.httpserver

form = """\
<html>
<body>
  <form method="post" enctype="multipart/form-data" action="/">
    <input type="file" name="photopicker" />
    <input type="submit" />
  </form>
</body>
</html>
"""

limit = 4 * 1024 * 1024  # 4 MB

def upload_app(environ, start_response):
    if environ['REQUEST_METHOD'] == 'POST':
        # CONTENT_LENGTH may be absent or an empty string
        if int(environ.get('CONTENT_LENGTH') or 0) > limit:
            # reject without reading environ['wsgi.input']
            start_response('400 Ouch', [('content-type', 'text/plain')])
            return ["Upload is too big!"]
    # elided: consume the file appropriately
    start_response('200 OK', [('content-type', 'text/html')])
    return [form]

paste.httpserver.serve(upload_app, port=7007)

The logic shown works correctly when unit tested. But as soon as I tried sending actual files larger than 4 MB to this endpoint, I got errors like these on the client side:

  • Google Chrome: "Error 101 (net::ERR_CONNECTION_RESET): Unknown error."
  • Firefox: "The connection to the server was reset while the page was loading."

The same error occurs when using Python's built-in wsgiref HTTP server.

Fact: once I added environ['wsgi.input'].read() just before responding with HTTP 400, the connection reset problem went away. Of course, this is not a good fix, since it reads the entire oversized upload anyway. It just shows what happens when you fully consume the input.

I perused HTTP: The Definitive Guide and found some interesting guidelines on how important it is to manage TCP connections carefully when implementing HTTP servers and clients. It explained that, instead of calling close on the socket, it is preferable to call shutdown, so that the client has a chance to react and stop sending more data to the server.

Perhaps I am missing some crucial implementation detail that prevents such connection resets. Insights anyone?

See the gist.

Pavel Repin
  • I did some more digging online. It turns out that limiting body size gracefully is poorly addressed. Nginx author Igor Sysoev has a write-up about this: http://translate.google.com/translate?hl=en&sl=ru&tl=en&u=http://sysoev.ru/web/upload.html To summarize: you have to perform a "lingering close" dance, where the server sends the error response, shuts down the socket for writing, and waits for the client for a while instead of killing the connection outright. – Pavel Repin Jan 20 '10 at 22:22
  • Another pertinent link: http://tools.ietf.org/html/draft-ietf-http-connection-00#section-8 It advises that an HTTP server should close "half of the connection". – Pavel Repin Jan 22 '10 at 04:29
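
The "lingering close" sequence described in the comments above (send the error response, shut down the write side, wait briefly for the client) can be sketched at the raw socket level. This is only an illustration under stated assumptions: the function name lingering_close and its parameters linger_seconds and max_discard are hypothetical, and a WSGI application normally has no access to the connection socket, so logic like this would have to live in the HTTP server itself.

```python
import socket
import time

def lingering_close(conn, response, linger_seconds=2.0, max_discard=1024 * 1024):
    """Send `response`, half-close the socket, then discard incoming data
    for a bounded time and size before closing. Returns bytes discarded."""
    conn.sendall(response)
    # Half-close: we will not write any more, but we can still read.
    conn.shutdown(socket.SHUT_WR)
    deadline = time.monotonic() + linger_seconds
    discarded = 0
    try:
        while discarded < max_discard:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # waited long enough
            conn.settimeout(remaining)
            data = conn.recv(65536)
            if not data:
                break  # client closed its side
            discarded += len(data)
    except OSError:
        pass  # timeout or connection error: stop waiting
    finally:
        conn.close()
    return discarded
```

The time and size limits are arbitrary placeholders. They exist so that a client that keeps sending cannot hold the connection open indefinitely.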

1 Answer

This is happening because you are discarding the input stream without reading it, which forces the connection closed. The browser has already queued up a good portion of the file to send, and it then gets a write error because the server closes the connection forcefully.

There is no way around this that I know of without reading all the input.
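
As a rough sketch of what "reading all the input" could look like without holding the upload in memory, the body can be read in chunks and thrown away before the error response is sent. The helper name discard_body and the chunk size are assumptions for illustration, not part of the original code. Note that this still costs the bandwidth of the full upload.

```python
def discard_body(environ, chunk_size=64 * 1024):
    """Read and throw away the request body in bounded-size chunks.
    Returns the number of bytes discarded."""
    try:
        # CONTENT_LENGTH may be absent, empty, or malformed
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0
    stream = environ['wsgi.input']
    discarded = 0
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # client stopped sending before the declared length
        discarded += len(chunk)
        remaining -= len(chunk)
    return discarded
```

In the question's upload_app, this would be called just before start_response('400 Ouch', ...), in place of a bare environ['wsgi.input'].read().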

I would recommend some JavaScript to test the size of the file before it is sent. Then the only people who get the error are those who bypass the client-side check, either because they don't have JavaScript or because they are deliberately being malicious.

Omnifarious
  • Unfortunately, without ActiveX, Java or Flash on the client side, you can't interrogate the input element on the form, even for innocent things like the size of the selected file. – Pavel Repin Jan 20 '10 at 22:16
  • @Pavel Repin, I didn't realize that about JavaScript. That's unfortunate. I will point out though that my answer is still basically correct, even if it's not a happy answer. :-) – Omnifarious Jan 21 '10 at 00:51