1

A web server responds to a POST request with a file to download (the response has a Content-Disposition header). When using a urllib or mechanize opener, at what point is the response body downloaded?

import mechanize
from mechanize import HTTPRefererProcessor, HTTPEquivProcessor, HTTPRefreshProcessor

opener = mechanize.build_opener(HTTPRefererProcessor, HTTPEquivProcessor, HTTPRefreshProcessor)
r = make_post_request()  # builds the Request object to send
res = opener.open(r)
info = res.info()  # response headers
content_disp = info.getheader('content-disposition')
filename = content_disp.split('=')[1]
content = res.read()  # or skip based on filename

I was under the impression that the body won't download until read() is called, which would be useful for skipping certain downloads (such as files already downloaded), but I am not seeing a great deal of performance improvement.

Victor Olex
  • Use a traffic analyzer like Wireshark. What do you see being sent over the connection? – AJ. Jun 01 '11 at 19:44
  • Wireshark might tell you *how much* of the file is being sent, but the web server will begin transmitting the file regardless of whether you've called read(). Whatever buffers exist might fill up, though, and the transfer might stall if you haven't called read() yet. – Ken Kinder Jun 01 '11 at 19:50

2 Answers

3

HTTP is a stateless protocol, meaning that no channel is established over which a server could write data in several steps. So if a POST or GET request is sent to a server, it MUST respond with a complete response, as it cannot know whether it was the first or the second request. Cookies, AJAX, and Comet help to emulate something like a channel, but there isn't one. That's why there is the HEAD request: with it, the browser can determine whether a resource must be loaded or not.
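As a rough illustration of the HEAD approach, here is a minimal sketch using Python 3's standard `http.client`. The names `fetch_headers`, `should_download`, and `already_have` are made up for this example, and it assumes the file is also reachable through a URL that answers HEAD, which may not hold when the server returns the file directly in response to a POST.

```python
import http.client


def fetch_headers(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return (status, headers); no body is sent back."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # Header names are case-insensitive; normalise them to lower case.
        headers = {name.lower(): value for name, value in resp.getheaders()}
        return resp.status, headers
    finally:
        conn.close()


def should_download(headers, already_have):
    """Decide from Content-Disposition whether the file is new.

    `already_have` is a hypothetical set of filenames fetched earlier.
    """
    disposition = headers.get("content-disposition", "")
    # Naive parse in the spirit of the question's split('='); real headers
    # may carry extra parameters or an encoded filename.
    filename = disposition.partition("filename=")[2].strip('" ')
    return bool(filename) and filename not in already_have
```

A caller would run `fetch_headers(...)` first and only issue the full request when `should_download(...)` returns True.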

vikingosegundo
1

Well, when you just want headers, you should be using HTTP HEAD. POST and GET will by definition return content.

In terms of stopping the download, the web server won't wait to start sending you data, and everything from Python to your network card will start receiving and buffering the data immediately.

So your best bet is to find a better way of doing this -- HTTP HEAD for example. If that's not an option, call close() on your response object immediately after getting whatever headers you need and hope you didn't waste too much bandwidth.

(And for an example on using HTTP HEAD in Python, see this answer from a while ago.)
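When HEAD is not an option, the fallback described above (read the headers, then close without calling read()) could look roughly like this in Python 3's `urllib.request`. This is only a sketch: `fetch_unless_known` and `already_have` are hypothetical names, and closing early does not guarantee that the server stops sending.

```python
import urllib.request


def fetch_unless_known(url, data, already_have):
    """POST to `url`; return (filename, body), with body None if skipped.

    `already_have` is a hypothetical set of filenames fetched earlier.
    """
    req = urllib.request.Request(url, data=data, method="POST")
    # urlopen returns once the status line and headers have been parsed;
    # the rest of the body is read from the socket when read() is called.
    with urllib.request.urlopen(req, timeout=30) as resp:
        disposition = resp.headers.get("Content-Disposition", "")
        filename = disposition.partition("filename=")[2].strip('" ')
        if filename in already_have:
            # Leaving the with-block closes the response without read().
            # Whatever the server already sent has still cost bandwidth.
            return filename, None
        return filename, resp.read()
```

For a large file this may cut the transfer short, but anything already buffered by the OS or the library has been received regardless.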

Ken Kinder
  • Closing the _response_ is precisely what I have done (not seen in the snippet). I have seen the HEAD questions but that would only work if download is effected by redirect to GET. Some servers will include content in response to POST directly so HEAD is not an option (afaik). – Victor Olex Jun 01 '11 at 19:55
  • Closing the request is your best option, but you should be aware that you are probably wasting quite a bit of bandwidth. If you're lucky, you'll prevent the server from sending the *whole* file, but that isn't guaranteed. What you're asking for is impossible. – Ken Kinder Jun 01 '11 at 19:57