I'm looking to write a script that can automatically download .zip
files from the Bureau of Transportation Statistics Carrier Website, but I'm having trouble getting the same response headers as I can see in Chrome when I download the zip file. I'm looking to get a response header that looks like this:
HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT
However, when calling requests.post(url, data=params, headers=headers)
with the same information that I can see in the Chrome network inspector I am getting the following response:
>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}
It's got pretty much everything except it's missing the Location
key that I need in order to download the .zip
file with all of the data I want. Also the Content-Length
value is different, but I'm not sure if that's an issue.
I think that my issue has something to do with the fact that when you click "Download" on the page it actually sends two requests that I can see in the Chrome network console. The first request is a POST
request that yields an HTTP
response of 302 and then has the Location
in the response header. The second request is a GET
request to the url specified in the Location
value of the response header.
Should I really be sending two requests here? Why am I not getting the same response headers using requests
as I do in the browser? FWIW I used curl -X POST -d /*my data*/
and got back this in my terminal:
<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="http://tsdata.bts.gov/103714760_T_T100_SEGMENT_ALL_CARRIER.zip">here</a>.</body>
Really appreciate any help!