I'm trying to use the Python requests library to download a file from this link: http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download
Clicking this link gives you a file (nasdaq.csv) only when using a browser. I used the Firefox Network Monitor (Ctrl-Shift-Q) to capture all the headers Firefox sends, and now I finally get a 200 response from the server, but still no file: what this script writes out contains parts of the Nasdaq website's HTML, not the CSV data. I looked at similar questions on this site and nothing leads me to believe that this shouldn't be possible with the requests library.
Code:
import requests
url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
# Fake Firefox headers
headers = {
    "Host": "www.nasdaq.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "DNT": "1",
    "Cookie": "clientPrefs=||||lightg; userSymbolList=EOD+&DIT; userCookiePref=true; selectedsymbolindustry=EOD,; selectedsymboltype=EOD,EVERGREEN GLOBAL DIVIDEND OPPORTUNITY FUND COMMON SHARES OF BENEFICIAL INTEREST,NYSE; c_enabled$=true",
    "Connection": "keep-alive",
}
# Get the list
# headers must be passed by keyword: the second positional argument of
# requests.get() is params, which would put the headers in the query string
response = requests.get(url, headers=headers, stream=True)
print(response.status_code)
# Write server response to file
with open("nasdaq.csv", "wb") as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)
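For debugging cases like this, it can help to inspect exactly what requests will put on the wire before anything touches the network. A small sketch using a prepared request (reusing the `url` and `headers` names from above; nothing here contacts the server):

```python
import requests

url = ("http://www.nasdaq.com/screening/companies-by-industry.aspx"
       "?exchange=NASDAQ&render=download")
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) "
                         "Gecko/20100101 Firefox/42.0"}

# Build the request without sending it, so the final URL and header set
# can be checked. Passing the dict by keyword keeps it as headers; passed
# positionally to requests.get() it would be treated as params and appended
# to the query string.
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.url)
print(prepared.headers["User-Agent"])
```

`requests.Request(...).prepare()` is part of the public requests API; it only shows what would be sent, so the actual download still needs `requests.get` or `Session.send`. It is also worth printing `response.headers.get("Content-Type")` on the real response to confirm whether the server returned CSV or an HTML page.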