
I am trying to download a PDF from a webpage using urllib. I used the source link that downloads the file in the browser, but the same link fails to download the file in Python. Instead, what gets downloaded is a redirect to the main page.

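import os
import urllib  # Python 2 API; in Python 3 this is urllib.request.urlretrieve

os.chdir(r'/Users/file')
url = "http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414"
# Save the response body to ./downloaded_file
urllib.urlretrieve(url, "downloaded_file")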

Please try downloading the file manually from the link provided, or from the redirected site; the link on the main page is called 'sectionals'. Your help is much appreciated.

2 Answers


This happens because the given link redirects you to a "raw" PDF file. By examining the response headers in Firebug, I found the filename sectionals/2014/2607RAND.pdf (see screenshot below). Because that path is relative to the current .aspx file, the URL you actually need (the new value of your url variable) is http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf

[Screenshot: Firebug output]
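Resolving a relative filename against the page it came from is what `urljoin` does. Here is a minimal Python 3 sketch, using the page URL from the question and the relative path from this answer. In Python 2 the same function lives in the `urlparse` module.

```python
from urllib.parse import urljoin

# The page that was requested (from the question).
page_url = "http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414"
# Relative filename seen in the response headers (from the answer above).
relative_pdf = "sectionals/2014/2607RAND.pdf"

# urljoin drops the last path segment (SectionalsMeeting.aspx) and the
# query string, then appends the relative path to /races/.
pdf_url = urljoin(page_url, relative_pdf)
print(pdf_url)
# http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf
```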

Apoorv
Nice find, but is there a way to retrieve this filename using Python? I have already tried `urllib.info().headers`, but nothing shows up with the current `meetingId=2414`. – user3818749 Aug 24 '14 at 11:27
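One likely reason the headers show nothing useful is that urllib follows 3xx redirects automatically, so `info()` describes the final response rather than the redirect itself. The following is a Python 3 sketch that refuses to follow redirects and reads the target from the `Location` header instead. It assumes, without confirmation, that the server names the PDF in the `Location` header of a 3xx response (the Firebug screenshot is not available to check this). `NoRedirect` and `find_redirect_target` are hypothetical names.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx
        # response instead of silently following it.
        return None


def find_redirect_target(url):
    """Return the absolute URL a 3xx response points to, or None.

    Assumption: the server announces the PDF via the Location header
    of a redirect response.
    """
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url):
            return None  # no redirect: the URL served content directly
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if 300 <= err.code < 400 and location:
            # Location may be relative, so resolve it against the request URL.
            return urllib.parse.urljoin(url, location)
        raise  # a real error (404, 500, ...), not a redirect
```

Usage would be `find_redirect_target("http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414")`. If it returns a URL, that URL can be passed to `urlretrieve`.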

In Python 3:

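import urllib.request
import shutil

# With no filename argument, urlretrieve saves to a temporary file and
# returns its path together with the response headers.
local_filename, headers = urllib.request.urlretrieve('http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414')
# Move the temporary file into place (shutil.move also works across partitions).
shutil.move(local_filename, 'ret.pdf')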

I use shutil.move because, when called without a filename, urlretrieve saves to a temporary folder. In my case that folder is on another partition, so os.rename would raise an error.
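If the goal is simply a file at a known path, `urlretrieve` also accepts the destination as its second argument (`urlretrieve(url, 'ret.pdf')`), which avoids the temporary file and the move. The sketch below goes a step further for the original problem. It streams the response with `urlopen` and refuses to save anything whose Content-Type is not `application/pdf`, so an HTML page served after a redirect is reported instead of being silently saved as `ret.pdf`. It assumes the server labels the file as `application/pdf`. `download_pdf` is a hypothetical helper.

```python
import shutil
import urllib.request


def download_pdf(url, dest):
    """Stream url into dest, but only if the server returned a PDF.

    Assumption: the server sends Content-Type: application/pdf for the file.
    """
    with urllib.request.urlopen(url) as response:
        # get_content_type() strips parameters such as "; charset=..."
        content_type = response.headers.get_content_type()
        if content_type != "application/pdf":
            # geturl() is the final URL after any redirects were followed.
            raise ValueError(
                f"expected a PDF, got {content_type} from {response.geturl()}"
            )
        # Write straight to the destination, so no temp file needs moving.
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest
```

The Content-Type check runs before the destination file is opened, so a failed download leaves no empty `ret.pdf` behind.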

cox