32

Two part question. I am trying to download multiple archived Cory Doctorow podcasts from the internet archive. The old one's that do not come into my iTunes feed. I have written the script but the downloaded files are not properly formatted.

Q1 - What do I change to download the zip mp3 files? Q2 - What is a better way to pass the variables into URL?

 # and the base url.

def dlfile(file_name,file_mode,base_url):
    from urllib2 import Request, urlopen, URLError, HTTPError

    #create the url and the request
    url = base_url + file_name + mid_url + file_name + end_url 
    req = Request(url)

    # Open the url
    try:
        f = urlopen(req)
        print "downloading " + url

        # Open our local file for writing
        local_file = open(file_name, "wb" + file_mode)
        #Write to our local file
        local_file.write(f.read())
        local_file.close()

    #handle errors
    except HTTPError, e:
        print "HTTP Error:",e.code , url
    except URLError, e:
        print "URL Error:",e.reason , url

# Set the range 
var_range = range(150,153)

# Iterate over image ranges
for index in var_range:

    base_url = 'http://www.archive.org/download/Cory_Doctorow_Podcast_'
    mid_url = '/Cory_Doctorow_Podcast_'
    end_url = '_64kb_mp3.zip'
    #create file name based on known pattern
    file_name =  str(index) 
    dlfile(file_name,"wb",base_url

This script was adapted from here

Justjoe
  • 371
  • 1
  • 4
  • 10

2 Answers2

54

Here's how I'd deal with the url building and downloading. I'm making sure to name the file as the basename of the url (the last bit after the trailing slash) and I'm also using the with clause for opening the file to write to. This uses a ContextManager which is nice because it will close that file when the block exits. In addition, I use a template to build the string for the url. urlopen doesn't need a request object, just a string.

import os
from urllib2 import urlopen, URLError, HTTPError


def dlfile(url):
    # Open the url
    try:
        f = urlopen(url)
        print "downloading " + url

        # Open our local file for writing
        with open(os.path.basename(url), "wb") as local_file:
            local_file.write(f.read())

    #handle errors
    except HTTPError, e:
        print "HTTP Error:", e.code, url
    except URLError, e:
        print "URL Error:", e.reason, url


def main():
    # Iterate over image ranges
    for index in range(150, 151):
        url = ("http://www.archive.org/download/"
               "Cory_Doctorow_Podcast_%d/"
               "Cory_Doctorow_Podcast_%d_64kb_mp3.zip" %
               (index, index))
        dlfile(url)

if __name__ == '__main__':
    main()
dcolish
  • 22,727
  • 1
  • 24
  • 25
  • Just one last question where can I learn more about the "%d" used in the URL, i'm still a bit sketchy on how it's working. – Justjoe Oct 27 '10 at 00:32
  • Check out the docs for string formatting: http://docs.python.org/library/stdtypes.html#string-formatting – dcolish Oct 27 '10 at 00:35
  • One of the downside of this method is that the `f.read()` will store the entire file in memory before writing it down to the disk, which could fail if the file is too big. – Maxime May 17 '18 at 02:17
1

An older solution on SO along the lines of what you want:

Community
  • 1
  • 1
pyfunc
  • 65,343
  • 15
  • 148
  • 136