
I am trying to download multiple zipped files from a website. I have looked at the answers for downloading a single file, and that seems pretty straightforward, but I am having trouble making it work for multiple files. The URL has over 140 zipped files that I would like to download.

So far, my code looks like this:

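import os
import urllib.request

url = "http://ftp.geogratis.gc.ca/pub/nrcan_rncan/vector/geobase_nhn_rhn/shp_en/03/"
out_dir = r"C:\Users\maverick\Desktop\Canada Features"
## search url for zipped files and download them (this is where I am stuck)
zip_names = []  # the zipped files?? (how do I get this list from the page?)
for name in zip_names:
    if name.endswith(".zip"):
        # urlretrieve needs a full destination file path, not just a folder
        urllib.request.urlretrieve(url + name, os.path.join(out_dir, name))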

I know it's not even close to what I need, but a push in the right direction would be appreciated. I have also looked at Scrapy, but I thought that urllib should be able to accomplish the task.

  • Can you connect via [ftp](https://stackoverflow.com/questions/111954/using-pythons-ftplib-to-get-a-directory-listing-portably) to transfer your files? If you're fixated on parsing the webpage, then [Beautiful Soup](https://stackoverflow.com/questions/tagged/beautifulsoup?sort=votes&pageSize=15) might be useful for you. – import random May 29 '17 at 22:36
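If you would rather stay with urllib and parse the HTML listing page, here is a minimal sketch. It swaps Beautiful Soup for the standard library's `html.parser`, and it assumes the listing page links each file with a plain `<a href="...">` tag. The names `ZipLinkParser`, `find_zip_urls`, and `download_all` are hypothetical helpers written for this example, not part of any library.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse


class ZipLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at a .zip file."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".zip"):
                self.links.append(value)


def find_zip_urls(html, base_url):
    """Return absolute, de-duplicated URLs for every .zip link in a page."""
    parser = ZipLinkParser()
    parser.feed(html)
    # Resolve relative links (e.g. "foo.zip") against the listing URL;
    # dict.fromkeys drops repeated links while keeping page order.
    return list(dict.fromkeys(urljoin(base_url, href) for href in parser.links))


def download_all(base_url, out_dir):
    """Fetch the listing page, then save each linked .zip into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    with urllib.request.urlopen(base_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    urls = find_zip_urls(html, base_url)
    for i, url in enumerate(urls, 1):
        # Use the last path segment of the URL as the local file name
        fname = unquote(urlparse(url).path.rsplit("/", 1)[-1])
        print("Downloading {} ({}/{})".format(fname, i, len(urls)))
        urllib.request.urlretrieve(url, os.path.join(out_dir, fname))
```

Usage would look something like `download_all("http://ftp.geogratis.gc.ca/pub/nrcan_rncan/vector/geobase_nhn_rhn/shp_en/03/", r"C:\Users\maverick\Desktop\Canada Features")`. Keeping the link extraction in `find_zip_urls` separate from the network code makes it easy to check against a saved copy of the page before starting a long download.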

1 Answer


As @Eric notes, this server is essentially an HTML front end to an FTP server, so you can use the FTP interface directly:

from ftplib import FTP
import os

FTP_HOST = "ftp.geogratis.gc.ca"
FTP_DIR  = "pub/nrcan_rncan/vector/geobase_nhn_rhn/shp_en/03/"
OUT_DIR  = "/my/documents"    # <-- point this to an appropriate location!
os.makedirs(OUT_DIR, exist_ok=True)  # create the output folder if it does not exist

# connect to host
ftp = FTP(FTP_HOST)
ftp.login()

# get list of .zip files
ftp.cwd(FTP_DIR)
files = ftp.nlst()
files = [f for f in files if f.lower().endswith(".zip")]

# download files
num = len(files)
for i, fname in enumerate(files, 1):
    print("Downloading {} ({}/{}) ... ".format(fname, i, num), end='', flush=True)
    local_file = os.path.join(OUT_DIR, fname)
    with open(local_file, "wb") as outf:
        ftp.retrbinary("RETR "+fname, outf.write)
    print("done!")

ftp.quit()    # politely end the session (sends QUIT, then closes the connection)

Be aware that this could take a while: the directory contains 9.3 GB of files.
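Because the full directory is large, a run may get interrupted partway through. Below is a minimal sketch of the same `ftplib` loop that skips any file whose local copy already has the same byte count as the remote one. It assumes the server answers the FTP `SIZE` command; if it does not, every file is simply downloaded again. The names `already_downloaded`, `remote_size`, and `sync_zips` are hypothetical helpers for this example.

```python
import os
from ftplib import FTP, error_perm


def already_downloaded(local_path, remote_size):
    """True if local_path exists and has exactly remote_size bytes."""
    return (
        remote_size is not None
        and os.path.exists(local_path)
        and os.path.getsize(local_path) == remote_size
    )


def remote_size(ftp, fname):
    """Ask the server for a file's size in bytes; None if it refuses."""
    try:
        ftp.voidcmd("TYPE I")  # SIZE is only meaningful in binary mode
        return ftp.size(fname)
    except error_perm:
        return None


def sync_zips(host, ftp_dir, out_dir):
    """Download every .zip in ftp_dir, skipping complete local copies."""
    os.makedirs(out_dir, exist_ok=True)
    with FTP(host) as ftp:  # the context manager sends QUIT on exit
        ftp.login()
        ftp.cwd(ftp_dir)
        names = [f for f in ftp.nlst() if f.lower().endswith(".zip")]
        for fname in names:
            local_file = os.path.join(out_dir, fname)
            if already_downloaded(local_file, remote_size(ftp, fname)):
                print("Skipping {} (already downloaded)".format(fname))
                continue
            with open(local_file, "wb") as outf:
                ftp.retrbinary("RETR " + fname, outf.write)
            print("Downloaded {}".format(fname))
```

For example, `sync_zips("ftp.geogratis.gc.ca", "pub/nrcan_rncan/vector/geobase_nhn_rhn/shp_en/03/", "/my/documents")`. A partially written file has a smaller size than the remote one, so it is fetched again from the start rather than resumed; comparing sizes is a cheap check, not a checksum.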

Hugh Bothwell
  • Thank you, Eric and Hugh. I hadn't thought about it that way. I am very new to Python; I use it mostly in ArcPy for ArcMap (very basic geoprocessing). I will try this script out and see how it goes. I also much appreciate the comments in the script, so I can understand what is happening. – travel_pixie May 31 '17 at 13:43