I'm writing a Python program to download a selected list of CSV files from canada.ca. I have all the URLs I need, but I don't know how to download them to my local directory. I believe I have to use a request and write the files in a loop, but I'm a bit lost on how to do it. Thanks in advance.

en_urls = []
for link in soup.find_all('a'):
    if 'EN.csv' in link.get('href', []):
        en_urls.append(link.get('href'))


Output:
['http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/Positive_Employers_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2015_Positive_Employers_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2016_Positive_Employer_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q1Q2_Positive_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q3_Positive_Employer_Stream_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q1_Positive_Employer_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q2_Positive_Employer_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q4_Positive_Employer_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q3_Positive_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q4_Positive_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q1_employer_positive_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q2_employer_positive_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q3_Positive_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q4_Positive_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2020Q1_Positive_EN.csv']
Hugo Lira
  • I don't have the ability to mark this question as a duplicate, but you'll find a variety of answers to your question of downloading files via python here: https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python – Daniel Skovli Jul 21 '20 at 21:41

2 Answers

You can use urllib.request.urlretrieve() in a loop.

For example:

import urllib.request

lst = ['http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/Positive_Employers_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2015_Positive_Employers_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2016_Positive_Employer_EN.csv',
 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q1Q2_Positive_EN.csv']

for i in lst:
    print('Downloading {}..'.format(i))
    local_filename, _ = urllib.request.urlretrieve(i, filename=i.split('/')[-1])
    print('File saved as {}'.format(local_filename))

Prints:

Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/Positive_Employers_EN.csv..
File saved as Positive_Employers_EN.csv
Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2015_Positive_Employers_EN.csv..
File saved as 2015_Positive_Employers_EN.csv
Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2016_Positive_Employer_EN.csv..
File saved as 2016_Positive_Employer_EN.csv
Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q1Q2_Positive_EN.csv..
File saved as 2017Q1Q2_Positive_EN.csv
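
If the files should go into a specific folder rather than the current working directory, the same loop can be wrapped as below. This is only a sketch: `local_path`, `download_all`, and the `downloads` folder name are hypothetical names chosen for illustration. It uses `urlsplit` so that a query string, if a URL had one, would not end up in the file name.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def local_path(url, dest_dir="downloads"):
    """Map a URL to a path inside dest_dir, named after the URL's last path segment."""
    # urlsplit(...).path drops any "?query" or "#fragment" part of the URL.
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    if not name:
        raise ValueError(f"no file name in URL: {url}")
    return Path(dest_dir) / name


def download_all(urls, dest_dir="downloads"):
    """Download every URL into dest_dir and return the list of saved paths."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)  # create the folder if needed
    saved = []
    for url in urls:
        target = local_path(url, dest_dir)
        urlretrieve(url, filename=str(target))
        saved.append(target)
    return saved
```

Calling `download_all(lst)` with the list above would then place the four files under `downloads/`.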
Andrej Kesely

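Try this, using requests and writing each response to disk in chunks:

import requests

# `soup` is the BeautifulSoup object from the question.
en_urls = []
for link in soup.find_all('a'):
    if 'EN.csv' in link.get('href', ''):
        en_urls.append(link.get('href'))

for link in en_urls:
    r = requests.get(link, stream=True)
    if r.ok:
        # Open the file only after a successful response, so a failed
        # download does not leave an empty file behind.
        with open(link.split('/')[-1], 'wb') as file:
            for block in r.iter_content(2 * 1024**2):  # 2 MiB chunks
                file.write(block)
    else:
        print(f'Download failed on {link} with {r}')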
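
A possible variation on the loop above, in case a connection drops partway through: write to a temporary name first and rename only once the whole body has arrived. This is a sketch, not a drop-in replacement. `save_chunks`, `download`, the `.part` suffix, the 30-second timeout, and the 1 MiB chunk size are all assumptions made for illustration.

```python
import os
from pathlib import Path


def save_chunks(chunks, target):
    """Write an iterable of byte chunks to `target` via a temporary `.part` file."""
    target = Path(target)
    tmp = target.with_name(target.name + ".part")
    try:
        with open(tmp, "wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
        os.replace(tmp, target)  # rename only after every chunk was written
    except BaseException:
        tmp.unlink(missing_ok=True)  # remove the partial file on any failure
        raise
    return target


def download(url, dest_dir="."):
    """Stream `url` into `dest_dir`, named after the URL's last path segment."""
    import requests  # third-party; imported here so save_chunks works without it

    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        return save_chunks(r.iter_content(chunk_size=1024 * 1024),
                           Path(dest_dir) / url.split("/")[-1])
```

With this in place, the second loop would reduce to calling `download(link)` for each entry in `en_urls`, wrapped in a `try`/`except requests.RequestException` if failures should be reported rather than stop the run.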
UWTD TV