
I'm trying to download .pdf files with urlretrieve(), using links from an xlsx file. One column holds the links and another holds the names the downloaded files should have.

The issue is that my code seems to just overwrite the same file over and over again as it downloads each item of the list.

from urllib.request import urlretrieve
from urllib.error import URLError, HTTPError
import os
import xlrd

workbook = xlrd.open_workbook('file.xlsx',on_demand=True)
sheet = workbook.sheet_by_name('Sheet1')
listofvalues = sheet.col_values(21, 1)
listofnames = sheet.col_values(2, 1)

for name in listofnames:
    for value in listofvalues:
        try:
            results = 'C:\\results'
            full_file_name = os.path.join(results, str(name + ".pdf"))
            urlretrieve(value, full_file_name)
            print(str(value) + ' DOWNLOADED')
        except (HTTPError, ValueError, URLError) as e:
            print("------------------------------------")
            print(e)
            print(value)
            print("-----------------------------------")

    continue

I think it has something to do with nested loops, but I can't find a solution.

  • 1
    Ignoring context, it's clear here that `name` (and `results`) is fixed a priori, and then an inner loop runs and downloads to the same place `len(listofvalues)` times. – sascha May 21 '18 at 18:41
  • 1
    As mentioned above, `value` has to be part of the filename. Otherwise you keep writing to the same filename in the inner loop. – Evgeny May 21 '18 at 18:49

1 Answer


As mentioned in a comment, the problem is that for each iteration of your outer loop, your inner loop runs to completion. Every URL is therefore downloaded to the same file name, with each download overwriting the previous one. You want to iterate through both lists at the same time, so that each step gives you the URL and the name for one file.

For reference on how to do this, you can refer to: How to iterate through two lists in parallel?
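As a minimal sketch of that idea, the nested loops can be replaced by a single loop over `zip(listofnames, listofvalues)`. The helper names `build_targets` and `download_all`, the `retrieve` parameter, the `results` folder, and the sample names and URLs are assumptions made for illustration; in your script the two lists would still come from `sheet.col_values(...)`.

```python
import os
from urllib.error import HTTPError, URLError
from urllib.request import urlretrieve


def build_targets(names, urls, results_dir):
    """Pair each URL with its own output path, row by row."""
    return [
        (url, os.path.join(results_dir, str(name) + ".pdf"))
        for name, url in zip(names, urls)
    ]


def download_all(targets, retrieve=urlretrieve):
    """Download every (url, path) pair; report failures and keep going."""
    for url, path in targets:
        try:
            retrieve(url, path)
            print(str(url) + " DOWNLOADED")
        except (HTTPError, ValueError, URLError) as e:
            print(e, url)


# Hypothetical stand-ins for sheet.col_values(2, 1) and sheet.col_values(21, 1)
listofnames = ["report_a", "report_b"]
listofvalues = ["http://example.com/a.pdf", "http://example.com/b.pdf"]
targets = build_targets(listofnames, listofvalues, "results")
```

Calling `download_all(targets)` then performs one download per row, each into its own file. Keeping the pairing separate from the downloading makes it easy to print `targets` and check the names before anything is fetched.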

EDIT:

The above response assumes both lists are the same length, i.e. you have one URL per file. If that is not the case, then refer to the comments left on your question.
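Note that `zip` stops silently at the end of the shorter list, so a mismatch in lengths would go unnoticed. One way to guard against that, sketched here with a hypothetical helper named `pair_rows`, is to compare the lengths before pairing:

```python
def pair_rows(names, urls):
    """Return (name, url) pairs, refusing lists of different lengths."""
    if len(names) != len(urls):
        raise ValueError(
            "expected one URL per name, got %d names and %d URLs"
            % (len(names), len(urls))
        )
    return list(zip(names, urls))
```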

Community