
I am trying to create a list of URLs using a for loop. It prints all the correct URLs, but does not save them in a list. Ultimately I want to download multiple files using urlretrieve.

for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
        print(url)
    if i == 9 and j == 10:
        url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
        print(url)
    if i > 9:
        if i > 9 or j < 8:
            url = "https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls"
            print(url)

The output of the above code is:

https://Here is a URL/P2000-01.xls
https://Here is a URL/P2001-02.xls
https://Here is a URL/P2002-03.xls
https://Here is a URL/P2003-04.xls
https://Here is a URL/P2004-05.xls
https://Here is a URL/P2005-06.xls
https://Here is a URL/P2006-07.xls
https://Here is a URL/P2007-08.xls
https://Here is a URL/P2008-09.xls
https://Here is a URL/P2009-10.xls
https://Here is a URL/P2010-11.xls
https://Here is a URL/P2011-12.xls
https://Here is a URL/P2012-13.xls
https://Here is a URL/P2013-14.xls
https://Here is a URL/P2014-15.xls
https://Here is a URL/P2015-16.xls
https://Here is a URL/P2016-17.xls

But this:

url

gives only:

'https://Here is a URL/P2016-17.xls'

How do I get all the URLs, not just the final one?

  • What happens when you, say, try to save the URLs in a list? – anurag Jan 22 '21 at 10:21
  • You are replacing the url variable with a new value every time. If you want to preserve the values, you need to create a list and append the values to it. – sunilbaba Jan 22 '21 at 10:24
  • The final URL is overwriting all the previous ones. – FinickyBee Jan 22 '21 at 10:24
  • @sunilbaba I have been trying to download the data. So far the website has data up to 2016-17, but when the data for 2017-18 arrives, the URL list will not collect it automatically. The goal is to automate the script so as to reduce the manual effort. – FinickyBee Jan 22 '21 at 10:28
  • If you reduce your range, you will restrict your URL formation to 2016-17 itself. – sunilbaba Jan 22 '21 at 10:30

2 Answers


There are several things that could significantly simplify your code. First of all, this:

"https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"

could be simplified to this:

"https://Here is a URL/P200{}-0{}.xls".format(i, j)

And if you have at least Python 3.6, you could use an f-string instead:

f"https://Here is a URL/P200{i}-0{j}.xls"

Second of all, Python has several ways to conveniently pad numbers with zeroes; it can even be done as part of the f-string formatting. Additionally, range starts from zero by default.
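As a quick illustration of that format spec (standard Python, nothing specific to this problem):

f"{5:02}"          # '05'
"{:02}".format(5)  # '05'
str(5).zfill(2)    # '05'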

So your entire original code is equivalent to:

for num in range(17):
    print(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')

Now, you want to actually use these URLs, not just print them out. You mentioned building a list, which can be done like so:

urls = []
for num in range(17):
    urls.append(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')

Or with a list comprehension:

urls = [f'https://Here is a URL/P20{num:02}-{num+1:02}.xls'
        for num in range(17)]

Based on your comments here and on your other question, you seem to be confused about what form you need these URLs to be in. Strings like this are already what you need. urlretrieve accepts the URL as a string, so you don't need to do any further processing. See the example in the docs:

import urllib.request

local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
html.close()

However, I would recommend not using urlretrieve, for two reasons.

  1. As the documentation mentions, urlretrieve is a legacy method that may become deprecated. If you're going to use urllib, use the urlopen method instead (see the sketch after this list).

  2. However, as Paul Becotte mentioned in an answer to your other question: if you're looking to fetch URLs, I would recommend installing and using Requests instead of urllib. It's more user-friendly.
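For completeness, here is a minimal sketch of the urlopen route from point 1, assuming the same filename pattern as above (the base URL is the question's placeholder, so substitute the real address):

import urllib.request

base_url = 'https://Here is a URL/'

for num in range(17):
    filename = f'P20{num:02}-{num+1:02}.xls'
    # urlopen returns a response object whose body can be read as bytes
    with urllib.request.urlopen(base_url + filename) as response:
        with open(filename, 'wb') as f:
            f.write(response.read())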

Regardless of which method you choose, again, strings are fine. Here's code that uses Requests to download each of the specified spreadsheets to your current directory:

import requests

base_url = 'https://Here is a URL/'

for num in range(17):
    filename = f'P20{num:02}-{num+1:02}.xls'
    xls = requests.get(base_url + filename)
    with open(filename, 'wb') as f:
        f.write(xls.content)
CrazyChucky

You are overwriting the url variable each time, so only the final URL remains. You need to maintain a list and keep appending the new values to it:

import urllib.parse

url = []  # collect every URL instead of overwriting a single variable
for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url.append("https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls")
    if i == 9 and j == 10:
        url.append("https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls")
    if i > 9:
        if i > 9 or j < 8:
            url.append("https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls")

for urlValue in url:
    # safe=':/' keeps the scheme and slashes intact while percent-encoding spaces
    print(urllib.parse.quote(urlValue, safe=':/'))
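To then actually download the files, as the question ultimately wants, here is a minimal sketch using urlretrieve on the collected list, assuming each entry points directly to a downloadable file (again, substitute the real base URL for the placeholder):

import urllib.parse
import urllib.request

for urlValue in url:
    safe_url = urllib.parse.quote(urlValue, safe=':/')  # percent-encode unsafe characters
    filename = urlValue.rsplit('/', 1)[-1]              # e.g. P2000-01.xls
    urllib.request.urlretrieve(safe_url, filename)      # save to the current directory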
sunilbaba