
I am trying to create a list of URLs using a for loop. It prints all the correct URLs, but does not save them in a list. Ultimately I want to download multiple files using urlretrieve.

for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
        print(url)
    if i == 9 and j == 10:
        url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
        print(url)
    if i > 9:
        if i > 9 or j < 8:
            url = "https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls"
            print(url)

The output of the above code is:

https://Here is a URL/P2000-01.xls
https://Here is a URL/P2001-02.xls
https://Here is a URL/P2002-03.xls
https://Here is a URL/P2003-04.xls
https://Here is a URL/P2004-05.xls
https://Here is a URL/P2005-06.xls
https://Here is a URL/P2006-07.xls
https://Here is a URL/P2007-08.xls
https://Here is a URL/P2008-09.xls
https://Here is a URL/P2009-10.xls
https://Here is a URL/P2010-11.xls
https://Here is a URL/P2011-12.xls
https://Here is a URL/P2012-13.xls
https://Here is a URL/P2013-14.xls
https://Here is a URL/P2014-15.xls
https://Here is a URL/P2015-16.xls
https://Here is a URL/P2016-17.xls

But this:

url

gives only:

'https://Here is a URL/P2016-17.xls'

How do I get all the URLs, not just the final one?

  • What happens when you, say, try to save the URLs in a list? – anurag Jan 22 '21 at 10:21
  • You are replacing the url variable with a new value every time. If you want to preserve the values, you need to create a list and append the values to it. – sunilbaba Jan 22 '21 at 10:24
  • The final URL is overwriting all the previous ones. – FinickyBee Jan 22 '21 at 10:24
  • @sunilbaba I have been trying to download the data. So far the website has data up to 2016-17, but when the data for 2017-18 arrives, the URL list will not collect it automatically. The goal is to automate the script so as to reduce the manual effort. – FinickyBee Jan 22 '21 at 10:28
  • If you reduce your range, you will restrict your URL formation to 2016-17 itself. – sunilbaba Jan 22 '21 at 10:30

2 Answers


There are several things that could significantly simplify your code. First of all, this:

"https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"

could be simplified to this:

"https://Here is a URL/P200{}-0{}.xls".format(i, j)

And if you have at least Python 3.6, you could use an f-string instead:

f"https://Here is a URL/P200{i}-0{j}.xls"

Second of all, Python has several ways to conveniently pad numbers with zeroes; it can even be done as part of the f-string formatting. Additionally, range starts from zero by default.
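As a quick illustration of that format spec (standard Python, nothing specific to this problem):

f"{5:02}"          # '05'
"{:02}".format(5)  # '05'
str(5).zfill(2)    # '05'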

So your entire original code is equivalent to:

for num in range(17):
    print(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')

Now, you want to actually use these URLs, not just print them out. You mentioned building a list, which can be done like so:

urls = []
for num in range(17):
    urls.append(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')

Or with a list comprehension:

urls = [f'https://Here is a URL/P20{num:02}-{num+1:02}.xls'
        for num in range(17)]

Based on your comments here and on your other question, you seem to be confused about what form you need these URLs to be in. Strings like this are already what you need. urlretrieve accepts the URL as a string, so you don't need to do any further processing. See the example in the docs:

import urllib.request

local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
html.close()

However, I would recommend not using urlretrieve, for two reasons.

  1. As the documentation mentions, urlretrieve is a legacy method that may become deprecated. If you're going to use urllib, use the urlopen method instead (see the sketch after this list).

  2. However, as Paul Becotte mentioned in an answer to your other question: if you're looking to fetch URLs, I would recommend installing and using Requests instead of urllib. It's more user-friendly.
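For completeness, here is a minimal sketch of the urlopen route from point 1, assuming the same filename pattern as above (the base URL is the question's placeholder, so substitute the real address):

import urllib.request

base_url = 'https://Here is a URL/'

for num in range(17):
    filename = f'P20{num:02}-{num+1:02}.xls'
    # urlopen returns a response object whose body can be read as bytes
    with urllib.request.urlopen(base_url + filename) as response:
        with open(filename, 'wb') as f:
            f.write(response.read())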

Regardless of which method you choose, again, strings are fine. Here's code that uses Requests to download each of the specified spreadsheets to your current directory:

import requests

base_url = 'https://Here is a URL/'

for num in range(17):
    filename = f'P20{num:02}-{num+1:02}.xls'
    xls = requests.get(base_url + filename)
    with open(filename, 'wb') as f:
        f.write(xls.content)
CrazyChucky

You are overwriting the url variable each time, so only the final URL remains. You need to maintain a list and keep appending the new values to it:

import urllib.parse

url = []  # collect every URL instead of overwriting a single variable
for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url.append("https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls")
    if i == 9 and j == 10:
        url.append("https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls")
    if i > 9:
        if i > 9 or j < 8:
            url.append("https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls")

for urlValue in url:
    # safe=':/' keeps the scheme and slashes intact while percent-encoding spaces
    print(urllib.parse.quote(urlValue, safe=':/'))
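To then actually download the files, as the question ultimately wants, here is a minimal sketch using urlretrieve on the collected list, assuming each entry points directly to a downloadable file (again, substitute the real base URL for the placeholder):

import urllib.parse
import urllib.request

for urlValue in url:
    safe_url = urllib.parse.quote(urlValue, safe=':/')  # percent-encode unsafe characters
    filename = urlValue.rsplit('/', 1)[-1]              # e.g. P2000-01.xls
    urllib.request.urlretrieve(safe_url, filename)      # save to the current directory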
sunilbaba