I want to download a file from a webpage. The page has only one .zip file (the one I want). When I click on it, the download starts, but the URL doesn't change (it remains of the form http://ldn2800:8080/id=2800). How can I download this file using Python, given that there is no URL of the form http://example.com/1.zip?

Also, when I go directly to http://ldn2800:8080/id=2800, it just opens the page listing the .zip file; nothing downloads until I click on it. How do I download it using Python?

UPDATE: Right now I'm doing it this way:

if (str(dict.get('id')) == winID):
    #or str(dict.get('id')) == linuxID):
    #if str(dict.get('number')) == buildNo:
    buildTypeId = dict.get('id')
    ID = dict.get('id')
    downloadURL = "http://example:8080/viewType.html?buildId=26009&tab=artifacts&buildTypeId=" + ID
    directory = BindingsDest + "\\" + buildNo
    if not os.path.exists(directory):
        os.makedirs(directory)

    fileName = None
    if buildTypeId == linuxID:
        fileName = linuxLib + "-" + buildNo + ".zip"
    elif buildTypeId == winID:
        fileName = winLib + "-" + buildNo + ".zip"

    if fileName is not None:
        print(dict)
        downloadFile(downloadURL, directory, fileName)

def downloadFile(downloadURL, directory, fileName, user=user, password=password):
    if user is not None and password is not None:
        request = requests.get(downloadURL, stream=True, auth=(user, password))
    else:
        request = requests.get(downloadURL, stream=True)

    with open(directory + "\\" + fileName, 'wb') as handle:
        for block in request.iter_content(1024):
            if not block:
                break
            handle.write(block)

But this just creates a zip file in the required location that can't be opened and contains nothing. Can something like this be done instead: search for the filename on the webpage and then download the file that matches that pattern?

Arshad
  • This should help you: http://stackoverflow.com/questions/11002014/downloading-file-with-python-mechanize – mechanical_meat Jul 21 '16 at 14:59
  • Have you tried executing the request using python? What happens? – Marco Acierno Jul 21 '16 at 15:00
  • @MarcoAcierno I have updated my question showing what I'm doing right now. – Arshad Jul 21 '16 at 15:52
  • @bernie I can't figure out how to use mechanize for my case. I have updated the question to show what I'm doing right now and what's happening. – Arshad Jul 21 '16 at 15:53
  • Using your code I am able to download a file. – mechanical_meat Jul 21 '16 at 16:19
  • You mean where the URL remains the same, as in my case? Just reiterating that when I open the above URL in the browser, a webpage opens that lists one zip file, but it doesn't start downloading on its own. I still need to click on it to download. – Arshad Jul 21 '16 at 16:30
  • I can't actually open that link so I used a different link: https://notepad-plus-plus.org/repository/6.x/6.9.2/npp.6.9.2.bin.minimalist.7z – mechanical_meat Jul 21 '16 at 16:33
  • I don't think you understand my problem here. The link that you provided is a .7z one i.e., on clicking on that link a file downloads automatically. However, for my case, the link opens a webpage which has a zip file listed on it, and I need to click on it. It doesn't download on its own. – Arshad Jul 21 '16 at 16:37
  • Oh I see the issue... – mechanical_meat Jul 21 '16 at 16:50

1 Answer

Check the HTTP status code to make sure that no error occurred. You can use the response's built-in raise_for_status method to do so: https://requests.readthedocs.io/en/master/api/#requests.Response.raise_for_status

def downloadFile(downloadURL, directory, fileName, user=user, password=password):
    if user is not None and password is not None:
        request = requests.get(downloadURL, stream=True, auth=(user, password))
    else:
        request = requests.get(downloadURL, stream=True)

    request.raise_for_status()

    with open(directory + "\\" + fileName, 'wb') as handle:
        for block in request.iter_content(1024):
            if not block:
                break
            handle.write(block)

Are you sure there is no networking issue, such as a proxy or a firewall?
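A status check will not catch the case where the server answers 200 but sends back the HTML page rather than the archive, which would also produce a ".zip" that can't be opened. Here is a minimal sketch for inspecting what was actually saved, using only the standard library. The name describe_download is hypothetical, and the HTML check is a rough heuristic that only looks at the first bytes of the file.

```python
import zipfile


def describe_download(path, peek=200):
    """Return a rough guess at what was saved at `path`: 'zip', 'html' or 'unknown'."""
    # is_zipfile looks for the zip end-of-central-directory record.
    if zipfile.is_zipfile(path):
        return "zip"

    # Heuristic only: look at the first bytes for an HTML prologue.
    with open(path, "rb") as handle:
        head = handle.read(peek).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

If this reports "html" for the file written by downloadFile, the URL being requested is the page itself and not the archive.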

EDIT: according to your above comment, I'm not sure that this answers your actual problem. Revised answer:

You access a web page containing a link to a zip file. This link, you say, is the same as the URL of the page itself, yet clicking it in a browser downloads the file instead of loading the HTML page again. That's strange, but it can be explained in various ways. Please copy and paste the whole HTML source of the page (including the link to the zip file); that will probably help us understand the issue.
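If the HTML source does contain an ordinary link to the archive, one way to automate "look at the page, then download what it points to" is sketched below. This is only a sketch under assumptions: it assumes the link is a plain <a href="..."> whose path ends in .zip (it will not find links built by JavaScript), and find_zip_links and download_first_zip are hypothetical helper names, not part of requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ZipLinkFinder(HTMLParser):
    """Collect absolute URLs of <a> tags whose href path ends in .zip."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare only the path, so query strings like ?x=1 are ignored.
        if href and urlparse(href).path.lower().endswith(".zip"):
            self.links.append(urljoin(self.base_url, href))


def find_zip_links(html, base_url):
    """Return every .zip link found in `html`, resolved against `base_url`."""
    finder = ZipLinkFinder(base_url)
    finder.feed(html)
    return finder.links


def download_first_zip(page_url, dest_path, auth=None):
    """Fetch the page, locate the first .zip link and stream it to `dest_path`."""
    # Imported here so the link-finding helpers work without requests installed.
    import requests

    page = requests.get(page_url, auth=auth)
    page.raise_for_status()
    links = find_zip_links(page.text, page_url)
    if not links:
        raise RuntimeError("no .zip link found in the page HTML")

    with requests.get(links[0], stream=True, auth=auth) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as handle:
            for block in response.iter_content(8192):
                handle.write(block)
    return links[0]
```

For example, download_first_zip(downloadURL, directory + "\\" + fileName, auth=(user, password)) would reuse the variables from the question. If no link is found, the download is probably triggered by a script or a form, and the real request has to be read from the page source or the browser's network tab.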

Guillaume
  • No error happened. It's a corporate network. Could that be the case? – Arshad Jul 21 '16 at 16:48
  • A corporate network will probably have a mandatory proxy to reach the Internet. Is your target site hosted inside the corporate network or outside? – Guillaume Jul 21 '16 at 16:51
  • Actually, by looking at the HTML source, I was able to solve the issue. Thanks a lot for your help. – Arshad Jul 21 '16 at 17:03