If I open the link: https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip

The page shows a dialog, and the file downloads only after I press the OK button.

The alert is not a browser dialog; it is part of the page itself.

But when I tried this script:

from io import BytesIO
from zipfile import ZipFile
import requests


def get_zip(file_url):
    url = requests.get(file_url)
    zipfile = ZipFile(BytesIO(url.content))
    zipfile.extractall("")

file_link ='https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip'

get_zip(file_link)

This throws the error:

zipfile.BadZipFile: File is not a zip file
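
This error usually means the bytes handed to `ZipFile` are not a zip archive at all. A minimal sketch for checking what the server actually returned, using only the standard library's `zipfile.is_zipfile`; the helper name `describe_payload` is hypothetical and not part of the original script:

```python
from io import BytesIO
from zipfile import is_zipfile


def describe_payload(data: bytes) -> str:
    """Return 'zip' if data is a zip archive, otherwise a short preview of it."""
    if is_zipfile(BytesIO(data)):
        return "zip"
    # Not an archive: show the first bytes so an HTML page is easy to spot.
    return "not a zip, starts with: " + repr(data[:60])
```

Calling `describe_payload(requests.get(file_link).content)` before `extractall` would show whether the response is an archive or, for example, an HTML page.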

And when I tried:

import requests

url = r'https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip'
output = r'downloadedfile.zip'

r = requests.get(url)
with open(output, 'wb') as f:
    f.write(r.content)

This saves the HTML of the page that shows the OK button instead of the zip file. Any idea how to get past the alert so that the link downloads the zip file?

Atom Store
  • https://svaderia.github.io/articles/downloading-and-unzipping-a-zipfile/ – drum Jul 26 '21 at 03:35
  • Can you refer to the second answer [here](https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url) – Abhishek Prajapat Jul 26 '21 at 03:36
  • You cannot open *https://www.x.com/ca.zip* because of an invalid SSL certificate, and if you try *http://www.x.com/ca.zip* you will indeed get `zipfile.BadZipFile: File is not a zip file` because `requests.get(file_url)` returns a 404 Not Found error. See the comment offered by @AbhishekPrajapat, which is what your code seems to be doing already, except that your call to `extractall` needs a better path specification. – Booboo Jul 28 '21 at 10:29
  • x.com is a random website; it can be anything. The major problem is bypassing the alert. – Atom Store Jul 28 '21 at 11:15
  • The site you refer to must have a customized way of downloading files. Either you reverse-engineer that or you emulate a browser with something like Selenium WebDriver or Puppeteer. As it currently stands, this question lacks the details necessary to answer it. – Marco Bonelli Jul 28 '21 at 20:27
  • Are you okay with a selenium based answer? – Red Jul 29 '21 at 12:52
  • @Ann Zen yeah, perfectly fine with a selenium answer. – Atom Store Jul 30 '21 at 03:10
  • @MarcoBonelli I cannot name the exact website for security reasons. I have stated that it can be any website: it shows an HTML page with a button, and after clicking the button it downloads the zip file. – Atom Store Jul 30 '21 at 03:11
  • @AtomStore well, I wish it was that simple, but every website does this in a different way. It's impossible to answer your question without knowing what the specific website you refer to does. It's like saying "I need new windshield wipers for my car" without saying the model of the car. – Marco Bonelli Jul 30 '21 at 03:15
  • @AtomStore Cool. Please provide a link from a known domain that would reproduce the same error you got for us to test with. – Red Jul 30 '21 at 03:48
  • Your question needs additional information to solve. Please provide the URL that is giving you a problem. – Life is complex Aug 01 '21 at 02:13
  • An example link is: https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip – Atom Store Aug 02 '21 at 03:10

1 Answer

I believe you are open to an answer that uses Selenium. Here's what you can do with Selenium:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

profile = webdriver.FirefoxProfile()

profile.set_preference("browser.download.folderList",1)
# 0 for desktop
# 1 for default download folder
# 2 for specific folder 
# You can specify directory by using profile.set_preference("browser.download.dir","<>")

profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream")
profile.set_preference("browser.helperApps.alwaysAsk.force", False)

# If you don't have some download manager then you can remove these
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.manager.useWindow", False)
profile.set_preference("browser.download.manager.focusWhenStarting", False)
profile.set_preference("browser.download.manager.alertOnEXEOpen", False)
profile.set_preference("browser.download.manager.showAlertOnComplete", False)

driver=webdriver.Firefox(firefox_profile=profile,executable_path="<>")

driver.get("https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip")
driver.find_element_by_id("butAgree").click()

Here we set some profile preferences to disable the pop-up download dialog.

It works fine with the latest version of Firefox and Selenium 3.141.0.
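
If the script has to run on a server or from a cron job, where no display is available, a headless variant may help. The following is only a sketch under assumptions: it sets the preferences on `Options` instead of a `FirefoxProfile`, it assumes geckodriver is on `PATH`, and it assumes Firefox keeps a `.part` file next to the target while a download is in progress. The helper names `wait_for_download` and `download_headless` are hypothetical.

```python
import os
import time


def wait_for_download(directory, suffix=".zip", timeout=60.0, poll=0.5):
    """Return the path of the first finished file ending in suffix, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        # Assumption: an unfinished Firefox download leaves a "*.part" file here.
        in_progress = any(name.endswith(".part") for name in names)
        finished = sorted(name for name in names if name.endswith(suffix))
        if finished and not in_progress:
            return os.path.join(directory, finished[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def download_headless(url, download_dir):
    """Open url in headless Firefox, click the agree button, wait for the zip."""
    # Imported here so wait_for_download can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run without a visible window
    options.set_preference("browser.download.folderList", 2)  # 2 = specific folder
    options.set_preference("browser.download.dir", os.path.abspath(download_dir))
    options.set_preference(
        "browser.helperApps.neverAsk.saveToDisk", "application/octet-stream"
    )

    driver = webdriver.Firefox(options=options)  # assumes geckodriver is on PATH
    try:
        driver.get(url)
        driver.find_element(By.ID, "butAgree").click()
        return wait_for_download(download_dir)
    finally:
        driver.quit()  # always close the browser, even if the click fails
```

A call such as `download_headless("https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip", "downloads")` would return the path of the saved file, or `None` if nothing arrived within the timeout. Under cron, `PATH` is often minimal, so geckodriver and Firefox may need to be made discoverable explicitly.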

imxitiz
  • You can learn more about `application/octet-stream` [here](https://kb.iu.edu/d/agtj). The file is a zip file, but the response header of the download sends `Content-Type` as `application/octet-stream`. – imxitiz Aug 02 '21 at 05:59
  • The code opens the Firefox browser, but I want the file to be downloaded or opened, and that is not happening. – Atom Store Aug 02 '21 at 09:30
  • Doesn't this script download that file for you? @AtomStore what do you mean by _the code opens in the firefox browser_? What did you expect `selenium` to do? – imxitiz Aug 02 '21 at 09:45
  • You have checked the download folder, right? Which OS, selenium version, and browser? – imxitiz Aug 02 '21 at 09:49
  • This opens the browser and downloads the file. But it is not suitable for running on a server, and the script also does not run as a cron job. – Atom Store Aug 30 '21 at 10:29
  • @AtomStore I don't understand what you are saying. Isn't that worth asking as a new question? – imxitiz Aug 30 '21 at 11:52