
I want to download files from a web page with Microsoft Edge running in headless mode. The program downloads the files correctly when it is NOT headless. As soon as I add the option that stops the Edge window from opening, the downloads no longer happen.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Edge()
driver.get("URL")

id_box = driver.find_element(By.ID,"...")
pw_box = driver.find_element(By.ID,"...")
id_box.send_keys("...")
pw_box.send_keys("...")
log_in = driver.find_element(By.ID,"...")
log_in.click()

time.sleep(0.1)  # Without this, find_element fails with "Unable to locate element"

drop_period = Select(driver.find_element(By.ID,"..."))
drop_period.select_by_index(1)
drop_consul = Select(driver.find_element(By.ID,"..."))
drop_consul.select_by_visible_text("...")
drop_client = Select(driver.find_element(By.ID,"..."))
drop_client.select_by_index(1)

# The following files do not download when headless mode is enabled:

driver.find_element(By.XPATH, "...").click()
driver.find_element(By.XPATH, "...").click()


Erik

4 Answers


In that case, you might try downloading the file from its direct link with Python's requests library.

You'll need to get the URL from the element's href attribute.
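Here is a minimal sketch of that step. It assumes the download control is a link whose href points straight at the file. get_download_url is a hypothetical helper name. When the download is triggered by JavaScript or a form post, as the asker's comments below suggest, there is no usable href, so the helper raises an error rather than returning something misleading.

```python
from urllib.parse import urlparse


def get_download_url(element):
    """Return the http(s) URL in an element's href attribute.

    `element` is expected to be a Selenium WebElement, but anything with a
    get_attribute method works. Selenium's get_attribute("href") returns the
    resolved href property, so relative links normally come back absolute.
    """
    href = element.get_attribute("href")
    if not href or urlparse(href).scheme not in ("http", "https"):
        # e.g. a <button>, or an <a href="javascript:__doPostBack(...)">
        raise ValueError(f"element has no direct download link: {href!r}")
    return href
```

For example, `remote_url = get_download_url(driver.find_element(By.XPATH, "..."))` would feed the requests code that follows.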

Downloading and saving a file from a URL should then work as follows:

import requests as req

remote_url = 'http://www.example.com/file.txt'
local_file_name = 'my_file.txt'

data = req.get(remote_url, timeout=30)
data.raise_for_status()  # stop on a 4xx/5xx response instead of saving an error page

# Save file data to local copy
with open(local_file_name, 'wb') as file:
    file.write(data.content)


kaliiiiiiiii
  • I can't find a URL for the file download, only for the website itself. The download only appears after I have selected values in the drop-down lists and the page has updated automatically. – Erik Jan 20 '23 at 10:47
  • Can you add the page_source or the URL of the website, and the button element you want to click? – kaliiiiiiiii Jan 20 '23 at 10:50
  • I can share the URL: https://interactive.advantage.am/Portal/Secure/Login.aspx?ReturnUrl=%2fAlphaAccess%2fStatements – Erik Jan 20 '23 at 11:16
  • However, the information I download from that page is confidential, so I cannot provide access to it. – Erik Jan 20 '23 at 11:17
  • Hmm, I think there should be a way to identify the direct link. You could use the Network tab of the browser developer tools to find the URL of the file request. – kaliiiiiiiii Jan 20 '23 at 11:24
  • Then check what gets sent along with it. I assume there's going to be some kind of verification, such as a cookie or an auth token. – kaliiiiiiiii Jan 20 '23 at 11:25
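If the file request turns out to be authenticated by session cookies, one approach is to copy the cookies from the logged-in Selenium session into a requests.Session and fetch the file with it. This sketch rests on that assumption. session_from_driver_cookies and download_file are hypothetical names. The approach will not help if the site expects an auth token in a header or a form post (for example an ASP.NET postback) instead of a plain GET.

```python
import requests


def session_from_driver_cookies(cookies, user_agent=None):
    """Build a requests.Session that carries a Selenium session's cookies.

    `cookies` is the list returned by driver.get_cookies(): dicts with
    "name" and "value", and usually "domain" and "path".
    """
    session = requests.Session()
    for cookie in cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    if user_agent:
        # Some servers tie a session to the browser's User-Agent header.
        session.headers["User-Agent"] = user_agent
    return session


def download_file(session, url, local_file_name, chunk_size=64 * 1024):
    """Stream `url` to `local_file_name` in chunks rather than holding it all in memory."""
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(local_file_name, "wb") as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
```

A typical call would be `download_file(session_from_driver_cookies(driver.get_cookies(), driver.execute_script("return navigator.userAgent")), file_url, "my_file.txt")`. Here file_url is a placeholder for the request URL found in the Network tab.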

Chrome has more than one headless mode. If you want to download files, use the newer mode rather than the legacy one.

For Chrome 109 and above, use:

options.add_argument("--headless=new")

For Chrome 108 and below, use:

options.add_argument("--headless=chrome")

Reference: https://github.com/chromium/chromium/commit/e9c516118e2e1923757ecb13e6d9fff36775d1f4
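The version split above can be folded into a small helper. This is a sketch, not a Selenium API. headless_flag and configure_headless_downloads are hypothetical names, and the prefs dictionary reuses the download preferences from the other answers here. Selenium's Chrome and Edge Options classes both provide add_argument and add_experimental_option, so the helper takes the options object as a parameter. Whether the pre-109 switch also applies to older Edge builds is an assumption worth checking.

```python
def headless_flag(major_version):
    """Pick the download-capable headless switch for a Chromium-based browser."""
    # 109 and later: "--headless=new"; 108 and earlier used "--headless=chrome"
    # for the same mode (see the Chromium commit referenced above).
    return "--headless=new" if major_version >= 109 else "--headless=chrome"


def configure_headless_downloads(options, download_dir, major_version):
    """Add the headless switch and download prefs to a Chrome/Edge Options object."""
    options.add_argument(headless_flag(major_version))
    options.add_experimental_option("prefs", {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
    })
    return options
```

With Chrome, usage would look like `webdriver.Chrome(options=configure_headless_downloads(Options(), "/path/to/downloads", 110))`, where Options comes from selenium.webdriver.chrome.options and the path is a placeholder.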

Michael Mintz

Downloading files in headless mode works for me on Microsoft Edge version 110.0.1587.41 with the following options:

    MicrosoftEdge: [{
        "browserName": "MicrosoftEdge",
        "ms:edgeOptions": {
            args: ['--headless=new'],
            prefs: {
                "download.prompt_for_download": false,
                "plugins.always_open_pdf_externally": true,
                'download.default_directory': "dlFolder"
            }
        },
    }]

Nothing worked until I added the option '--headless=new'.

N.B.: Tested on macOS using WebdriverIO.

Badufz

The options.add_argument("headless=new") syntax also works for Edge.

I previously used the following syntax to open Edge in headless mode:

from selenium import webdriver
from selenium.webdriver.edge.options import Options

options = Options()
options.add_experimental_option("prefs", {
    "download.default_directory": my_download_folder,  # my_download_folder is defined elsewhere
    "download.prompt_for_download": False,
    "profile.default_content_settings.popups": False,
})
options.add_experimental_option("excludeSwitches", ["enable-logging"])
options.add_argument("log-level=3")
options.headless = True  # legacy headless: pages load and links work, but downloads do nothing
browser = webdriver.Edge(options=options)
browser.get(url)

The above still works (it opens the browser in headless mode, clicks links, etc.), but it doesn't allow file downloads: you can click a download link, but nothing happens. The new syntax fixes this:

from selenium import webdriver
from selenium.webdriver.edge.options import Options

options = Options()
options.add_experimental_option("prefs", {
    "download.default_directory": my_download_folder,  # my_download_folder is defined elsewhere
    "download.prompt_for_download": False,
    "profile.default_content_settings.popups": False,
})
options.add_experimental_option("excludeSwitches", ["enable-logging"])
options.add_argument("log-level=3")
options.add_argument("headless=new")  # new headless mode: downloads work
browser = webdriver.Edge(options=options)
browser.get(url)
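Even with headless=new, a script that ends (or calls browser.quit()) right after clicking the download link may close the browser before the file has finished downloading. One way to guard against that is to poll the download folder until a new file appears and no partial download remains, as sketched below. The sketch assumes Chromium-based Edge names in-progress downloads with a .crdownload suffix, which is Chromium's convention. wait_for_download and its 60-second default timeout are hypothetical choices.

```python
import os
import time


def wait_for_download(download_dir, before, timeout=60.0, poll=0.5):
    """Wait until a new, fully written file appears in download_dir.

    `before` is the set of file names present before the download started,
    e.g. set(os.listdir(download_dir)). Returns the new file's path.
    Assumes in-progress downloads end in ".crdownload" (Chromium convention).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_names = set(os.listdir(download_dir)) - before
        partial = [n for n in new_names if n.endswith(".crdownload")]
        finished = sorted(n for n in new_names if not n.endswith(".crdownload"))
        if finished and not partial:
            return os.path.join(download_dir, finished[0])
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {download_dir} after {timeout}s")
```

In the code above, you would record `before = set(os.listdir(my_download_folder))` just before clicking the link, then call `wait_for_download(my_download_folder, before)` before quitting the browser.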