My code is very simple: it clicks an href link to download a file. It works fine until I add the headless argument; after that, clicking the link doesn't do anything. I'm unsure whether this is a Selenium issue or a ChromeDriver issue. None of the solutions I've found online have helped, so any suggestions would be appreciated. Here's my code:

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


class Scraper(object):

    def __init__(self, cursor):
        self.driver = None

    def create_driver(self):
        # Set up Headless Chrome
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--start-maximized")
        chrome_options.add_argument("--window-size=1920,1080")
        self.driver = webdriver.Chrome(executable_path=os.path.abspath("path to chromedriver"),
                                   chrome_options=chrome_options)
        self.driver.maximize_window()

    def go_to_website(self):
        self.driver.get('https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/6202.0Nov%202019?OpenDocument')
        link_to_click = self.driver.find_element_by_xpath("//a[contains(@href,'/log?openagent&6202012.xls&6202.0')]")
        link_to_click.click()

    def run(self):
        # set a new driver
        self.create_driver()
        self.go_to_website()
Guy
Uncle_Timothy

2 Answers

If your use case is to click the .xls link in the row with the text ...Table 12. Labour force status by Sex, State and Territory - Trend, Seasonally adjusted and Original..., you need to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#details tbody tr:nth-of-type(13) td>a>img"))).click()
    
  • Using XPATH:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//tr[@class='listentry']/td[contains(., 'Labour force status by Sex, State and Territory - Trend, Seasonally adjusted and Original')]//following::td[1]/a/img"))).click()
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Update

However, being able to click the element through these Locator Strategies may not initiate the download. To initiate the download in headless mode, you have to configure Page.setDownloadBehavior through execute_cdp_cmd(). You can find a detailed discussion in Download file through Google Chrome in headless mode.
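
A minimal sketch of that configuration, assuming a Chrome driver object that exposes execute_cdp_cmd(). The helper name enable_headless_downloads and its download_dir argument are made up for illustration and are not part of Selenium:

```python
import os


def enable_headless_downloads(driver, download_dir):
    """Ask Chrome, via the DevTools Protocol, to save downloads in download_dir.

    `driver` is assumed to be a selenium.webdriver.Chrome instance (any object
    with an execute_cdp_cmd(cmd, params) method works). The function name and
    the `download_dir` parameter are hypothetical.
    """
    params = {
        # Permit downloads instead of denying them.
        "behavior": "allow",
        # An absolute path avoids ambiguity about the working directory.
        "downloadPath": os.path.abspath(download_dir),
    }
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

It would be called once after the driver is created and before the click, for example enable_headless_downloads(self.driver, "downloads") at the end of create_driver(), where "downloads" is a placeholder directory name.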

undetected Selenium

You have to specify a download path when using headless mode in ChromeDriver. You also have to wait until the file has been downloaded. The code below shows a simple example of how to wait for the download to finish. I used a regex to get the name of the file from the link's href.

import os

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import re

download_path = "your_download_path"

options = webdriver.ChromeOptions()
prefs = {
    "profile.default_content_settings.popups": 0,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "download.default_directory": download_path,
}
options.add_experimental_option('prefs', prefs)
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
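# The --headless argument also works on Selenium versions without the options.headless setter
options.add_argument('--headless')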
driver = webdriver.Chrome(options=options)

driver.set_window_size(1920, 1080)
driver.maximize_window()

wait = WebDriverWait(driver, 10)

spreadsheet_name = "Table 12. Labour force status by Sex, State and Territory - Trend, Seasonally adjusted and Original"
excel_xpath = f"//tr[contains(., '{spreadsheet_name}') and @class='listentry']//a[./img[contains(@alt, 'Excel')]]"

with driver:
    driver.get('https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/6202.0Nov%202019?OpenDocument')

    download_button = wait.until(EC.element_to_be_clickable((By.XPATH, excel_xpath)))
    href = download_button.get_attribute("href")

    # Example href of the file:
    # https://www.abs.gov.au/ausstats/meisubs.nsf/log?openagent&6202012.xls&6202.0&Time%20Series%20Spreadsheet&053D25DD395DF901CA2584D4001C70A5&0&Nov%202019&19.12.2019&Latest
    file_name = re.findall(r"(?<=openagent&)(.*?)(?=&)", href)[0]

    download_button.click()

    # Poll once per second, for up to 60 seconds, until the file appears
    file_path = os.path.join(download_path, file_name)
    for _ in range(60):
        if os.path.exists(file_path):
            break
        time.sleep(1)
    else:
        # The loop finished without finding the file
        print("Failed to download", file_name, href)
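
The wait loop above can also be pulled into a small reusable helper. The sketch below is one way to do it, under two assumptions that are not in the original answer: the helper name wait_for_download is made up, and Chrome is assumed to keep a file with a .crdownload suffix in the download directory while a download is still in progress.

```python
import os
import time


def wait_for_download(directory, file_name, timeout=60.0, poll_interval=0.5):
    """Return the path of the downloaded file, or None if it did not appear in time.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    target = os.path.join(directory, file_name)
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory) if os.path.isdir(directory) else []
        # Assumption: a "*.crdownload" file means Chrome is still writing.
        in_progress = any(name.endswith(".crdownload") for name in names)
        if os.path.exists(target) and not in_progress:
            return target
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

With it, the script only needs to check whether wait_for_download(download_path, file_name) returned None after the click.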
Sers