
Why can't I use Selenium to open "https://ir.baidu.com/static-files/b7b97cfa-f30b-48ba-8702-e5edb2767b21", which is Baidu's third-quarter financial results? My code is basically:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("https://ir.baidu.com/financial-reports")
download_link = browser.find_element(By.XPATH, '//*[@title="BIDU 3Q22 ER v2.pdf"]')
download_link.click()

Below is what I see after the "download_link.click()" line:

[screenshot: the browser fails to load the page after the click]

Jason
  • You haven't said what **the actual problem** is. Does the site not load? Does the browser freeze? Do you get an error message? – John Gordon Apr 30 '23 at 18:10
  • Hi John Gordon, sorry about that; I have added a screenshot of the problem I am seeing. The issue is that my browser seems unable to load the webpage after the "download_link.click()" line – Jason May 01 '23 at 01:23

1 Answer


Your question doesn't pin down the exact root cause, so based on the ask I have one query:

Where exactly does the code get stuck / stop working?

  1. Is it stuck at page loading? -> We need to check whether the URL can be reached within the session period and whether the code ever gets past the `get()` call.
  2. Is it stuck before the DOM has loaded? -> We need to verify that the code reaches the URL and only sets up the next operation once the DOM is fully loaded and visible.
  3. Is it stuck after the DOM has loaded but the page is empty (unlikely, but possible if there is an API issue)? -> We need to check the browser console logs to see what happens in the background when the page is empty.
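To narrow down which of these stages the script stalls at, one option is to wrap each Selenium call in a small timing helper and log how long it takes before it returns or fails. A minimal stdlib sketch (the `step` helper is illustrative, not part of Selenium):

```python
import time

def step(name, fn, *args, **kwargs):
    """Run one automation step, reporting its duration and any failure."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
        print(f"[ok]   {name} took {time.monotonic() - start:.2f}s")
        return result
    except Exception as exc:
        print(f"[fail] {name} after {time.monotonic() - start:.2f}s: {exc!r}")
        raise

# With a real driver, usage would look like:
#   step("load page", browser.get, "https://ir.baidu.com/financial-reports")
#   link = step("find link", browser.find_element, By.XPATH, xpath)
#   step("click link", link.click)
```

Whichever step prints `[fail]` (or never prints at all) tells you which of the three cases above you are in.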

Most web apps use AJAX techniques. When a page is loaded by the browser, the elements within that page may load at different times. This makes locating elements difficult: if an element is not yet present in the DOM, a locate call will raise a NoSuchElementException. To handle this, the code should be wrapped in try/except/finally.

From your code snippet, the most probable scenario is case 2 (as far as I can tell).

Solution:

We need to make sure the code advances step by step through each operation it performs. Since it is not a human, we have to instruct it explicitly, observe each step, and guard against automation glitches as far as possible.

browser = webdriver.Chrome()
browser.get("https://ir.baidu.com/financial-reports")

This part is fine. Once it has executed successfully, we need to make sure of the following:

  • The page should load successfully.
  • The page's data should be visible.
  • The element we're searching for should be visible/enabled for clicking.
  • Then execute/perform the click on the element.

To achieve the steps above, Selenium WebDriver offers two major kinds of waits: implicit and explicit.

Using waits, we can solve this issue. Waiting provides some slack between the actions performed, mostly locating an element or performing some other operation on it.
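For intuition, an explicit wait such as `WebDriverWait(browser, 30).until(condition)` simply polls the condition (every 0.5 s by default) until it returns something truthy or the timeout elapses. A simplified pure-Python model of that behaviour (not the real Selenium implementation; `TimeoutException` here is our own class):

```python
import time

class TimeoutException(Exception):
    pass

def until(condition, timeout, poll=0.5):
    """Poll `condition()` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutException(f"condition not met within {timeout}s")
        time.sleep(poll)
```

The real `WebDriverWait.until` works the same way, except the condition is called with the driver (e.g. `EC.presence_of_element_located(...)`) and certain exceptions such as `NoSuchElementException` are ignored between polls.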

# Initialise an explicit wait with a 30-second timeout
# wait.until(...) will poll for up to 30 secs and raise a TimeoutException if the element is not found
wait = WebDriverWait(browser, 30)

To verify that the financial-reports page has loaded successfully:

Construct an XPath / CSS selector for a main page element using any relevant attribute: ID, class, title, text, etc.

# Constructed xpath for the main data table for financial-report
page_xpath = '//*[contains(text(), "Download PDF")]/parent::div//following-sibling::div[@class="nir-widget--content"]'

# Wait up to 30 secs for the page to load
wait.until(EC.presence_of_element_located((By.XPATH, page_xpath)))

Do the same for the PDF element on which you want to perform the click.

Combined code snippet (solution):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get("https://ir.baidu.com/financial-reports")

# Initialise an explicit wait with a 30-second timeout
# wait.until(...) will poll for up to 30 secs and raise a TimeoutException if the element is not found
wait = WebDriverWait(browser, 30)

try:
  # Constructed xpath for the main data table for financial-report
  page_xpath = '//*[contains(text(), "Download PDF")]/parent::div//following-sibling::div[@class="nir-widget--content"]'

  # Wait up to 30 secs for the page to load
  wait.until(EC.presence_of_element_located((By.XPATH, page_xpath)))

  # Once page is loaded search and wait for element where we need to click.
  q3_report_xpath = '//*[contains(text(), "2022")]/ancestor::article//*[@class="file-link"]//a[contains(@title, "BIDU 3Q22 ER v2.pdf")]'

  # Wait up to 30 secs for the Q3 report element to be clickable
  element = wait.until(EC.element_to_be_clickable((By.XPATH, q3_report_xpath)))

  # Perform click on element
  element.click()
finally:
  browser.quit()

Note: You can also make the code more robust by constructing the CSS/XPath dynamically instead of searching for exact PDF names.
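For example, instead of hard-coding `BIDU 3Q22 ER v2.pdf` you could build the XPath from the year and quarter, so the same script works for any report. A sketch under the assumption that the page structure matches the selectors used above (the helper name is made up):

```python
def report_link_xpath(year: int, quarter: int) -> str:
    """Build an XPath for the earnings-release PDF of a given year/quarter.

    Matches on the title prefix (e.g. "BIDU 3Q22 ER") so that version
    suffixes like "v2" do not matter.
    """
    title_prefix = f"BIDU {quarter}Q{year % 100} ER"
    return (
        f'//*[contains(text(), "{year}")]/ancestor::article'
        f'//*[@class="file-link"]//a[contains(@title, "{title_prefix}")]'
    )

# report_link_xpath(2022, 3) targets the "BIDU 3Q22 ER ..." link
```

You would then pass the result to the same `wait.until(EC.element_to_be_clickable(...))` call as before.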

Riyaz Khan
  • Hi Riyaz Khan, I tried your steps but it does not work – Jason May 01 '23 at 01:29
  • @Jason, I've looked into the screenshot. Can you capture the network logs when the click is processed? It seems like a backend failure, but I believe it's more about the waits. Also, can you place a sleep after the click is processed, just to ensure we wait for the API to execute in that time frame? We also have synchronous and asynchronous requests that need to be managed on our end. So far I'm seeing the error below after processing the click: `caught (in promise) Error: A listener indicated an asynchronous response by returning true, but the message channel closed before a response was received` – Riyaz Khan May 03 '23 at 13:38
  • Also, take a look at [this](https://stackoverflow.com/questions/43149534/selenium-webdriver-how-to-download-a-pdf-file-with-python); it might help. – Riyaz Khan May 03 '23 at 13:55