I have a page where I must log in before I can reach the page I would like to scrape with BeautifulSoup. My code currently looks like this:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
# "loginpage" is a placeholder for the URL of the page where I have to log in.
driver.get("loginpage")
driver.find_element(By.ID, "username").send_keys("username")
driver.find_element(By.ID, "password").send_keys("password")
driver.find_element(By.XPATH, '//button[@onclick="return validateFields();"]').click()
# "contentpage" is a placeholder for the URL of the page I want to scrape.
driver.get("contentpage")
html = driver.page_source
soup = BeautifulSoup(html, features="lxml")
statuses = soup.find_all("span")
for status in statuses:
    print(status)

But I think the HTML comes from the wrong page, because BeautifulSoup returns NoneType for elements that I can see in the browser.

1 Answer

After you call get() and before you read page_source, you need to wait until the page has rendered. Use WebDriverWait with visibility_of_element_located() on any element that is visible on the loaded page, for example with the following locator strategy:

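from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.get("contentpage")
# Wait up to 20 seconds for the chosen element to become visible.
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "css_of_a-visible_element")))
html = driver.page_source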

As an alternative, you can also use document.documentElement.outerHTML as follows:

driver.get("contentpage")
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "css_of_a-visible_element")))
html = driver.execute_script("return document.documentElement.outerHTML")


undetected Selenium