
I am scraping a website for product information. I have to log in to see the products available to my account. After logging in successfully and navigating to the product details page, Selenium returns nothing. I have been trying for the past week and nothing has worked.

So I was wondering whether I can use BeautifulSoup to get the text I want once I have reached that point.

Is it doable? Please recommend any resources or reading.

quiteLost24
    https://stackoverflow.com/questions/13960326/how-can-i-parse-a-website-using-selenium-and-beautifulsoup-in-python, https://stackoverflow.com/questions/55197425/navigate-with-selenium-and-scrape-with-beautifulsoup-in-python, https://stackoverflow.com/questions/62475675/how-to-scrape-hidden-class-data-using-selenium-and-beautiful-soup, https://stackoverflow.com/questions/14529849/python-scraping-javascript-using-selenium-and-beautiful-soup – vitaliis Jun 21 '21 at 16:59
    Does this answer your question? [How can I parse a website using Selenium and Beautifulsoup in python?](https://stackoverflow.com/questions/13960326/how-can-i-parse-a-website-using-selenium-and-beautifulsoup-in-python) – vitaliis Jun 21 '21 at 16:59

3 Answers


Sure. The HTML of the page is available in Selenium through the .page_source property, and you can pass that HTML to BeautifulSoup to parse it.

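from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://example.com"  # placeholder; use the page you want to scrape

driver = webdriver.Chrome()
driver.get(url)
...  # log in and navigate to the target page here
# Hand the HTML that Selenium has loaded over to BeautifulSoup.
soup = BeautifulSoup(driver.page_source, "html.parser")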
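
The question does not say why Selenium returns nothing. One common cause, and only an assumption here, is that the product details are rendered by JavaScript after the page loads, so the HTML is read too early. A minimal sketch that waits for an element before handing .page_source to BeautifulSoup could look like this. The helper names extract_text and scrape_after_wait, and any CSS selector you pass in, are hypothetical and not part of either library.

```python
from bs4 import BeautifulSoup


def extract_text(html, css_selector):
    """Return the stripped text of the first match for css_selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(css_selector)
    return node.get_text(strip=True) if node is not None else None


def scrape_after_wait(driver, css_selector, timeout=10):
    """Wait until css_selector is present in the DOM, then parse the page."""
    # Imported here so extract_text can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Raises TimeoutException if the element never appears within `timeout` seconds.
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
    )
    return extract_text(driver.page_source, css_selector)
```

After logging in and navigating, a call such as scrape_after_wait(driver, ".product .name") would return the text of the first matching element, where ".product .name" stands in for whatever selector fits the real page.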
MendelG

Yes, very much so. You can combine Beautiful Soup and Selenium.

Selenium automates the browser, which means it drives the UI,

and Beautiful Soup is an HTML parser.

That said, Selenium is quite powerful on its own, so you should be able to keep working through the UI with it.

Selenium - Official docs

BeautifulSoup - Official docs

cruisepandey

Use driver.get(url) to open a URL that you found with bs4 (going from bs4 to Selenium).

Use BeautifulSoup(driver.page_source, "html.parser") to parse the whole page opened by Selenium (going from Selenium to bs4).

The case that took me a long time to figure out: use BeautifulSoup(element.get_attribute('innerHTML'), "html.parser") to parse a single element found by Selenium (going from Selenium to bs4).
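
The element-level conversion can be sketched as a small helper. This assumes that element is a Selenium WebElement (anything with a get_attribute method works), and the names element_to_soup and list_item_texts are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup


def element_to_soup(element):
    """Parse the inner HTML of a Selenium WebElement with BeautifulSoup."""
    # get_attribute("innerHTML") returns the markup inside the element,
    # without the element's own opening and closing tags.
    return BeautifulSoup(element.get_attribute("innerHTML"), "html.parser")


def list_item_texts(element):
    """Collect the stripped text of every <li> found inside the element."""
    soup = element_to_soup(element)
    return [li.get_text(strip=True) for li in soup.find_all("li")]
```

Parsing only one element keeps the soup small, which can help when the rest of the page is large or irrelevant.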

lam vu Nguyen