
This question is for Python 3.6.3, bs4 and Selenium 3.8 on Win10.

I am trying to scrape pages with dynamic content. What I want to scrape is numbers and text (from http://www.oddsportal.com, for example). From my understanding, requests+beautifulsoup will not do the job, because content rendered by JavaScript is not present in the HTML that requests downloads. So I have to use another tool such as selenium webdriver.

Then, given that I will use selenium webdriver anyway, do you recommend ignoring beautifulsoup and sticking with selenium webdriver functions, e.g.

elem = driver.find_element_by_name("q")

Or is it considered better practice to use selenium+beautifulsoup?

Do you have any opinion as to which of the two routes will give me more convenient functions to work with?

Adrian Mole
cmarios

2 Answers


Beautifulsoup

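Beautifulsoup is a powerful tool for Web Scraping. It parses HTML or XML that has already been downloaded; it does not fetch pages itself, so it is usually paired with a library such as urllib.request or requests. That combination works well for extracting data from static pages.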

Selenium

Selenium is a widely used tool for Web Automation. Because it drives a real browser, it can interact with dynamic pages, content and elements after JavaScript has run.

Conclusion

To create a robust and efficient framework for scraping pages with dynamic content, integrate both Selenium and Beautifulsoup: browse and interact with the dynamic elements through Selenium, then scrape the resulting page content through Beautifulsoup.

An Example

Here is an example using Selenium and Beautifulsoup for Scraping
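
A minimal sketch of this combination, not the answer author's original example: Selenium loads the page and waits for the dynamic content, then Beautifulsoup parses the rendered HTML. The function names `fetch_rendered_html` and `extract_texts`, the choice of Chrome, and the 10 second timeout are assumptions made for illustration. The `find_element(By...)` locator style is used because it works in Selenium 3 as well as in later releases.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load `url` in a real browser and return the HTML after JavaScript has run.

    Hypothetical helper: assumes Chrome and a matching chromedriver are installed.
    """
    # Imported here so the parsing helper below can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one element matching `wait_css` is in the DOM,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


def extract_texts(html, css):
    """Return the stripped text of every element in `html` matching `css`."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css)]
```

With these two helpers, `extract_texts(fetch_rendered_html(url, css), css)` would return the text of the matched elements once the browser has rendered them. The URL and the CSS selector depend on the target page and are left to the caller.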

undetected Selenium

Selenium has many selectors:

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

# and 

find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector

So mostly you don't need BeautifulSoup.

Especially xpath and css_selector can be useful.

furas