I need the entire web page source for scraping, but I'm getting only a part of it.

Code trials:

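import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def get_soup(url):  # function name is illustrative; only the body was posted
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')

    driver = webdriver.Chrome(options=options)
    driver.get(url)

    # Fixed pause to give the JavaScript-rendered content time to load
    time.sleep(10)

    page = driver.page_source
    driver.quit()
    soup = BeautifulSoup(page, 'html5lib')

    return soup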

The website is: https://superbet.ro/pariuri-sportive/fotbal/live

undetected Selenium

2 Answers

The issue may occur because of the presence of <iframe> elements. You would need to switch between the iframes to get the corresponding data.

This might help: Switching into second iframe in Selenium Python3
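
As a rough sketch of that idea, the helper below gathers the top-level source plus the source of every top-level iframe by switching into each frame in turn. The name `collect_frame_sources` is hypothetical, and the function assumes it receives a Selenium WebDriver (or any object exposing the same `find_elements`, `switch_to` and `page_source` members). Nested iframes are not handled.

```python
def collect_frame_sources(driver):
    """Return the top-level page source followed by each top-level iframe's source."""
    sources = [driver.page_source]

    # "tag name" is the string value behind selenium's By.TAG_NAME constant
    frame_count = len(driver.find_elements("tag name", "iframe"))

    for index in range(frame_count):
        try:
            # switch_to.frame() accepts an index, a name/id, or a WebElement
            driver.switch_to.frame(index)
            sources.append(driver.page_source)
        finally:
            # Always return to the main document before the next frame
            driver.switch_to.default_content()

    return sources
```

With a live session this would be called as `collect_frame_sources(driver)` before `driver.quit()`. Whether the missing data on this particular site actually lives inside an iframe is not confirmed here.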

196138105

To extract the page source you need to:

  • Click on the OK button to accept the cookies.

  • Induce WebDriverWait for the visibility of a WebElement using visibility_of_element_located().

  • You can use either of the following Locator Strategies:

    • Using CSS_SELECTOR:

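      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      from selenium.webdriver.support.ui import WebDriverWait
      driver.get("https://superbet.ro/pariuri-sportive/fotbal/live")
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#CybotCookiebotDialogBodyLevelButtonAccept[href]"))).click()
      WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.section-header__title")))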
      
    • Using XPATH:

      driver.get("https://superbet.ro/pariuri-sportive/fotbal/live")
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='CybotCookiebotDialogBodyLevelButtonAccept' and @href]"))).click()
      WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='section-header__title']")))
      
  • Console Output:

<html lang="en" style="--vh:6.13px;">

<head>
  <meta charset="utf-8">
  <meta name="description" content="">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width,initial-scale=1,user-scalable=0">
  <title>Superbet | Pariuri Sportive Online, Live, Casino, Loto, Virtuale</title>
  <script type="text/javascript" charset="UTF-8" async="" src="https://consentcdn.cookiebot.com/consentconfig/a438e411-35ff-432b-863f-3d25bed37901/state.js"></script>
  <script type="text/javascript" charset="UTF-8" async="" src="https://consent.cookiebot.com/logconsent.ashx?action=accept&amp;nocache=1612037844156&amp;referer=https%3A%2F%2Fsuperbet.ro%2Fpariuri-sportive%2Ffotbal%2Flive&amp;dnt=false&amp;method=strict&amp;clp=true&amp;cls=true&amp;clm=true&amp;cbid=a438e411-35ff-432b-863f-3d25bed37901&amp;cbt=leveloptin&amp;hasdata=true"></script>
  <script type="text/javascript" charset="UTF-8" async="" src="https://consent.cookiebot.com/a438e411-35ff-432b-863f-3d25bed37901/cc.js?renew=false&amp;referer=superbet.ro&amp;dnt=false&amp;forceshow=false&amp;cbid=a438e411-35ff-432b-863f-3d25bed37901&amp;whitelabel=false&amp;brandid=CookieConsent&amp;framework="></script>
  <script type="text/javascript" async="" src="https://consent.cookiebot.com/uc.js?cbid=a438e411-35ff-432b-863f-3d25bed37901"></script>
  <script async="" src="https://www.googletagmanager.com/gtm.js?id=GTM-MN5RWMH"></script>
  <script>
    if (!window.location.hostname.includes('local')) {
      window.dataLayer = window.dataLayer || [];
      window.dataLayer.push({
        originalLocation: document.location.protocol + '//' +
          document.location.hostname +
          document.location.pathname +
          document.location.search
      });
      (function(w, d, s, l, i) {
        w[l] = w[l] || [];
        w[l].push({
          'gtm.start': new Date().getTime(),
          event: 'gtm.js'
        });
        var f = d.getElementsByTagName(s)[0],
          j = d.createElement(s),
          dl = l != 'dataLayer' ? '&l=' + l : '';
        j.async = true;
        j.src = 'https://www.googletagmanager.com/gtm.js?id=' + i + dl;
        f.parentNode.insertBefore(j, f);
      })(window, document, 'script', 'dataLayer', 'GTM-MN5RWMH');
    }
  </script>
  . 
  . 
  .
  <iframe data-product="web_widget" title="No content" tabindex="-1" aria-hidden="true" src="about:blank" style="width: 0px; height: 0px; border: 0px; position: absolute; top: -9999px;"></iframe><iframe name="__uspapiLocator" tabindex="-1" role="presentation"
    aria-hidden="true" title="Blank" style="display: none; position: absolute; width: 1px; height: 1px; top: -9999px;"></iframe><iframe tabindex="-1" role="presentation" aria-hidden="true" title="Blank" src="https://consentcdn.cookiebot.com/sdk/bc-v2.min.html"
    style="position: absolute; width: 1px; height: 1px; top: -9999px;"></iframe>
  <div><iframe title="Deschide o miniaplicație widget unde puteți găsi mai multe informații" id="launcher" tabindex="-1" style="width: 142px; height: 50px; padding: 0px; margin: 10px 20px; position: fixed; bottom: 30px; overflow: visible; opacity: 0; border: 0px; z-index: 999998; transition-duration: 250ms; transition-timing-function: cubic-bezier(0.645, 0.045, 0.355, 1); transition-property: opacity, top, bottom; top: -9999px; visibility: hidden;"></iframe>
    <iframe
      title="Găsiți mai multe informații aici" id="webWidget" tabindex="-1" style="width: 374px; max-height: calc(100vh - 32px); height: 572px; position: fixed; opacity: 0; border: 0px; transition-duration: 250ms; transition-timing-function: cubic-bezier(0.645, 0.045, 0.355, 1); transition-property: opacity, top, bottom; top: -9999px; visibility: hidden; z-index: 999999;"></iframe>
  </div>

  </body>

</html>
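
Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. The function name `fetch_rendered_source` and the two constants are hypothetical. The selectors are the ones quoted in this answer and may no longer match the live site. The `--headless` flag is carried over from the question, and it is an assumption that the cookie dialog also appears in headless mode.

```python
# Selectors taken from the answer; they may be out of date for the live site
COOKIE_BUTTON_CSS = "a#CybotCookiebotDialogBodyLevelButtonAccept[href]"
READY_CSS = "span.section-header__title"


def fetch_rendered_source(url, timeout=20, headless=True):
    """Open url, accept cookies, wait for the content, return the page source."""
    # Selenium is imported here so that defining the function does not
    # require the browser stack to be installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    if headless:
        options.add_argument("--headless")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)

        # Step 1: accept the cookie dialog once the button is clickable
        wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, COOKIE_BUTTON_CSS))
        ).click()

        # Step 2: wait until a section header is visible, i.e. content rendered
        wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, READY_CSS))
        )

        return driver.page_source
    finally:
        # Close the browser even if one of the waits times out
        driver.quit()
```

The returned string can be passed to `BeautifulSoup(page, 'html5lib')` as in the question. If either element never appears, `WebDriverWait.until()` raises a `TimeoutException` rather than returning a partial page.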

undetected Selenium