
I am trying to print the HTML of https://www.dplay.no/kanaler/ (the page is geo-restricted, so you might have to use https://go.discovery.com/tv-shows/ instead, but that shouldn't matter).

Since the webpage uses JavaScript to load its HTML content, I decided to use Selenium with Python 3 to scrape it.

What I have so far is:

from selenium import webdriver

driver = webdriver.Chrome()

driver.get('https://www.dplay.no/kanaler')

html = driver.page_source

print(html)

I have also tried:

html = driver.execute_script("return document.documentElement.outerHTML;")

and

html = driver.execute_script("return document.documentElement.innerHTML;")

However, none of these seem to work: the HTML I get back is not the HTML that is shown on the webpage.

How can I get the HTML content that is actually visible on the webpage?

Arete

1 Answer


You are seeing the right output and correct behavior.

I took your code, added a few options along with a wait, and here is what I observed:

  • Code Block:

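    import time
    
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    # Selenium 4 takes the chromedriver path via a Service object
    # (the old executable_path argument has been removed)
    driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
    driver.get('https://www.dplay.no/kanaler/')
    time.sleep(10)  # crude fixed wait so the JavaScript can render the page
    print(driver.page_source)
    driver.quit()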
    
  • Console Output:

      <html lang="no"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width,maximum-scale=10,minimum-scale=1,initial-scale=1"><meta name="google" value="notranslate"><title>Strøm kanaler direkte | Dplay</title><link rel="preconnect" href="https://dplay-static.disco-api.com"><link rel="preconnect" href="https://disco-api.dplay.no"><link rel="preconnect" href="https://eu1-prod-images.disco-api.com"><link rel="preconnect" href="https://connect.facebook.net"><link rel="preconnect" href="https://fonts.googleapis.com"><link rel="preconnect" href="https://assets.adobedtm.com"><link rel="preload" as="script" href="/main-1adbd0ca3d3a7141c1a5.js"><meta name="mobile-web-app-capable" content="yes"><link rel="manifest" href="/manifest.json" crossorigin="use-credentials"><link rel="icon" href="/dplay-logo-180.png"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-title" content="Dplay"><meta name="apple-mobile-web-app-status-bar-style" content="white"><link rel="apple-touch-icon" href="/dplay-apple-touch-icon.jpg"><link rel="apple-touch-startup-image" href="/dplay-logo-text-180x75.png"><!-- Facebook App link --><meta property="al:ios:url" content="com.discovery.dplay://facebook"><meta property="al:ios:app_store_id" content="KC4ZD2359Y.com.kanal5.play"><meta property="al:ios:app_name" content="Dplay"><meta property="al:android:url" content="com.discovery.dplay://facebook"><meta property="al:android:package" content="no.dplay"><meta property="al:android:app_name" content="Dplay"><script type="text/javascript" async="" src="https://www.googleadservices.com/pagead/conversion_async.js"></script><script type="text/javascript" async="" src="https://www.googleadservices.com/pagead/conversion_async.js"></script><script src="https://secure.quantserve.com/quant.js" async="" type="text/javascript"></script>
      .
      <script src="https://assets.adobedtm.com/479fbb05b9cf/9fc1a3ab6d1b/76543fb834e9/RCea880b60a90b4cb88872a3ecb52c59e0-source.min.js" async=""></script><script src="https://assets.adobedtm.com/479fbb05b9cf/9fc1a3ab6d1b/76543fb834e9/RC5b307908f85d452bbd1cc58e00201436-source.min.js" async=""></script></head><body><div id="app"><div class="pageContainer-1eCorB4H"><div id="header-wrapper" class="sticky-1FwWG4lU"><header class="header-1l1ildAB"><div class="topHeader-zyhEIsC-"><div class="topContainer-21wWp6Os"><a class="link-_ruDcDB7 logoLink-318yvghE" href="/"><img alt="Dplay" class="logo-3IfpM36Y logo-h00c9h56" src="/a08ed345c0fe04696cf31ab3b87100dc.svg"></a><div class="navWrapper-vwKHbhW_"><div class="nav-10tSiGaY"><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/programmer"><div class="navItem-14yB0BB8">Programmer</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/kanaler"><div class="navItem-14yB0BB8">Kanaler</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/tv-guide"><div class="navItem-14yB0BB8">TV-guide</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/sport"><div class="navItem-14yB0BB8">Sport</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/kategorier"><div class="navItem-14yB0BB8">Kategorier</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/gratis"><div class="navItem-14yB0BB8">Gratis</div></a></div><div class="premiumWrapper-3DTdcxSl"><a class="premiumButton-31dbB505" href="/mydplay/products?configName=auth-prod&amp;hostUrl=disco-api.dplay.no&amp;realm=dplayno&amp;returnUrl=https%3A%2F%2Fwww.dplay.no%2Fkanaler%2F" target="_self">Registrer</a></div></div><div class="iconWrapper-3mBB7-5x"><a class="link-ear3kCaw" 
href="/mydplay/entry/login?configName=auth-prod&amp;hostUrl=disco-api.dplay.no&amp;realm=dplayno&amp;returnUrl=https%3A%2F%2Fwww.dplay.no%2Fkanaler%2F" target="_self"><div class="container-2M8eCiLJ favouritesEnabled-3pfkgJ2m"><span class="label-2g_F1Qvf">Logg inn</span><span class="SVGInline icon-1tqhFCqf icon-hn3OCBQP" style="font-size: 0px;"><svg class="SVGInline-svg icon-1tqhFCqf-svg icon-hn3OCBQP-svg" viewBox="0 0 28 28" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><title>ic_icon_login_default</title><desc>Created with Sketch.</desc><g id="ic_icon_login_default" stroke="none" stroke-width="1"><g id="Login"><rect id="Rectangle" fill="#D8D8D8" opacity="0" x="0" y="0" width="28" height="28"></rect><g id="Group" transform="translate(3.192000, 3.024000)"><path d="M10.7907276,10.976 C5.48646358,10.976 1.06106358,14.738528 0.0376635838,19.740224 C-0.196360416,20.884192 0.690455584,21.952 1.85816758,21.952 L19.7230636,21.952 C20.9033756,21.952 21.7773676,20.865488 21.5375196,19.70976 C20.5024716,14.72324 16.0841836,10.976 10.7907276,10.976 M10.7907276,13.776 C11.7565596,13.776 12.7005516,13.941984 13.5966076,14.269416 C14.4628716,14.585872 15.2652956,15.045296 15.9817036,15.634864 C17.1155916,16.568104 17.9765916,17.79008 18.4745996,19.152 L3.10685558,19.152 C3.60351958,17.793552 4.46115958,16.574544 5.59112758,15.641976 C6.30820758,15.050224 7.11175158,14.589064 7.97947158,14.27132 C8.87709558,13.942656 9.82299158,13.776 10.7907276,13.776" id="Fill-1"></path><circle id="Oval" fill-rule="nonzero" cx="10.808" cy="4.816" r="4.816"></circle></g></g></g></svg></span></div></a>
      .
      <div class="text-1Ey12L6b"><p class="paragraph-3wtxxPuR size2-34rTNEs0">Dplay bruker cookies på nettsiden for å huske dine innstillinger, lage statistikker for å forbedre nettsiden vår, og å gi deg de mest relevante annonsene. Denne informasjonen kan deles med tredjeparter. Ved å fortsette å bruke nettsiden aksepterer du vår bruk av cookies, men du kan når som helst endre denne godkjenningen ved å følge instruksene på vår <a class="" href="https://dplay.no/cookies" rel="noopener" target="_blank">Cookies-side</a>. Her kan du også lese mer om dette</p></div></div><div class="links-2-4rTI9u"></div><button class="button-b4wYudld round-1Ew9jgjq default-vjGITl8z tertiaryCTA-3nF7cF3Z button-2j5j5ldl" type="button"><div class="content-2CZAzoNK"><p class="paragraph-3wtxxPuR text-2iB55dam size3-3bK_JR3k">Ok, jeg aksepterer</p></div></button></div></div><noscript></noscript><noscript></noscript><noscript></noscript><noscript></noscript></dialog><div class="footer-2i64orTD"><footer class="footer-OP_eHgMZ"><div class="container-1KS4F4y4"><div class="base-1JDWzsKS divider-1J9xjEr7"></div><div class="links-3cRELxmJ"><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/brukervilkaar">Brukervilkår</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/personvernpolicy">Personvernpolicy</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="" href="https://dplayhelp.zendesk.com/hc/no" rel="noopener" target="_blank">Kundeservice</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/om-dplay">Om Dplay</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" 
href="/cookies">Cookies</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/systemkrav">Systemkrav</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="" href="https://presse.discovery.no/" rel="noopener" target="_blank">Presse</a></p></div></div><div class="base-1JDWzsKS divider-1J9xjEr7"></div><div class="logos-2tROKQvT"><a class="link-_ruDcDB7" href="/kanaler/tvnorge"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-28-11261681250457276.png?w=108" class="logo-1DS_OQCW" alt="TVNorge"></div></a><a class="link-_ruDcDB7" href="/kanaler/fem"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-29-11262316210706002.png?w=108" class="logo-1DS_OQCW" alt="FEM"></div></a><a class="link-_ruDcDB7" href="/kanaler/max"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-30-11262268785616804.png?w=108" class="logo-1DS_OQCW" alt="MAX"></div></a><a class="link-_ruDcDB7" href="/kanaler/vox"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-31-11261733016544693.png?w=108" class="logo-1DS_OQCW" alt="VOX"></div></a><a class="link-_ruDcDB7" href="/kanaler/discovery"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2019/10/08/channel-45-314717396207329.png?w=108" class="logo-1DS_OQCW" alt="Discovery"></div></a><a class="link-_ruDcDB7" href="/kanaler/animal-planet"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2019/01/22/channel-35-17020156064294169.PNG?w=108" class="logo-1DS_OQCW" alt="Animal Planet"></div></a><a class="link-_ruDcDB7" href="/kanaler/tlc"><div class="logoAligner-3Lo3l93o"><img 
src="https://eu1-prod-images.disco-api.com/2018/11/19/channel-15-4230971263537569.png?w=108" class="logo-1DS_OQCW" alt="TLC"></div></a><a class="link-_ruDcDB7" href="/kanaler/id"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/19/channel-73-4230992926516029.png?w=108" class="logo-1DS_OQCW" alt="Investigation Discovery"></div></a><a class="link-_ruDcDB7" href="/kanaler/discovery-science"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2019/10/08/channel-71-314744145281602.png?w=108" class="logo-1DS_OQCW" alt="Discovery Science"></div></a></div><section class="AppStoreLogosWrapper"><div class="base-1JDWzsKS divider-1J9xjEr7"></div></section><div class="copyrightContainer-2T6iDmRy"><p class="paragraph-3wtxxPuR copyright-2F2sRiJ4 size4-V7KSEEpz uppercase-IgQ1hyw0">Copyright © 2019 Discovery, Inc. or its subsidiaries and affiliates. All rights reserved.</p><a class="discoveryLogo-2PuZiJgQ" href="https://corporate.discovery.com/" rel="noopener" target="_blank"><img alt="Dplay" class="logo-3IfpM36Y" src="https://eu1-prod-images.disco-api.com/2019/3/26/35fc368d-4fb8-4c39-84a8-62eb61a8aeff.png"></a></div></div></footer></div></div></div><script>_satellite["__runScript1"](function(event, target) {
    
      try {
    
      var _hj_country_ids = {
        se : "767702",
        no : "767794",
        dk : "767799",
        fi : "1018217",
        jp : "1749918",
        nl : "1749920"
      }
      var _hj_ctry = /([a-z]{2})$/.exec(document.location.host)[0];
    
      if (_hj_country_ids.hasOwnProperty(_hj_ctry)){
        (function(h,o,t,j,a,r){
          h.hj=h.hj||function(){(h.hj.q=h.hj.q||[]).push(arguments)};
          h._hjSettings={hjid:_hj_country_ids[_hj_ctry],hjsv:6};
          a=o.getElementsByTagName('head')[0];
          r=o.createElement('script');r.async=1;
          r.src=t+h._hjSettings.hjid+j+h._hjSettings.hjsv;
          a.appendChild(r);
          })(window,document,'https://static.hotjar.com/c/hotjar-','.js?sv=');
      }
    
    
        } catch (e) {}
    
      });</script><script>_satellite["__runScript2"](function(event, target) {
      try{
    
      if(/no/i.test(_satellite.getVar("Environment:CountryCode"))){
      (function(win, doc, sdk_url){
        if(win.snaptr) return;
        var tr=win.snaptr=function(){
        tr.handleRequest? tr.handleRequest.apply(tr, arguments):tr.queue.push(arguments);
      };
        tr.queue = [];
        var s='script';
        var new_script_section=doc.createElement(s);
        new_script_section.async=!0;
        new_script_section.src=sdk_url;
        var insert_pos=doc.getElementsByTagName(s)[0];
        insert_pos.parentNode.insertBefore(new_script_section, insert_pos);
      })(window, document, 'https://sc-static.net/scevent.min.js');
       snaptr('init','d3df95e4-c2a5-49f3-91ea-1b91fb1a53af')
      }
    
      } catch (e) {}
      });</script><script>_satellite["__runScript3"](function(event, target) {
      try {
        window.dataLayer = window.dataLayer || [];
        window.gtag = function() {
              dataLayer.push(arguments);
          }
          var country_id = {
          no: "UA-57600485-7",
          dk: "UA-57600485-4",
          se: "DC-8313372",
          fi: "AW-797670288",
          jp: "AW-714777410"
          }
          //This should be reworked and generalized, not all pages have the countrycode as top level domain, added else on line 24 please refactor (KN 2019-08-01)
          var pos = document.location.hostname.split(".").length - 1;
          var cc = document.location.hostname.split(".")[pos];
          if (country_id.hasOwnProperty(cc)) {
            if (!document.getElementById('google-analytics-gtag-js')) {
          var script = document.createElement('script');
          script.src = "https://www.googletagmanager.com/gtag/js?id="+country_id[cc];
          script.async = true;
          script.id = "google-analytics-gtag-js"
          document.head.appendChild(script);
          }
          }
          else {
            if (country_id.hasOwnProperty(_satellite.getVar("Environment:CountryCode"))) {
          if (!document.getElementById('google-analytics-gtag-js')) {
            var script = document.createElement('script');
            script.src = "https://www.googletagmanager.com/gtag/js?id="+country_id[_satellite.getVar("Environment:CountryCode")];
            script.async = true;
            script.id = "google-analytics-gtag-js"
            document.head.appendChild(script);
          }
            }
          }
      } catch (e) {}
    
      /////////////////////MSA Nordics Google organic 20200602
      try{
          var cc = _satellite.getVar("Environment:CountryCode")
          if (/no|dk|se|fi/i.test(cc)){
    
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
    
          gtag('config', 'DC-9232428', {
          'dc_natural_search': {
          'exclusion_parameters': ['gclid\x3d*'],
    
                  'engines': {
                  'yahoo': '468297265;273992205;x',
                  'google': '468296951;273980697;k',
                  'aol': '468307456;273972811;s',
                  'ask': '468306601;273972808;p',
                  'msn': '468291560;273653897;a'
                  }
    
          }
    
          })
      }
      } catch (e) {}
      });</script><script>_satellite["__runScript4"](function(event, target) {
      //// Script load
    
      if (!document.getElementById("userreport-launcher-script")) {
        var script = document.createElement("script");
       script.id = "userreport-launcher-script";
        script.src = "https://sak.userreport.com/discovery/launcher.js";
        script.async = true;
        document.head.appendChild(script);
      }
      });</script><iframe sandbox="allow-scripts allow-same-origin" title="Adobe ID Syncing iFrame" id="destination_publishing_iframe_discovery_0" name="destination_publishing_iframe_discovery_0_name" src="https://discovery.demdex.net/dest5.html?d_nsid=0#https%3A%2F%2Fwww.dplay.no" class="aamIframeLoaded" style="display: none; width: 0px; height: 0px;"></iframe></body></html>
    

Conclusion

The website is JavaScript-based, so you need to wait for the WebElements to render within the DOM Tree before collecting the page_source.



undetected Selenium
  • Thanks. Yes, this works, but is there a more programmatic way to determine whether the WebElement has loaded? 10 seconds is arbitrary. – Arete Jul 05 '20 at 12:32
  • @Arete Check out the updated answer. Of course there are, and the solution depends on your use case: a simple page load, visibility of an element, and interactability are each handled through a distinct approach. – undetected Selenium Jul 05 '20 at 12:41