The page I'm trying to scrape is http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm

It loads some content through JavaScript, so I'm trying to use Selenium's expected_conditions module to detect when that content is present. The wait apparently finds the element I'm looking for, but when I print the page source, the element isn't in it. There's a link labeled "TEST LINK" at the bottom of the page, so I figured that once it has loaded, the rest of the page has too.

Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException


curr_url = r"http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm"

driver = webdriver.Firefox()

driver.get(curr_url)
try:
    # Wait up to 30 seconds for the link to be present in the DOM.
    myElem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.LINK_TEXT, 'TEST LINK')))
    print("element detected")
    # Reuse the element returned by the wait instead of looking it up again.
    html = myElem.get_attribute("outerHTML")
    print(html)
    print(driver.page_source)
except TimeoutException:
    print("took too long to load")
finally:
    # Shut the browser down whether or not the wait succeeded.
    driver.quit()

I do successfully print out the detected element as <a href="">TEST LINK</a>

However, I cannot find this element in the page_source that is printed out. The page source is located here. I also tried other expected_conditions, such as element_to_be_clickable.

So my question is why is the located element not appearing in the page source? Also, is there any other way to detect that the whole page has loaded? Using expected_conditions is really the only potential solution I found.

undetected Selenium
fooiey

1 Answer

You were close. Before you extract the outerHTML of the WebElement, you need to induce WebDriverWait.

You can use the following solution:

  • Code Block:

    driver.get('http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm')
    print(WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.LINK_TEXT, 'TEST LINK'))).get_attribute("outerHTML"))
    print("==========")
    print(driver.page_source)
    
  • Console Output:

      <a href="">TEST LINK</a>
      ==========
      <html><head><title>
          Zip Codes with the Highest Percentage of Population Below Poverty Level in Ohio | Zip Atlas
      </title>
    
    
    
      <meta name="robots" content="all,index,follow"><meta name="rating" content="general"><meta name="author" content="ZipAtlas.com Development Team"><meta name="language" content="en-us"><meta name="copyright" content="Copyright 2011 ZipAtlas.com"><meta name="revisit-after" content="7 Days"><meta http-equiv="Expires" content="-1"><meta http-equiv="Distribution" content="Global"><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"><meta name="google-site-verification" content="3cRw56ihbmZI3sma1cdmLLpkwcJEE_L1tUFYhaet2xQ">
    
          <style type="text/css">
              body, td, div, span, p { color: #333333; font-size: 12px; font-family: 'Segoe UI','Lucida Grande',Verdana,Arial,Helvetica,sans-serif; }
              select { color: #333333; font-size: 12px; font-family: 'Segoe UI','Lucida Grande',Verdana,Arial,Helvetica,sans-serif; border: solid 1px #5A81A6; }
              a { text-decoration: none; color: #0000D0; }
              a:hover { text-decoration: underline; color: #0000D0; }
              h1 { margin:0px 0px 10px 0px; padding:0px 0px 0px 0px; font-family: 'Segoe UI','Lucida Grande',Verdana,Arial,Helvetica,sans-serif; font-size: 16px; font-weight: normal; color: #3d7795;}
              h2 { margin:35px 0px 0px 0px; padding:0px 0px 0px 0px; font-family: 'Segoe UI','Lucida Grande',Verdana,Arial,Helvetica,sans-serif; font-size: 15px; font-weight: normal; color: #3d7795;}
              h3 { margin:35px 0px 0px 0px; padding:0px 0px 0px 0px; font-family: 'Segoe UI','Lucida Grande',Verdana,Arial,Helvetica,sans-serif; font-size: 14px; font-weight: normal; color: #3d7795;}
    
              span.link { cursor: pointer; text-decoration: none; font-size: 12px; font-family: 'Segoe UI','Lucida Grande',Verdana,Arial,Helvetica,sans-serif; color: #0000D0; }
              span.link:hover { cursor: pointer; text-decoration: underline; color: #0000D0; }
    
              td.report_header { border: solid 1px #5A81A6; background-color: #5A81A6; color: #ffffff; }
              td.report_data { border: solid 1px #5A81A6; padding: 1px 5px 1px 5px; font-size: 12px; }
          </style>
      <link rel="preload" href="https://adservice.google.co.in/adsid/integrator.js?domain=zipatlas.com" as="script"><script src="https://partner.googleadservices.com/gampad/cookie.js?domain=zipatlas.com&amp;callback=_gfp_s_&amp;client=ca-pub-7710991166856237"></script><script src="https://pagead2.googlesyndication.com/pagead/js/r20200624/r20190131/show_ads_impl_fy2019.js" id="google_shimpl"></script><script type="text/javascript" src="https://adservice.google.co.in/adsid/integrator.js?domain=zipatlas.com"></script><link rel="preload" href="https://adservice.google.com/adsid/integrator.js?domain=zipatlas.com" as="script"><script type="text/javascript" src="https://adservice.google.com/adsid/integrator.js?domain=zipatlas.com"></script><script src="https://www.google.com/cse/static/element/57975621473fd078/cse_element__en.js?usqp=CAI%3D" type="text/javascript"></script><link type="text/css" rel="stylesheet" href="https://www.google.com/cse/static/element/57975621473fd078/default_v2+en.css"><link type="text/css" rel="stylesheet" href="https://www.google.com/cse/static/style/look/v4/default.css"></head>
      <body style="margin:0px 0px 0px 0px; padding: 0px 0px 0px 0px; background: url('/images/bg.gif');">
    
          <table cellpadding="0" cellspacing="0" style="width:100%;">
              <tbody><tr>
                  <td style="background: url('/images/shadow-left.gif') top right repeat-y;" valign="top">
                      <table cellpadding="0" cellspacing="0" style="width:100%;height:200px; background: url('/images/bg-top-left.gif') top right no-repeat;">
                          <tbody><tr>
                              <td>&nbsp;</td>
                          </tr>
                      </tbody></table>
                  </td>
                  <td style="width:930px;background:url('/images/bg-top.gif') top left repeat-x;" valign="top">
                      <table cellpadding="0" cellspacing="0" style="width:100%;">
                          <tbody><tr>
                              <td>
                                  <table cellpadding="0" cellspacing="0" style="width:100%;">
                                      <tbody><tr>
                                          <td>
                                              <a href="/"><img border="0" src="/images/logo.gif" alt="ZipAtlas Home"></a>
                                          </td>
                                      </tr>
                                  </tbody></table>
                              </td>
                              <td align="right" valign="bottom" style="color: #c0c0c0; padding-bottom: 3px; font-size: 13px;">
                                  <a style="color: #ffffff;" href="/downloads/">Database Download</a>
                              </td>
                          </tr>
                      </tbody></table>
                      <table cellpadding="0" cellspacing="0" style="width:100%; background-color:#ffffff;">
                          <tbody><tr>
                              <td style="padding: 10px 10px 10px 10px; height:550px;" valign="top">
    
    
          <!--<form action="/" method="get">//-->
          <table cellpadding="0" cellspacing="0" style="width:100%; border-bottom: solid 1px #f0f5f9;">
              <tbody><tr>
                  <td><h1>Zip Codes with the Highest Percentage of Population Below Poverty Level in Ohio</h1></td>
                  <td align="right" valign="top">
                      <table cellpadding="0" cellspacing="0">
                          <tbody><tr>
                              <td><a href="javascript:void(window.open('http://www.facebook.com/sharer.php?u='+encodeURIComponent('http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm'), 'pf','height=400,width=550').focus())"><img border="0" src="/images/social/facebook-s.gif"></a></td>
                              <td style="padding-left:1px;"><a href="javascript:void(window.open('http://twitter.com/home?status='+encodeURIComponent('http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm')).focus())"><img border="0" src="/images/social/twitter-s.gif"></a></td>
                              <td style="padding-left:1px;"><a href="javascript:void(window.open('http://www.myspace.com/Modules/PostTo/Pages/?u='+encodeURIComponent('http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm'),'pm','height=450,width=440').focus())"><img border="0" src="/images/social/myspace-s.gif"></a></td>
    
                              <!--<td style="padding-left:15px;"><input type="text" name="q" style="width:175px;" value="" /></td>
                              <td><input type="submit" value="Search" /></td>//-->
                          </tr>
                      </tbody></table>
                  </td>
              </tr>
          </tbody></table>
    
          <!--</form>//-->
    
          <table cellpadding="0" cellspacing="0" style="width:100%; border-bottom: solid 1px #f0f5f9;">
              <tbody><tr>
                  <td style="padding:15px 0px 10px 0px;" align="center">
              <script type="text/javascript" async="" src="https://cse.google.com/cse.js?cx=013012024412622983838:nucmfhluwdu"></script><script>
                  (function () {
                  var cx = '013012024412622983838:nucmfhluwdu';
                  var gcse = document.createElement('script');
                  gcse.type = 'text/javascript';
                  gcse.async = true;
                  gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
                  var s = document.getElementsByTagName('script')[0];
                  s.parentNode.insertBefore(gcse, s);
                  })();
              </script>
              <gcse:search></gcse:search>
                  </td>
              </tr>
          </tbody></table>
    
          <table cellpadding="0" cellspacing="0" style="width:100%; border-bottom: solid 1px #f0f5f9">
              <tbody><tr>
                  <td style="padding:5px 0px 5px 0px;" align="center">
                      <script async="" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
                      <!-- ZipAtlas - 3 Across (Mixed) -->
                      <ins class="adsbygoogle" style="display:inline-block;width:300px;height:250px" data-ad-client="ca-pub-7710991166856237" data-ad-slot="2630863889" data-adsbygoogle-status="done"><ins id="aswift_0_expand" style="display:inline-table;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"><ins id="aswift_0_anchor" style="display:block;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"></ins></ins></ins>
                      <script>
                      (adsbygoogle = window.adsbygoogle || []).push({});
                      </script>
                  </td>
                  <td style="padding:5px 0px 5px 0px;" align="center">
                      <script async="" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
                      <!-- ZipAtlas - 3 Across (Mixed) -->
                      <ins class="adsbygoogle" style="display:inline-block;width:300px;height:250px" data-ad-client="ca-pub-7710991166856237" data-ad-slot="2630863889" data-adsbygoogle-status="done"><ins id="aswift_1_expand" style="display:inline-table;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"><ins id="aswift_1_anchor" style="display:block;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"></ins></ins></ins>
                      <script>
                      (adsbygoogle = window.adsbygoogle || []).push({});
                      </script>
                  </td>
                  <td style="padding:5px 0px 5px 0px;" align="center">
                      <script async="" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
                      <!-- ZipAtlas - 3 Across (Mixed) -->
                      <ins class="adsbygoogle" style="display:inline-block;width:300px;height:250px" data-ad-client="ca-pub-7710991166856237" data-ad-slot="2630863889" data-adsbygoogle-status="done"><ins id="aswift_2_expand" style="display:inline-table;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"><ins id="aswift_2_anchor" style="display:block;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"></ins></ins></ins>
                      <script>
                      (adsbygoogle = window.adsbygoogle || []).push({});
                      </script>
                  </td>
              </tr>
          </tbody></table>
    
          <div id="ctl00_ContentPlaceHolder1_final_content" style="padding-top:10px;">
                  <table cellpadding="0" cellspacing="0">
                      <tbody><tr>
                          <td style="padding-left:3px;">Ohio Report:</td>
                          <td style="padding-left:5px;">
                              <div style="border: solid 1px #5A81A6; cursor:pointer; padding: 1px 5px 1px 5px; background-color: #FFFFD0; color: #5A81A6;" onmouseover="this.style.backgroundColor='#5A81A6';this.style.color='#ffffff';" onmouseout="this.style.backgroundColor='#FFFFD0';this.style.color='#5A81A6';" onclick="onContextMenu(event);" title="Click to select a different Ohio report">
                                  Percentage of Population Below Poverty Level
                              </div>
                          </td>
                      </tr>
                  </tbody></table>
    
                  <td style="background: url('/images/shadow-right.gif') top left repeat-y;" valign="top">
                      <table cellpadding="0" cellspacing="0" style="width:100%;height:200px; background: url('/images/bg-top-right.gif') top left no-repeat;">
                          <tbody><tr>
                              <td>&nbsp;</td>
                          </tr>
                      </tbody></table>
                  </td>
              </tr>
              <tr>
                  <td align="right"><img src="/images/shadow-ll.gif"></td>
                  <td style="background: url('/images/edge-bottom.gif') top left repeat-x;">
                      <table cellpadding="0" cellspacing="0" style="width:100%;">
                          <tbody><tr><td><img src="/images/shadow-lr.gif"></td>
                          <td align="right"><img src="/images/shadow-rl.gif"></td>
                      </tr></tbody></table>
                  </td>
                  <td><img src="/images/shadow-rr.gif"></td>
              </tr>
          </tbody></table>
    
          <center>
              <div style="color:#c0c0c0; padding: 50px 0px 50px 0px;">
                  <a style="color: #ffffff;" href="/">Zip Atlas Home</a> |
                  <a style="color: #ffffff;" href="/downloads/">Downloads</a> |
              <a style="color: #ffffff;" href="https://ecovinyl.ca">ecoVinyl</a> |
                  <a href="">TEST LINK</a>
    
                  <br><br>
    
                  <font color="#ffffff">© 2020 ZipAtlas.Com</font>
              </div>
          </center>
    
    
    
          <script type="text/javascript">
              function Set(el_name, c)
              {
                  var el = document.getElementById(el_name);
                  if (el)
                  {
                      el.innerHTML = c;
                  }
              }
              function Show(el_name)
              {
                  var el = document.getElementById(el_name);
                  if (el)
                  {
                      el.style.display = '';
                  }
              }
              function Hide(el_name)
              {
                  var el = document.getElementById(el_name);
                  if (el)
                  {
                      el.style.display = 'none';
                  }
              }
          </script>
    
      <!-- expo-MAX Code Start //-->
      <!-- Paste this code into every page that you would like to track //-->
      <script type="text/javascript">
          document.write(unescape('%3Cscript type="text/javascript" src="'+document.location.protocol+'//expo-max.com/adserver/js/"%3E%3C/script%3E'));
      </script><script type="text/javascript" src="http://expo-max.com/adserver/js/"></script>
      <script type="text/javascript">
          expomax_trace('WunfWYG%2bFajQ%2f9F4kqiaXg%3d%3d','cb959e484ba8457ca327aeefce4cb2b4');
      </script><div id="g5ef264d7a54140f7b03eff0ea8cfe256" style="display:none;"><iframe style="display:none;" src="https://expo-max.com/adserver/track/?e=WunfWYG%2bFajQ%2f9F4kqiaXg%3d%3d&amp;a=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20Win64%3B%20x64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F83.0.4103.116%20Safari%2F537.36&amp;l=http%3A%2F%2Fzipatlas.com%2Fus%2Foh%2Fzip-code-comparison%2Fpopulation-below-poverty-level.1.htm&amp;r=&amp;w=1366&amp;h=768&amp;p=http:"></iframe></div>
      <!-- expo-MAX Code End //-->
    
    
    
      <ins class="adsbygoogle adsbygoogle-noablate" data-adsbygoogle-status="done" style="display: none !important;"><ins id="aswift_3_expand" style="display:inline-table;border:none;height:0px;margin:0;padding:0;position:relative;visibility:visible;width:0px;background-color:transparent;"><ins id="aswift_3_anchor" style="display:block;border:none;height:0px;margin:0;padding:0;position:relative;visibility:visible;width:0px;background-color:transparent;"></ins></ins></ins></body><iframe id="google_esf" name="google_esf" src="https://googleads.g.doubleclick.net/pagead/html/r20200624/r20190131/zrt_lookup.html#" data-ad-client="ca-pub-7710991166856237" style="display: none;"></iframe></html>
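
On the question of whether there is another way to detect that the whole page has loaded: one common approach is to poll `document.readyState` through the driver. The following is only a minimal sketch, not part of the original answer. The helper name `wait_for_document_complete` and its `timeout` and `poll` parameters are hypothetical, and it assumes only that `driver` exposes Selenium's `execute_script` method.

```python
import time


def wait_for_document_complete(driver, timeout=30.0, poll=0.5):
    """Poll document.readyState until it is 'complete' or the timeout elapses.

    `driver` is assumed to be any object with an execute_script(script)
    method, such as a Selenium WebDriver. Returns True if the document
    reported 'complete' in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Ask the browser for the current load state of the document.
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause briefly before asking again.
        time.sleep(poll)
```

Note that a `readyState` of `complete` only says the initial document and its subresources have loaded. Content that scripts inject afterwards can still arrive later, so this check complements waiting for a specific element rather than replacing it.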
    
undetected Selenium
  • Thanks for the answer. Actually, I don't think it has anything to do with that, since I do call WebDriverWait in the try/except block. I ended up writing the driver.page_source to a file, and the full html appeared. Maybe it has something to do with the console output in visual studio code. – fooiey Jul 06 '20 at 15:50
  • @fooiey The [expected_conditions](https://stackoverflow.com/questions/59130200/selenium-wait-until-element-is-present-visible-and-interactable/59130336#59130336) of `presence_of_element_located()` and `element_to_be_clickable()` have a difference in their implementation. – undetected Selenium Jul 06 '20 at 17:39
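
The first comment reports that writing `driver.page_source` to a file showed the full HTML, which suggests the console may have been cutting off the long output. A minimal sketch of that check follows. The helper name `save_page_source` and the file name in the usage note are hypothetical, and the function takes the page source as a plain string so it can be tried without a browser.

```python
from pathlib import Path


def save_page_source(page_source, out_path, marker):
    """Write page_source to out_path as UTF-8 and report whether marker is in it."""
    path = Path(out_path)
    path.write_text(page_source, encoding="utf-8")
    # Read the file back so the check reflects what was actually saved.
    return marker in path.read_text(encoding="utf-8")
```

With a live driver, this could be called as `save_page_source(driver.page_source, "page.html", "TEST LINK")`. A `True` result would mean the link is in the saved source even if the console output did not show it.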