I want to log in to this page with Selenium using Python, but the page displayed in the browser is different from the page described by the HTML that the driver returns. The Firefox and Chrome webdrivers both give the same result.

import os
from selenium import webdriver

chromedriver = "./chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)

# OR
# driver = webdriver.Firefox()

driver.get('http://www.anb.org/login.htmlurl=%2Farticles%2Fhome.html&ip=94.112.189.79&nocookie=0')
# save a screenshot of the page
driver.get_screenshot_as_file('./01.png')

# print the source code of the page
print(driver.page_source)

I'm not allowed to post images, but the screenshot is exactly the same as the page displayed in the web browser.

HTML code from driver:

<html><head>
<title>American National Biography Online</title>
<script>
document.write ("<FRAMESET ROWS=\"103,*\" FRAMEBORDER=0 BORDER=0 FRAMESPACING=0>\n");
document.write ("  <FRAME SRC=\"top-home.html\" MARGINWIDTH=0 MARGINHEIGHT=0 SCROLLING=NO>\n");
if (location.search) {
  var url = unescape (location.search);
  url = (new String(url)).substring(1);
  if (url.indexOf ("&") == -1) {
    document.write ("  <FRAME SRC=\"" + url + "\" MARGINWIDTH=0 MARGINHEIGHT=0>\n");
  } else {
    document.write ("  <FRAME SRC=\"main-home.html" + location.search + "\" MARGINWIDTH=0 MARGINHEIGHT=0>\n");
  }
}
else
  document.write ("  <FRAME SRC=\"main-home.html\" NAME=atop MARGINWIDTH=0 MARGINHEIGHT=0>\n");
document.write ("</FRAMESET>\n");
</script></head>
<frameset rows="103,*" frameborder="0" border="0" framespacing="0">
  <frame src="top-home.html" marginwidth="0" marginheight="0" scrolling="NO">
  <frame src="main-home.html?url=%2Farticles%2Fbrowse.html&amp;ip=94.112.189.79&amp;nocookie=0" marginwidth="0" marginheight="0">
</frameset>

<noframes>
</noframes> 
</html>

As you can see, the HTML and the picture do not match.

Maybe the problem is with the frames?
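
One way to check is to trace the inline script by hand. Here is a rough Python 3 sketch of its branching, showing which src the second frame ends up with for a given query string. The name main_frame_src is made up for illustration, and unquote only approximates JavaScript's unescape.

```python
from urllib.parse import unquote


def main_frame_src(search):
    """Mirror the page's inline script: pick the src of the second <frame>.

    `search` plays the role of `location.search` (e.g. "?url=..." or "").
    """
    if search:
        # unescape(location.search), then drop the leading "?"
        url = unquote(search)[1:]
        if "&" not in url:
            # a single value is used directly as the frame URL
            return url
        # otherwise the original query string is forwarded to main-home.html
        return "main-home.html" + search
    # no query string at all
    return "main-home.html"
```

Called with the query string from the driver output, `?url=%2Farticles%2Fbrowse.html&ip=94.112.189.79&nocookie=0`, it takes the main-home.html branch, which matches the second frame in the HTML above. That suggests the visible content lives in that frame's own document rather than in the top-level HTML.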

My configuration:

OS X 10.8.5
Python 2.7.5
Chrome browser 28.0.1500.71
Firefox browser 24.0

I installed the latest Chrome/Firefox webdrivers, but I don't know how to find their versions.
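
A possible way to check: running `chromedriver --version` in a terminal should print the ChromeDriver version, and a running session also reports versions through the driver.capabilities dictionary. The key names vary between browser and driver releases, so the sketch below just tries a few common ones. describe_versions is a hypothetical helper, and the keys it looks up are assumptions rather than a guaranteed list.

```python
def describe_versions(capabilities):
    """Pull browser and driver versions out of a capabilities dict.

    Assumed keys: "browserVersion" (newer sessions) or "version" (older
    ones) for the browser; "chrome" -> "chromedriverVersion" or
    "moz:geckodriverVersion" for the driver. Missing keys yield None.
    """
    browser = capabilities.get("browserVersion") or capabilities.get("version")
    chrome_info = capabilities.get("chrome") or {}
    driver_version = (chrome_info.get("chromedriverVersion")
                      or capabilities.get("moz:geckodriverVersion"))
    return {
        "browser_name": capabilities.get("browserName"),
        "browser_version": browser,
        "driver_version": driver_version,
    }
```

With a live session this would be called as `describe_versions(driver.capabilities)`.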

Keresan
  • I don't see this HTML as being wrong; it is simply the HTML provided by the server, before JavaScript modifies it. – Tymoteusz Paul Oct 16 '13 at 08:49
  • Possible duplicate of [How can I get html content written by JavaScript with Selenium/Python](http://stackoverflow.com/questions/16073626/how-can-i-get-html-content-written-by-javascript-with-selenium-python) – Tymoteusz Paul Oct 16 '13 at 08:49
  • That is exactly the issue and the reason why the HTML differs. It's not unknown code, just the original. – Tymoteusz Paul Oct 16 '13 at 09:06

1 Answer

The page_source property is of little use here: depending on the driver, it may return the HTML that the server originally sent to the browser rather than a copy of the current DOM.

A more reliable way to get a copy of the current DOM is to use JavaScript and innerHTML:

js_code = "return document.getElementsByTagName('html')[0].innerHTML"
your_elements = driver.execute_script(js_code)

Also note that innerHTML doesn't span frame elements. Since you have frames in your code, you need to examine those individually:

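# find_elements (plural) returns a list of all <frame> elements
frames = driver.find_elements_by_tag_name("frame")
# contentDocument is only accessible for same-origin frames
js_code = "return arguments[0].contentDocument.documentElement.innerHTML"
your_elements = driver.execute_script(js_code, frames[0])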

You can also switch to a frame:

driver.switch_to_frame("frameName")

After that, all commands execute within the context of this frame. Don't forget to switch back with driver.switch_to_default_content() when you are done.
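
Putting the switching approach together, here is a minimal sketch that visits each top-level frame by index and collects its current DOM. collect_frame_sources is a made-up helper name. It assumes a Selenium release that offers driver.switch_to.frame(...) and driver.switch_to.default_content(), the newer spellings of the calls above, and it does not descend into nested frames.

```python
def collect_frame_sources(driver):
    """Return the serialized DOM of every top-level frame, in document order.

    `driver` is assumed to be a Selenium WebDriver (or anything with the
    same execute_script / switch_to interface).
    """
    sources = []
    # Start from the top-level document so frame indexes are predictable.
    driver.switch_to.default_content()
    frame_count = driver.execute_script("return window.frames.length")
    for index in range(frame_count):
        # An integer argument selects the frame by its position.
        driver.switch_to.frame(index)
        try:
            # Scripts now run inside the frame's own document.
            sources.append(driver.execute_script(
                "return document.documentElement.outerHTML"))
        finally:
            # Always switch back, even if the script call fails.
            driver.switch_to.default_content()
    return sources
```

After driver.get(...), calling `collect_frame_sources(driver)` on the page in the question should give one HTML string per frame, for example one for top-home.html and one for the main-home.html document. The try/finally is there so a failure inside one frame does not leave the driver stuck in that frame's context.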

Aaron Digulla
  • But this gives me exactly the same HTML result: js_code = "return document.getElementsByTagName('html')" your_elements = driver.execute_script(js_code) source_code = your_elements[0].get_attribute("outerHTML") print source_code – Keresan Oct 16 '13 at 09:44
  • You did notice the frames in your code? Frames need special handling. See my edits. – Aaron Digulla Oct 16 '13 at 09:48