49

I'm using Selenium to click through to the web page I want, and then parsing that page with Beautiful Soup.

Another post shows how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page?

Sample code in Python (based on the post above, the language doesn't seem to matter much):

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup


url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)

the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')
Ratmir Asanov
YJZ

4 Answers

90

To get the HTML for the whole page:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")

html = driver.page_source
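Since the question goes on to parse the page with Beautiful Soup, here is a minimal sketch of that hand-off. `soup_from_driver` is a hypothetical helper name, not part of Selenium or Beautiful Soup; it only assumes the object passed in exposes `page_source` the way a Selenium driver does.

```python
from bs4 import BeautifulSoup


def soup_from_driver(driver):
    """Parse the driver's current page into a BeautifulSoup tree.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    (or any object exposing a `page_source` string).
    """
    # page_source is a plain string, so it can be passed straight to the parser
    return BeautifulSoup(driver.page_source, "html.parser")
```

After `driver.get(...)` and any clicks, `soup_from_driver(driver)` returns a tree for whatever page the browser is showing at that moment.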

To get the outer HTML (tag included):

from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")

# HTML from element with some JavaScript
# (find_element_by_css_selector was removed in Selenium 4.3; use find_element with By)
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('outerHTML')

To get the inner HTML (tag excluded):

from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")

# HTML from element with some JavaScript
# (find_element_by_css_selector was removed in Selenium 4.3; use find_element with By)
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('innerHTML')
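The JavaScript variants above differ only in which node they read and whether they ask for `outerHTML` or `innerHTML`, so they can be folded into one function. This is a sketch, not an API that Selenium provides: `get_html` and its `target` and `inner` parameters are names made up for illustration, and the only driver method it relies on is `execute_script`.

```python
def get_html(driver, target="html", inner=False):
    """Return the outer (default) or inner HTML of a page or element.

    Hypothetical helper, not part of Selenium.
    target: "html" for the whole document, "body" for <body>,
            or a WebElement previously returned by driver.find_element.
    """
    prop = "innerHTML" if inner else "outerHTML"
    if target == "html":
        return driver.execute_script(f"return document.documentElement.{prop};")
    if target == "body":
        return driver.execute_script(f"return document.body.{prop};")
    # Anything else is assumed to be a WebElement, passed to the script as arguments[0]
    return driver.execute_script(f"return arguments[0].{prop};", target)
```

For example, `get_html(driver)` runs the same script as the first outer-HTML snippet, and `get_html(driver, "body", inner=True)` the same as the `<body>` inner-HTML one.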
Florent B.
    Thanks @florentbr. There seems to be a simpler answer for an element in the post mentioned in the OP, `element.get_attribute('innerHTML')`. Does your answer do the same thing, or which one is more powerful/flexible? – YJZ Mar 10 '16 at 01:07
2

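`driver.page_source` is the property in the Python bindings. In the JavaScript (Node.js) bindings the equivalent is the `getPageSource()` method, and the following worked for me: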

let html = await driver.getPageSource();

Reference: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/ie_exports_Driver.html#getPageSource

karthikdivi
1

Using a page object in Java:

    @FindBy(xpath = "xpath") // placeholder: replace with the element's XPath
    private WebElement element;

    public String getInnerHtml() {
        // waitUntilElementToBeClickable is assumed to be a helper defined elsewhere in the page object
        String innerHtml = waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML");
        System.out.println(innerHtml);
        return innerHtml;
    }
kohane15
Anil Jain
0

A C# snippet for those of us who might want to copy and paste a bit of working code some day:

var element = yourWebDriver.FindElement(By.TagName("html"));
string outerHTML = element.GetAttribute(nameof(outerHTML));

Thanks to those who answered before me. If this C# snippet, which gets the HTML for any page element in a Selenium test, helps you in the future, please consider upvoting this answer or leaving a comment.

kohane15
No Refunds No Returns