How to get innerHTML of whole page in selenium driver?

Question

I'm using selenium to click to the web page I want, and then parse the web page using Beautiful Soup.

Somebody has shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks!

Here is my sample code in Python (judging by the post mentioned above, the language doesn't seem to matter much):

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup


url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)

the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')

Florent B. · Accepted Answer · 2018-04-01T19:37:26.210

To get the HTML for the whole page:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")

html = driver.page_source
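Since the question's goal is to parse the page afterwards, here is a minimal sketch that feeds `driver.page_source` into a parser. It assumes a driver object exposing a `page_source` string (as Selenium's Python WebDriver does) and uses the standard library's `html.parser` in place of BeautifulSoup so it runs without extra packages; `page_title` and `_TitleParser` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so concatenate.
        if self._in_title:
            self.title += data


def page_title(driver):
    """Return the <title> text of the page currently loaded in `driver`.

    `driver` is assumed to be a Selenium WebDriver (or anything with a
    `page_source` string attribute).
    """
    parser = _TitleParser()
    parser.feed(driver.page_source)  # HTML of the whole page
    parser.close()
    return parser.title.strip()
```

With BeautifulSoup installed, the equivalent would be `BeautifulSoup(driver.page_source, 'html.parser').title.string`.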

To get the outer HTML (tag included):

from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")

# HTML from element with some JavaScript
# (Selenium 4: find_element(By.CSS_SELECTOR, ...) replaces the removed find_element_by_css_selector)
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('outerHTML')

To get the inner HTML (tag excluded):

from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")

# HTML from element with some JavaScript
# (Selenium 4: find_element(By.CSS_SELECTOR, ...) replaces the removed find_element_by_css_selector)
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('innerHTML')
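The outer and inner variants above differ only in which node is read and whether its own tag is included. A minimal sketch folding them into one hypothetical helper, `get_html`, assuming the Selenium 4 Python API, where `find_element(by, value)` is used and `"css selector"` is the string value of `By.CSS_SELECTOR`:

```python
def get_html(driver, css_selector=None, inner=False):
    """Return HTML from the page loaded in `driver` (assumed Selenium 4 WebDriver).

    css_selector=None -> read the <html> element (whole document)
    inner=False       -> outerHTML (tag included)
    inner=True        -> innerHTML (tag excluded)
    """
    prop = "innerHTML" if inner else "outerHTML"
    if css_selector is None:
        # Read the property straight from document.documentElement via JavaScript.
        return driver.execute_script(f"return document.documentElement.{prop};")
    # "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR.
    element = driver.find_element("css selector", css_selector)
    return element.get_attribute(prop)
```

Usage: `get_html(driver)` for the whole document, or `get_html(driver, "#hireme", inner=True)` for the contents of one element.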

Thanks @florentbr. There seems to be a simpler answer for an element in the post mentioned in the OP, `element.get_attribute('innerHTML')`. Does your answer do the same thing, or which one is more powerful/flexible? – YJZ, Mar 10 '16 at 01:07
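One way to check this empirically on a given page is to read the same property both ways and compare the results. A minimal sketch, assuming a Selenium WebDriver `driver` and an element it returned; `html_methods_agree` is a hypothetical name, and it makes no claim about how the two calls are implemented internally.

```python
def html_methods_agree(driver, element, prop="innerHTML"):
    """Compare element.get_attribute(prop) with reading the DOM property via JavaScript.

    Returns (agree, via_attribute, via_script) so any difference can be inspected.
    """
    if prop not in ("innerHTML", "outerHTML"):
        raise ValueError("prop must be 'innerHTML' or 'outerHTML'")
    via_attribute = element.get_attribute(prop)
    # Selenium passes `element` to the script as arguments[0].
    via_script = driver.execute_script(f"return arguments[0].{prop};", element)
    return via_attribute == via_script, via_attribute, via_script
```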

score 2 · Answer 2 · answered Oct 29 '19 at 18:07


`driver.page_source` is probably outdated. The following worked for me (JavaScript `selenium-webdriver`):

let html = await driver.getPageSource();

Reference: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/ie_exports_Driver.html#getPageSource

karthikdivi

For posterity, a short clarification that the answer above by @Florent B. is referring to the Python API, whereas this one is the JavaScript equivalent. – Jake Tae Aug 30 '20 at 13:26
How do you do this in Python? – user3286381 Jan 13 '21 at 01:55

score 1 · Answer 3 · edited Sep 09 '21 at 14:25

Using a page object in Java:

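    @FindBy(xpath = "xpath") // placeholder: put the element's XPath here
    private WebElement element;

    public String getInnerHtml() {
        // waitUntilElementToBeClickable is the page object's own helper (not shown):
        // it waits up to 10 seconds and returns the element once it is clickable
        String innerHtml = waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML");
        System.out.println(innerHtml);
        return innerHtml;
    }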
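For comparison, a minimal Python sketch of the same page-object idea. The Java answer's `waitUntilElementToBeClickable` helper is not shown, so this hypothetical `InnerHtmlPage` class polls with a hand-rolled loop instead of Selenium's `WebDriverWait`. It treats an element as clickable when `is_displayed()` and `is_enabled()` are both true, an assumption modelled on Selenium's `element_to_be_clickable` condition. The locator is an assumed `(by, value)` pair such as `("xpath", "//div[@id='main']")`.

```python
import time


class InnerHtmlPage:
    """Hypothetical page object that reads innerHTML once an element is clickable."""

    def __init__(self, driver, locator, timeout=10.0, poll=0.5):
        self.driver = driver      # assumed Selenium WebDriver
        self.locator = locator    # (by, value), e.g. ("xpath", "//div[@id='main']")
        self.timeout = timeout
        self.poll = poll

    def _wait_until_clickable(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                element = self.driver.find_element(*self.locator)
                if element.is_displayed() and element.is_enabled():
                    return element
            except Exception:
                # Not present yet or stale (Selenium raises e.g. NoSuchElementException);
                # keep polling until the deadline.
                pass
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{self.locator} not clickable after {self.timeout}s")
            time.sleep(self.poll)

    def get_inner_html(self):
        return self._wait_until_clickable().get_attribute("innerHTML")
```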

answered Aug 24 '17 at 09:16 by Anil Jain, edited by kohane15

What is the benefit of going this way to click an element? – sam Nov 09 '22 at 17:39

score 0 · Answer 4 · edited Sep 10 '21 at 15:45

A C# snippet for those of us who might want to copy/paste a bit of working code some day:

var element = yourWebDriver.FindElement(By.TagName("html"));
string outerHTML = element.GetAttribute(nameof(outerHTML));

Thanks to those who answered before me. If this C# snippet, which gets the HTML of any page element in a Selenium test, helps you in the future, please consider upvoting this answer or leaving a comment.
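For Python users, a minimal sketch of the same lookup as the C# snippet, assuming the Selenium 4 API (`find_element(by, value)`, where `"tag name"` is the value of `By.TAG_NAME`); `document_outer_html` is a hypothetical name.

```python
def document_outer_html(driver):
    """Return the outerHTML of the <html> element: the whole document's markup, without the doctype."""
    # "tag name" is the value of selenium.webdriver.common.by.By.TAG_NAME.
    html_element = driver.find_element("tag name", "html")
    return html_element.get_attribute("outerHTML")
```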
