Get unaltered html via selenium

Question

I'm using Python, Selenium, and headless geckodriver to scrape a page. How can I get the unaltered HTML as it was downloaded, before JavaScript started manipulating the elements? This is what I've tried:

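from selenium import webdriver

fireFoxOptions = webdriver.FirefoxOptions()
fireFoxOptions.headless = True
driver = webdriver.Firefox(options=fireFoxOptions)
driver.get(url)  # url holds the address of the page to scrape
print(driver.page_source)  # reflects any changes JavaScript has already made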

This link may help - https://stackoverflow.com/questions/38301993/how-to-disable-java-script-in-chrome-driver-selenium-python — Swaroop Humane, Aug 08 '20 at 12:50

score 0 · Accepted Answer · answered Oct 21 '20 at 07:18

Disabling JavaScript in the Firefox profile seems to be the way to do it:

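from selenium import webdriver

profile = webdriver.FirefoxProfile()
# Assumption: Selenium 3 treats 'javascript.enabled' as a frozen preference,
# so the frozen default is overridden directly instead of via set_preference().
profile.DEFAULT_PREFERENCES['frozen']['javascript.enabled'] = False
profile.set_preference("app.update.auto", False)
profile.set_preference("app.update.enabled", False)
profile.update_preferences()
options = webdriver.FirefoxOptions()
options.profile = profile
options.headless = True
driver = webdriver.Firefox(options=options)
url = 'https://www.somewhere.com/some/path'
driver.get(url)
print(driver.page_source)  # page source with JavaScript disabled
driver.quit()  # close the headless browser when done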
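
Note that driver.page_source returns the browser's serialization of the DOM, so even with JavaScript disabled it may not match the server's response byte for byte. If the markup exactly as it was downloaded is what matters, one option outside Selenium is a plain HTTP request. The following is only a minimal sketch using the standard library: the name fetch_raw_html and the User-Agent header are my own assumptions, and it will not help if the page needs cookies, a login, or other browser state.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url, timeout=10):
    """Return the response body decoded to text, without any browser involved."""
    # Assumption: a browser-like User-Agent, since some servers reject urllib's default.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_raw_html(url) with the same url as above should return the HTML before any script has touched it, which can then be compared against driver.page_source.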
