I'm using Python, Selenium, and headless geckodriver to scrape a page. How can I get the unaltered HTML as it was downloaded, before JavaScript started manipulating the elements? This is what I've tried:

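from selenium import webdriver

fireFoxOptions = webdriver.FirefoxOptions()
fireFoxOptions.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=fireFoxOptions)
driver.get(url)
# page_source serializes the current DOM, i.e. after JavaScript has already run
print(driver.page_source)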
Jonas Byström
  • This link may help - https://stackoverflow.com/questions/38301993/how-to-disable-java-script-in-chrome-driver-selenium-python – Swaroop Humane Aug 08 '20 at 12:50

1 Answer

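Disabling JavaScript in the Firefox profile seems to be the way to do it, so that driver.page_source reflects the page before any script has run: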

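from selenium import webdriver

profile = webdriver.FirefoxProfile()
# override the value in Selenium's frozen defaults
# (an internal dict; its layout may differ between Selenium versions)
profile.DEFAULT_PREFERENCES['frozen']['javascript.enabled'] = False
# keep Firefox from updating itself during the session
profile.set_preference("app.update.auto", False)
profile.set_preference("app.update.enabled", False)
profile.update_preferences()
options = webdriver.FirefoxOptions()
options.profile = profile
options.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=options)
url = 'https://www.somewhere.com/some/path'
driver.get(url)
print(driver.page_source)  # markup with no scripts executed
driver.quit()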
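
Even with JavaScript disabled, page_source is still Firefox's serialization of the DOM it parsed, so it may not match the downloaded bytes exactly. If the literal response body is what is needed, one alternative (not part of the approach above, and only a minimal sketch) is to skip the browser and fetch the page with Python's standard library. The helper name fetch_raw_html, the User-Agent value and the timeout are assumptions made for illustration, and the server may respond differently to this client than to a real browser.

```python
import urllib.request


def fetch_raw_html(url, timeout=10.0):
    """Download url and return the response body as text, with no scripts run.

    fetch_raw_html is a hypothetical helper name used only for this sketch.
    """
    # Some sites reject the default urllib User-Agent; this value is an assumption.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_raw_html(url) with the same url as above would return the markup as the server sent it. Pages that need cookies or a logged-in session would need more than this sketch provides.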
Jonas Byström