Selenium page_source does not return modified DOM tree

Question

I want to find out what changes on a given webpage before and after applying an add-on such as NoScript or Ghostery. NoScript/Ghostery blocks trackers' and advertisers' scripts and removes them from the DOM tree (as an example, I checked 'http://a.visualrevenue.com/vrs.js' while browsing cnn.com before and after enabling NoScript in Firefox). However, 'http://a.visualrevenue.com/vrs.js' is still there if I dump the DOM tree using Selenium's browser.page_source. I am using the following code:

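import pickle
from selenium import webdriver

# Profile directory with the add-on (e.g. NoScript) installed; the path is a placeholder
fp = webdriver.FirefoxProfile("../<extension/addons/>")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://www.cnn.com")
html_source = browser.page_source

# Save the page source so it can be compared with a dump taken without the add-on
with open("cnn.p", "wb") as f:
    pickle.dump(html_source, f)
browser.quit()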
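A minimal sketch for comparing two such dumps, assuming one was saved with the add-on disabled and one with it enabled. It uses only the standard-library html.parser to collect the src of every <script> tag and reports the URLs that disappear when the add-on is active (for example the vrs.js tracker). The helper names are hypothetical, and this only compares external scripts referenced by src; inline scripts and other elements an add-on might remove are not checked.

```python
import pickle
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = set()

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.srcs.add(src)


def script_srcs(html):
    """Return the set of external script URLs referenced in an HTML string."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


def removed_scripts(before_html, after_html):
    """Script URLs present in the first dump but missing from the second."""
    return script_srcs(before_html) - script_srcs(after_html)


def load_dump(path):
    # Reads a page dump saved with pickle.dump, as in the code above
    with open(path, "rb") as f:
        return pickle.load(f)
```

Usage, with hypothetical file names: removed_scripts(load_dump("cnn_without_addon.p"), load_dump("cnn_with_addon.p")). An empty set means the two dumps reference the same external scripts.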

Selenium's page_source documentation says that it returns the modified DOM tree (in my case, modified by NoScript), but I couldn't confirm that this actually happens. I would appreciate it if anyone could explain how to get the DOM tree as modified by an add-on, using Selenium or any other automated tool.

edit: the FirefoxProfile line was replaced with a more generic placeholder path. – imkhan, Oct 28 '14 at 09:25

score 1 · Accepted Answer · answered Oct 28 '14 at 09:20

After trying several approaches, I finally solved the problem. Instead of using webdriver.page_source (which outputs the 'HTML source'), I used webdriver.execute_script("return document.documentElement.outerHTML") to dump the rendered HTML.
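The accepted fix can be wrapped in a small helper. This is a sketch under the assumption that `driver` is any Selenium WebDriver instance; the function names are hypothetical. It relies on execute_script returning the value of the JavaScript `return` statement, and on document.documentElement.outerHTML serializing the DOM as it currently exists, after page scripts and add-ons have run. Note that outerHTML covers only the <html> element, so the <!DOCTYPE> line is not included.

```python
import pickle

# JavaScript that serializes the current (possibly add-on-modified) DOM
OUTER_HTML_JS = "return document.documentElement.outerHTML"


def rendered_html(driver):
    """Return the current DOM of the page loaded in `driver` as an HTML string."""
    return driver.execute_script(OUTER_HTML_JS)


def save_rendered_html(driver, path):
    """Pickle the rendered HTML to `path`, mirroring the question's cnn.p dump."""
    html = rendered_html(driver)
    with open(path, "wb") as f:
        pickle.dump(html, f)
    return html
```

With the browser from the question, save_rendered_html(browser, "cnn.p") would take the place of the page_source and pickle lines.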

answered Oct 28 '14 at 09:20 by imkhan
