I'm not using Selenium to automate testing, but to automate saving AJAX pages that inject content, even if they require prior authentication to access.
What I tried
tl;dr: I tried several tools for downloading sites that use AJAX and gave up on each because it was hard to work with or simply didn't work. I'm resorting to Selenium after trying WebHTTrack (whose GUI wouldn't start on my Ubuntu machine, and which was a headache to supply authentication to in interactive-terminal mode) and wget (which didn't download any of the scripts or stylesheets included on my page; see the bottom for what I tried with wget). I finally gave up after Crowbar, a Mozilla XULRunner AJAX scraper recommended in a promising post, simply seg-faulted on me. So...
ended up making my own broken thing in NodeJS and Selenium-WebdriverJS
My NodeJS script uses the selenium-webdriver npm module, which is "officially supported by the main project", to:
- provide login information + do necessary button-clicking & typing for authentication
- download all JS and CSS referenced on target page
- download the target page with the original JS/CSS file links changed to local file paths
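The last two steps depend on mapping each remote JS/CSS reference to a local file path. A minimal sketch of that mapping, using only Node's built-in modules, might look like the following. The helper name `localPathFor`, the `assets` directory, and the example URLs are hypothetical and not taken from my actual script.

```javascript
const path = require('path');
const { URL } = require('url');

// Hypothetical helper: map a script/stylesheet reference found in the page
// to a path under a local download directory.
function localPathFor(ref, pageUrl, destDir) {
  // Resolve relative and protocol-relative references against the page URL.
  const resolved = new URL(ref, pageUrl);
  // Ignore the query string; keep only the file name from the URL path.
  const fileName = path.posix.basename(resolved.pathname) || 'index';
  // Prefix with the host so same-named files from different hosts don't collide.
  return path.posix.join(destDir, resolved.hostname, fileName);
}

console.log(localPathFor('/static/app.js?v=3', 'https://example.com/dashboard', 'assets'));
```

The returned path could serve both as the download destination and as the value written back into the `src` or `href` attribute of the saved page.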
Now when I view my test page locally, many page elements appear twice: the target site injects HTML snippets into the page every time it loads, and my saved copy already contains the snippets from the first load. This is what I currently use to download my target page:
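var fs = require('fs');
var cheerio = require('cheerio');

var $; // cheerio handle on the downloaded page, shared between steps

// Grab the page source from the browser and parse it with cheerio.
var getTarget = function () {
    // Return the promise so the next step waits for the source.
    return driver.getPageSource().then(function (source) {
        $ = cheerio.load(source.toString());
    });
};

var targetHtmlDest = 'test.html';

// Write the parsed page to disk.
var writeTarget = function () {
    fs.writeFileSync(targetHtmlDest, $.html());
};

// driver, targetSite, authenticate and downloadResources are defined elsewhere in the script.
driver.get(targetSite)
    .then(authenticate)
    .then(getTarget)
    .then(downloadResources)
    .then(writeTarget)
    .then(function () {
        // Quit only after every step has finished.
        return driver.quit();
    });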
The problem is that the page source I get has already been modified by the site's scripts, rather than being the original source as served. Running alert("x");window.stop(); through driver.executeAsyncScript() or driver.executeScript() does nothing.
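A possible workaround, given here only as an untested sketch, is to request the page a second time outside the browser while reusing the cookies of the authenticated Selenium session, so that the HTML arrives before any script has run. The helper name `cookieHeader` is hypothetical. It assumes each cookie object has `name` and `value` fields, as the objects returned by `driver.manage().getCookies()` do, and that the site authenticates by cookies alone.

```javascript
// Hypothetical helper: turn cookie objects (each with `name` and `value`)
// into a single value for the Cookie request header.
function cookieHeader(cookies) {
  return cookies
    .map(function (c) { return c.name + '=' + c.value; })
    .join('; ');
}

// Made-up cookies standing in for the result of driver.manage().getCookies().
console.log(cookieHeader([
  { name: 'sid', value: 'abc' },
  { name: 'theme', value: 'dark' }
]));
```

The resulting string would go into the `Cookie` header of a plain `https.get` request for the target URL. Whether the server returns the same page to a non-browser client is something this sketch does not verify.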