I'm a Java developer, but I have only played around with JavaScript a little. I am looking to develop a small Node.js app to parse a dynamic web page, so I need some way to wait until the page is fully loaded. I managed to get a Node.js project running with a hello world app.
I then updated the project to support PhantomJS via the PhantomJS Node bridge (https://github.com/amir20/phantomjs-node). I was able to successfully run one of the bridge's samples in my Node project (see below). While it successfully writes the contents of the web page to a file, the content is incomplete: it does not contain the dynamic data retrieved via JavaScript/AJAX.
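For context on why the output is incomplete: as far as I understand, `page.open()` resolves once the initial page load finishes, while AJAX requests started by the page's scripts can still be in flight. One common workaround, sketched here under the assumption that the page eventually goes quiet on the network, is to track outstanding requests and only read `content` once none have been pending for a short quiet period. The `RequestTracker` class, its method names, and all timing values are hypothetical. Wiring `start`/`finish` to PhantomJS's `onResourceRequested` and `onResourceReceived` (with `stage === 'end'`) callbacks through the bridge's `page.on(...)` is an assumption to verify against the phantomjs-node README.

```javascript
// Hypothetical helper: tracks in-flight request ids and resolves once the
// set has been empty for at least `quietMs` milliseconds.
// Assumption: start()/finish() would be called from the bridge's
// onResourceRequested / onResourceReceived (stage 'end') event callbacks.
class RequestTracker {
    constructor(quietMs) {
        this.quietMs = quietMs;
        this.pending = new Set();
        this.lastActivity = Date.now();
    }

    // Record that a request with this id has started.
    start(id) {
        this.pending.add(id);
        this.lastActivity = Date.now();
    }

    // Record that a request with this id has completed.
    finish(id) {
        this.pending.delete(id);
        this.lastActivity = Date.now();
    }

    // Polls every `intervalMs`; resolves when idle for `quietMs`,
    // rejects if that has not happened within `timeoutMs`.
    whenIdle(intervalMs, timeoutMs) {
        const started = Date.now();
        return new Promise((resolve, reject) => {
            const tick = () => {
                const now = Date.now();
                if (this.pending.size === 0 && now - this.lastActivity >= this.quietMs) {
                    resolve();
                } else if (now - started >= timeoutMs) {
                    reject(new Error('Timed out with ' + this.pending.size + ' request(s) pending'));
                } else {
                    setTimeout(tick, intervalMs);
                }
            };
            tick();
        });
    }
}
```

Requests that fail (for example those reported through PhantomJS's `onResourceError`) would also need to call `finish`; otherwise `whenIdle` rejects once the timeout expires.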
Can someone suggest a modification to the code below that will make it wait until the page is fully loaded before writing the file?
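A more targeted approach, sketched here under the assumption that some element or text on the page only appears once the dynamic data has arrived, is to poll for that condition with a timeout instead of relying on the load event. `waitFor` and its parameters are hypothetical names; `check` may return either a boolean or a Promise of one.

```javascript
// Hypothetical helper: repeatedly calls `check` (sync or Promise-returning)
// every `intervalMs` until it yields a truthy value, then resolves with the
// elapsed time in ms. Rejects after `timeoutMs`, or if `check` throws/rejects.
function waitFor(check, intervalMs, timeoutMs) {
    const started = Date.now();
    return new Promise((resolve, reject) => {
        const poll = () => {
            Promise.resolve()
                .then(() => check())
                .then(ready => {
                    if (ready) {
                        resolve(Date.now() - started);
                    } else if (Date.now() - started >= timeoutMs) {
                        reject(new Error('waitFor timed out after ' + timeoutMs + ' ms'));
                    } else {
                        setTimeout(poll, intervalMs);
                    }
                })
                .catch(reject);
        };
        poll();
    });
}
```

In the sample chain, the check might be `() => sitepage.evaluate(function () { return document.querySelector('#results') !== null; })`, returned from an extra `.then(...)` step placed after `page.open(...)` and before `sitepage.property('content')`. Here `#results` is a placeholder selector, and the bridge's `evaluate` returning a Promise of the page-side result is assumed.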
**Edit:** I just saw that another user has essentially the same issue, but it is unanswered: Dynamic scraping using nodejs and phantomjs
Versions: Node.js 6.20, PhantomJS 2.1.1, PhantomJS Node bridge 2.1.2
var phantom = require('phantom');
var fs = require('fs');

var sitepage = null;
var phInstance = null;

phantom.create()
    .then(instance => {
        phInstance = instance;
        return instance.createPage();
    })
    .then(page => {
        sitepage = page;
        return page.open('http://www.somesite.com');
    })
    .then(status => {
        // Status of the initial page load (e.g. 'success')
        console.log(status);
        return sitepage.property('content');
    })
    .then(content => {
        // Write the captured HTML to disk
        fs.writeFile("output.html", content, function(err) {
            if (err) {
                return console.log(err);
            }
        });
        sitepage.close();
        phInstance.exit();
    })
    .catch(error => {
        console.log(error);
        // phInstance is still null if phantom.create() itself failed
        if (phInstance) {
            phInstance.exit();
        }
    });
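The crudest modification to the sample above is a fixed pause between `page.open()` and reading `content`. This sketch assumes the dynamic data reliably arrives within the chosen delay, which is fragile on slow networks; the `delay` helper and the 3000 ms figure below are arbitrary assumptions, not part of the bridge's API.

```javascript
// Hypothetical helper: returns a Promise that resolves with `value`
// after at least `ms` milliseconds, so it can pass a value through a chain.
function delay(ms, value) {
    return new Promise(resolve => setTimeout(() => resolve(value), ms));
}
```

It would slot into the chain as an extra `.then(status => delay(3000, status))` immediately after the `page.open(...)` step, so the following `.then(status => ...)` still receives the open status.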