I'm trying to do some web scraping with node.js. Using jsdom, it is easy to load up the DOM and inject JavaScript into it. I want to go one step further: run all JavaScript linked to from the web page and then inspect the resulting DOM, including visual properties (height, width, etc.) of elements.
Thus far, I get NaN when I try to inspect the dimensions of DOM elements with jsdom.
Is this possible?
It strikes me that there are two distinct challenges:
- Running all the JS on the web page
- Getting Node to simulate window/screen rendering (layout), not just the DOM
Another way to ask the question: is it possible to use node.js as a completely headless browser that you can script?
If this isn't possible, does anyone have suggestions for what library I can use to do this? I'm relatively language agnostic.