I am scraping a website built with React components, using PhantomJS from Node.js
through this wrapper: https://github.com/amir20/phantomjs-node
Here is the code:
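const phantom = require('phantom');

// Keep references so the later steps in the chain can reach them.
// `url` is the address of the page to scrape, defined elsewhere.
let _ph, _page;

phantom.create().then(ph => {
    _ph = ph;
    return _ph.createPage();
}).then(page => {
    _page = page;
    return _page.open(url);
}).then(status => {
    // The page has been opened here, but React may not have rendered yet.
    return _page.property('content');
}).then(content => {
    console.log(content);
    _page.close();
    _ph.exit();
}).catch(e => console.log(e));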
The problem is that the React content is not rendered. The output only contains <!-- react-empty: 1 -->
where the actual React component should be.
How can I scrape the rendered React component? I originally switched from a plain node-request solution to PhantomJS to fix this, but now I am stuck.
UPDATE:
I still don't have a real PhantomJS solution. I switched to NightmareJS (https://github.com/segmentio/nightmare), which has a convenient .wait('.some-selector')
function that waits until the specified selector is present in the page. This fixed my problems with dynamically loaded React components.
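
For anyone landing here, this is roughly what the NightmareJS version looks like. It is only a minimal sketch: the helper name scrapeRendered, the example URL, and '.some-selector' are placeholders I made up, and it assumes the nightmare package is installed. The helper takes the Nightmare instance as an argument so it can be reused or swapped out.

```javascript
// Sketch: load a page, wait for a React-rendered element, return its HTML.
// `scrapeRendered` is a hypothetical helper name, not part of Nightmare.
function scrapeRendered(nightmare, url, selector) {
    return nightmare
        .goto(url)
        // Blocks the chain until the selector exists in the DOM.
        .wait(selector)
        // Runs inside the page; the selector is passed in as an argument.
        .evaluate(function (sel) {
            return document.querySelector(sel).outerHTML;
        }, selector)
        // Shuts down the Electron process once the queue has finished.
        .end();
}

// Usage (assumes `npm install nightmare`; URL and selector are placeholders):
//
// const Nightmare = require('nightmare');
// scrapeRendered(Nightmare({ show: false }), 'https://example.com', '.some-selector')
//     .then(html => console.log(html))
//     .catch(e => console.log(e));
```

The important part is that .wait(selector) comes before .evaluate(), so the HTML is only read after the component has actually been mounted. If the selector never appears, Nightmare's wait timeout should reject the promise, which the .catch() would report.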