I'm working on a script that takes a screenshot of a website every day. I have already done this for other sites and it worked correctly, but this time I have run into a problem: my PhantomJS script captures almost all the data on the website, but not all of it (in fact, it misses the part that matters most for my case).
Until now I have been using this simple, adapted script:
    var page = require('webpage').create();
    page.open('http://www.website.com', function() {
        setTimeout(function() {
            page.render('render.png');
            phantom.exit();
        }, 200);
    });
But when I run the same script against this site, some data is missing: it takes the screenshot but leaves out the prices.
[Screenshot of the site rendered with PhantomJS]
After exploring a bit, I saw that if I parse the DOM (for example with the PHP Simple HTML DOM Parser) I can get most of the data, but not the prices.
    $html = file_get_html('https://www.falabella.com.ar/falabella-ar/category/cat10178/TV-LED-y-Smart-TV');
    $prods = $html->find('div[class=fb-pod-group__item]');
    foreach ($prods as $prod) {
        // For example, I can get the title
        $title = $prod->find('h4[class=fb-responsive-hdng-5 fb-pod__product-title]',0)->plaintext;
        // But not the price
        $price = $prod->find('h4[class=fb-price]',0)->plaintext;
    }
Exploring the console, I found the JavaScript objects that hold these values. If I evaluate fbra_browseProductListConfig.state.searchItemList.resultList[0].prices[0].originalPrice, I see the price of the first product, and likewise for the other products in the list.
I can also get it with a PhantomJS script like this:
    var page = require("webpage").create();
    page.open("https://www.falabella.com.ar/falabella-ar/category/cat10122/Cafeteras-express", function(status) {
        var price = page.evaluate(function() {
            return fbra_browseProductListConfig.state.searchItemList.resultList[0].prices[0].originalPrice;
        });
        console.log("The price is " + price);
        phantom.exit();
    });
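Extending that idea, here is a minimal sketch of how every price could be read from the same object rather than only the first one. It assumes that each entry in resultList has the same shape as the first product seen in the console. The helper name collectPrices is hypothetical, and entries without a prices array simply yield null.

```javascript
// Reads originalPrice for every product instead of only resultList[0].
// Assumption: every entry follows the shape seen in the console for the
// first product; entries without a usable prices array yield null.
function collectPrices(config) {
  // page.evaluate passes no argument, so fall back to the global object
  // that the site defines.
  if (!config && typeof fbra_browseProductListConfig !== "undefined") {
    config = fbra_browseProductListConfig;
  }
  var list = config && config.state && config.state.searchItemList &&
             config.state.searchItemList.resultList;
  if (!list) {
    return [];
  }
  var prices = [];
  for (var i = 0; i < list.length; i++) {
    var first = list[i] && list[i].prices && list[i].prices[0];
    prices.push(first ? first.originalPrice : null);
  }
  return prices;
}

// Only runs under PhantomJS; skipped elsewhere so the helper can be tried alone.
if (typeof phantom !== "undefined") {
  var page = require("webpage").create();
  page.open("https://www.falabella.com.ar/falabella-ar/category/cat10122/Cafeteras-express", function (status) {
    var prices = page.evaluate(collectPrices);
    console.log("Prices: " + JSON.stringify(prices));
    phantom.exit();
  });
}
```

This only reads the values; it does not make them appear in the rendered screenshot.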
In other posts (like this) I read about changing the timeout intervals, but that is not working for me (I tried all the scripts shared in the linked post). The problem is not that the website fails to load completely; rather, it seems that this data (the prices) is never written to the DOM. I even downloaded the full page from the terminal with wget, and the prices are not there o_O.
Edited
When I execute the script I get the following errors:
    ./phantomjs fala.js
    ReferenceError: Can't find variable: Set
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1
    TypeError: undefined is not an object (evaluating 't.componentDomId')
      https://www.falabella.com.ar/static/assets/scripts/react/productListApp.js?vid=111111111:3
      https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
Maybe the problem lies there, since the script "productListApp.js" seems to be the one that renders the prices?
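If the missing Set really is what stops the site's scripts, one thing that could be tried is defining a stand-in before the page's own scripts run. The sketch below is an assumption, not a confirmed fix: installSetShim and MiniSet are hypothetical names, the shim covers only add, has, delete, clear, forEach and size, it accepts only array-like initial values, and the site's bundle may need other missing features as well.

```javascript
// Installs a very small ES5 stand-in for Set on `target` when none exists.
// Assumption: the page only needs the basic methods implemented here.
function installSetShim(target) {
  target = target || window; // page.evaluate passes no argument
  if (typeof target.Set !== "undefined") {
    return false; // a Set already exists, leave it alone
  }
  function MiniSet(values) {
    this._items = [];
    this.size = 0;
    if (values) {
      for (var i = 0; i < values.length; i++) {
        this.add(values[i]);
      }
    }
  }
  MiniSet.prototype.has = function (v) {
    return this._items.indexOf(v) !== -1;
  };
  MiniSet.prototype.add = function (v) {
    if (!this.has(v)) {
      this._items.push(v);
      this.size = this._items.length;
    }
    return this;
  };
  MiniSet.prototype["delete"] = function (v) {
    var i = this._items.indexOf(v);
    if (i === -1) {
      return false;
    }
    this._items.splice(i, 1);
    this.size = this._items.length;
    return true;
  };
  MiniSet.prototype.clear = function () {
    this._items = [];
    this.size = 0;
  };
  MiniSet.prototype.forEach = function (fn, thisArg) {
    for (var i = 0; i < this._items.length; i++) {
      fn.call(thisArg, this._items[i], this._items[i], this);
    }
  };
  target.Set = MiniSet;
  return true;
}

// Only runs under PhantomJS: install the shim before any page script executes.
if (typeof phantom !== "undefined") {
  var page = require("webpage").create();
  page.onInitialized = function () {
    page.evaluate(installSetShim);
  };
  page.onError = function (msg, trace) {
    console.log("Page error: " + msg);
  };
  page.open("https://www.falabella.com.ar/falabella-ar/category/cat10122/Cafeteras-express", function () {
    setTimeout(function () {
      page.render("render.png");
      phantom.exit();
    }, 200);
  });
}
```

The onError handler is there so that any errors that remain after the shim is installed are still printed.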
$ X price p>" is supposed to be found. I only see the prices as JavaScript objects: https://imgur.com/wwpkptS (which of course do not show up in the render)
– Farid Murzone Jul 12 '18 at 22:16