
I'm working on a script that takes a screenshot of a website every day. I have already done this for other sites and it worked correctly, but this time I have run into a problem: my PhantomJS script captures almost all the data on the website, but not all of it (in fact, it omits the part that matters most for my case).

Until now I was using this simple, adapted script:

var page = require('webpage').create();
page.open('http://www.website.com', function() {
    setTimeout(function() {
        page.render('render.png');
        phantom.exit();
    }, 200);
});

But when I run the same script against this site, some data is lost: it takes the screenshot, but the prices are missing.

Screenshot of the site with phantomjs

After exploring a bit, I saw that if I capture the DOM (for example, using the PHP Simple HTML DOM Parser), I can get most of the data, but not the prices.

$html = file_get_html('https://www.falabella.com.ar/falabella-ar/category/cat10178/TV-LED-y-Smart-TV');
$prods = $html->find('div[class=fb-pod-group__item]');
foreach ($prods as $prod) {
    // For example, I can get the title
    $title = $prod->find('h4[class=fb-responsive-hdng-5 fb-pod__product-title]',0)->plaintext;

    // But not the price
    $price = $prod->find('h4[class=fb-price]',0)->plaintext;
}

Exploring the console log, I found the JavaScript objects that hold these values. If I return fbra_browseProductListConfig.state.searchItemList.resultList[0].prices[0].originalPrice, I see the price of the first product, and likewise for the other indexes:

Console log of the site

I can also get it with a PhantomJS script like this:

var page = require("webpage").create();
page.open("https://www.falabella.com.ar/falabella-ar/category/cat10122/Cafeteras-express", function(status) {
    var price = page.evaluate(function() {
        return fbra_browseProductListConfig.state.searchItemList.resultList[0].prices[0].originalPrice;
    });
    console.log("The price is " + price);
    phantom.exit();
});
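
The single-product lookup can be extended to every product in the list. The following is a minimal sketch, assuming the object keeps the shape seen in the console (`state.searchItemList.resultList[i].prices[0].originalPrice`); the helper name `collectOriginalPrices` is hypothetical, and its source is passed into `page.evaluate` as a string because the evaluated function runs in the page's sandbox and cannot see outer variables.

```javascript
// Collect the original price of every product from the page's state object.
// `config` is assumed to have the same shape as fbra_browseProductListConfig.
function collectOriginalPrices(config) {
    var items = (config && config.state && config.state.searchItemList &&
                 config.state.searchItemList.resultList) || [];
    var prices = [];
    for (var i = 0; i < items.length; i++) {
        var first = items[i].prices && items[i].prices[0];
        // Keep the position even when a product has no price entry.
        prices.push(first ? first.originalPrice : null);
    }
    return prices;
}

// This part only runs under PhantomJS; the guard keeps the helper reusable.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('https://www.falabella.com.ar/falabella-ar/category/cat10122/Cafeteras-express', function (status) {
        if (status !== 'success') {
            console.log('Failed to load the page');
            phantom.exit(1);
            return;
        }
        var prices = page.evaluate(function (helperSource) {
            // Rebuild the helper inside the page sandbox from its source text.
            var collect = eval('(' + helperSource + ')');
            return collect(window.fbra_browseProductListConfig);
        }, collectOriginalPrices.toString());
        console.log(JSON.stringify(prices));
        phantom.exit();
    });
}
```

This reads the values but does not make them appear in the rendered screenshot, since the rendering scripts still fail.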

In other posts (like this) I read about changing the timeout intervals, but that is not working for me (I tried all the scripts shared in the linked post). The problem is not that the website fails to load fully; rather, it seems that this data (the prices) is never written to the DOM. I even downloaded the full page from the terminal with the wget command, and the prices are not there.

Edited

When I execute the script I get the following errors:

./phantomjs fala.js 
ReferenceError: Can't find variable: Set
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1
TypeError: undefined is not an object (evaluating 't.componentDomId')

  https://www.falabella.com.ar/static/assets/scripts/react/productListApp.js?vid=111111111:3
  https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22

Maybe the problem is there, because the script "productListApp.js" is what renders the prices?
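
One way to check that suspicion is to ask the engine which ES6 globals it lacks, since the first error is about `Set`. The sketch below is only an illustration: `findMissingGlobals` and the list of names to probe are hypothetical choices, not part of PhantomJS or of the site's code.

```javascript
// Return the names from `names` that are not defined as functions on `scope`.
function findMissingGlobals(scope, names) {
    var missing = [];
    for (var i = 0; i < names.length; i++) {
        if (typeof scope[names[i]] !== 'function') {
            missing.push(names[i]);
        }
    }
    return missing;
}

// This part only runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    // A freshly created page is enough: the check concerns the engine, not the site.
    var missing = page.evaluate(function (helperSource, names) {
        // Rebuild the helper inside the page sandbox from its source text.
        var check = eval('(' + helperSource + ')');
        return check(window, names);
    }, findMissingGlobals.toString(), ['Set', 'Map', 'Promise', 'Symbol']);
    console.log('Missing in this engine: ' + missing.join(', '));
    phantom.exit();
}
```

If `Set` shows up in the list, the bundle in vendor.js cannot run as shipped, which would also explain the follow-up `TypeError` in productListApp.js.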

Farid Murzone
  • Take a look at the second answer here: https://stackoverflow.com/questions/11340038/phantomjs-not-waiting-for-full-page-load I had a similar issue a while back when I was converting our PDF engine to use PhantomJS; this was basically the solution I implemented, and I haven't had any problems with content not loading since. – Adam H Jul 12 '18 at 21:54
  • I tried, but in the output of "console.log(htmlContent);" the prices do not appear in the HTML: https://imgur.com/a/Z7R4qmC I highlighted the part where the price element ("$ X price") is supposed to be found. I only see the prices as JavaScript objects: https://imgur.com/wwpkptS (which of course do not show up in the render). – Farid Murzone Jul 12 '18 at 22:16
  • I'm not sure what's the question: do you want prices or a screenshot? What PhantomJS version are you using? Here's [a screenshot made with v2.1.1](https://i.imgur.com/keegYtH.jpg) Also to avoid getting mobile version you could set viewport settings like `page.viewportSize = { width: 1280, height: 800 };` – Vaviloff Jul 13 '18 at 04:22
  • @Vaviloff That's weird xD... Yes, I want the prices, and I have PhantomJS 2.1.1 (tried on Linux and Mac computers). What script did you use? On what operating system? Did you modify the HTTP request headers or something? Anyway, if it works for you, it is a ray of hope for me XD. I'll keep trying. – Farid Murzone Jul 13 '18 at 13:00
  • I used PhantomJS v2.1.1 on Windows 7x64. For `Can't find variable: Set` see this solution: https://stackoverflow.com/a/38471938/2715393 – Vaviloff Jul 14 '18 at 07:22
  • @Vaviloff Thanks a lot, you're a genius. I think the problem was that PhantomJS doesn't support ES6. Just a note in case you want to edit your answer in the other thread: the link to the core.js file is broken now, so I used [this one](https://raw.githubusercontent.com/zloirock/core-js/v2.5.7/client/core.js) – Farid Murzone Jul 15 '18 at 03:51
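
The fix the comments arrive at can be sketched as follows. This is a minimal illustration, not the exact script used in the thread: it assumes the core.js build linked above has been saved next to the script as `core.js`, the helper name `attachPolyfill` is hypothetical, and the 2000 ms delay is a guess that may need tuning. The viewport values are the ones suggested in the comments.

```javascript
// Register a polyfill file to be injected before any page script runs.
// `polyfillPath` is an assumed local path to a downloaded core.js build.
function attachPolyfill(page, polyfillPath) {
    page.onInitialized = function () {
        // injectJs returns true when the file was found and executed.
        if (!page.injectJs(polyfillPath)) {
            console.log('Could not inject ' + polyfillPath);
        }
    };
    return page;
}

// This part only runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = attachPolyfill(require('webpage').create(), 'core.js');
    // Desktop-sized viewport to avoid getting the mobile layout.
    page.viewportSize = { width: 1280, height: 800 };
    page.open('https://www.falabella.com.ar/falabella-ar/category/cat10178/TV-LED-y-Smart-TV', function (status) {
        if (status !== 'success') {
            console.log('Failed to load the page');
            phantom.exit(1);
            return;
        }
        // Give the client-side scripts time to render the prices (assumed delay).
        setTimeout(function () {
            page.render('render.png');
            phantom.exit();
        }, 2000);
    });
}
```

Injecting from `onInitialized` matters here because the polyfill has to define `Set` before vendor.js executes; injecting it after the page has loaded would be too late.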
