0

Is it possible to use PHP to scrape content that a page loads via AJAX? So far I have only used SIMPLE_HTML_DOM for static pages.

Thanks for any advice.

Tetsu
  • Are you having your server make ajax requests, or are you running PHP in a client? Perhaps if you showed us the code you have already we could help find the problem. – Joe Sep 17 '15 at 05:38
  • I'm running PHP in a client. For now I just want to know how to do the trick ;] – Tetsu Sep 17 '15 at 05:56

2 Answers

3

Scraping the entire site

Scraping dynamic content requires you to actually render the page. A PHP server-side scraper will just do a simple file_get_contents or similar, which returns only the HTML the server sends. Most server-based scrapers do not execute the page's JavaScript, so they never load the dynamic content generated by the Ajax calls.
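To illustrate the point, here is a minimal sketch with a made-up page: the HTML the server sends contains only an empty container, and the script that would fill it never runs in PHP. The markup, the `results` id, and the helper name `containerText` are all hypothetical, and PHP's built-in DOMDocument stands in for SIMPLE_HTML_DOM so the snippet has no dependencies.

```php
<?php
// Hypothetical page source, as file_get_contents() would return it.
// A browser would run the <script> and fill #results; PHP never executes it.
$html = <<<'HTML'
<html><body>
  <div id="results"></div>
  <script>
    fetch('/test.php?ajax=call')
      .then(r => r.text())
      .then(t => document.getElementById('results').innerHTML = t);
  </script>
</body></html>
HTML;

// Return the trimmed text of the element with the given id, or null if absent.
function containerText(string $html, string $id): ?string
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $node = $xpath->query("//*[@id='" . $id . "']")->item(0);
    return $node === null ? null : trim($node->textContent);
}

// The container exists but holds no text, because nothing was loaded into it.
var_dump(containerText($html, 'results'));
```

The container comes back empty, which is what a static parser sees on an Ajax-driven page.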

Something like Selenium should do the trick, since it drives a real browser. A quick Google search turns up numerous examples of how to set it up. Here is one

Scraping JUST the Ajax calls

Though I wouldn't consider this scraping, you can always examine an Ajax call using your browser's dev tools. In Chrome, hit F12 while on the site to open the dev tools.

The dev tools panel should then open. Select the Network tab and then hit Chrome's refresh button. This will show every request made between you and the site. You can then filter for specific requests.

For example, if you are interested in Ajax calls you can select XHR.

You can then click on any of the items listed in the table to get more information, such as the request URL, method, and parameters.

File get contents on AJAX call

Depending on how robust the APIs behind these Ajax calls are, you could do something like the following.

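<?php
// Request the same URL the page's Ajax call uses (example URL).
$url = "http://www.example.com/test.php?ajax=call";
$content = file_get_contents($url);
if ($content === false) {
    die("Request failed: $url"); // file_get_contents returns false on failure
}
?>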

If the response is JSON, then add

$data = json_decode($content);
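Since json_decode returns null on invalid input rather than raising an error, it can help to check the result before using it. The following is only a sketch: the helper name `decodeAjaxJson` and the sample response body are made up for illustration, and a real body would come from the file_get_contents call above.

```php
<?php
// Decode the body of an Ajax response into a PHP array.
// Throws if the body is not valid JSON instead of silently returning null.
function decodeAjaxJson(string $content): array
{
    $data = json_decode($content, true); // true => associative arrays
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    if (!is_array($data)) {
        throw new RuntimeException('Expected a JSON object or array');
    }
    return $data;
}

// Hypothetical response body, used here in place of a real request.
$content = '{"items":[{"id":1,"title":"First"},{"id":2,"title":"Second"}]}';
$data = decodeAjaxJson($content);

foreach ($data['items'] as $item) {
    echo $item['id'], ': ', $item['title'], PHP_EOL;
}
```

Passing true as the second argument is a design choice: arrays are often easier to loop over than stdClass objects, but the default object form works just as well.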

However, you are going to have to do this for each AJAX request on a site. Beyond that you are going to have to use a solution similar to the ones presented [here].

Finally, you can also use a headless browser such as PhantomJS to render an entire site.

Summary

If all you want is the data returned by specific Ajax calls, you might be able to get it using file_get_contents. However, if you are trying to scrape an entire site that also uses AJAX to manipulate the document, then you will NOT be able to use SIMPLE_HTML_DOM on the page itself, because it never runs the page's JavaScript.

ductiletoaster
  • I already know this workaround, but what I need is to get data from a site directly from PHP code. – Tetsu Sep 17 '15 at 05:59
  • What do you mean by getting it directly from PHP code? Do you want to actually get the site's PHP source code? – ductiletoaster Sep 17 '15 at 15:22
  • I'm wondering whether it is possible in a similar way to the SIMPLE_HTML_DOM class. – Tetsu Sep 17 '15 at 15:36
  • @Tetsu I updated my answer to show you how you might get the data. BUT without clarification it is hard for me to know what you are trying to do. – ductiletoaster Sep 17 '15 at 16:17
  • Put simply, I want to scrape content from this site: – Tetsu Sep 22 '15 at 10:38
  • @Tetsu I already answered your question. Scraping an entire site that relies heavily on Ajax-generated content is not an easy task. I gave you some options and workarounds, but essentially none of them are going to be simple. – ductiletoaster Sep 22 '15 at 13:22
0

I finally worked around my problem. I took the POST URL with all its parameters from the Ajax call and made the same request myself, then parsed the response with the SIMPLE_HTML_DOM class.
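For anyone after the same thing, a rough sketch of that approach using only built-in PHP functions might look like this. The URL, the parameter names, and the helpers `buildPostOptions` and `postAjax` are placeholders rather than the real site's values, and the X-Requested-With header is an assumption (some endpoints check it, many do not).

```php
<?php
// Build stream-context options for a form-encoded POST request.
function buildPostOptions(array $params): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . "X-Requested-With: XMLHttpRequest\r\n", // assumption
            'content' => http_build_query($params),
            'timeout' => 10,
        ],
    ];
}

// Send the POST and return the response body, or false on failure.
function postAjax(string $url, array $params)
{
    $context = stream_context_create(buildPostOptions($params));
    return file_get_contents($url, false, $context);
}

// Hypothetical parameters, copied from the request shown in the Network tab.
$params = ['page' => 2, 'category' => 'news'];
echo buildPostOptions($params)['http']['content'], PHP_EOL;

// Usage (makes a real network request, so it is left commented out):
// $html = postAjax('http://www.example.com/test.php', $params);
// if ($html !== false) {
//     $dom = str_get_html($html); // str_get_html comes from SIMPLE_HTML_DOM
// }
```

The response of such a call is often an HTML fragment or JSON rather than a full page, so the parsing step depends on what the endpoint actually returns.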

Tetsu