I have a PHP script that loads page content from another website using cURL and the simple_html_dom PHP library. This works great. If I echo out the returned HTML, I can see the div content there.

However, if I try to select only that div with simple_html_dom, the div is always returned empty. At first I didn't know why. Now I know it's because its content is apparently populated with JavaScript/Ajax.
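To illustrate why the div comes back empty, here is a minimal sketch that uses PHP's built-in DOMDocument/DOMXPath in place of simple_html_dom (a swapped-in parser, assumed to behave the same way for this purpose). The markup, the id "content" and the /ajax/data.php path are hypothetical: the raw HTML ships an empty div plus a script that fills it, and a parser that never runs JavaScript only ever sees the empty div.

```php
<?php
// Hypothetical markup: the div is empty in the raw HTML and is only
// filled in the browser when the jQuery .load() call runs.
$html = '<html><body>'
    . '<div id="content"></div>'
    . '<script>$("#content").load("/ajax/data.php");</script>'
    . '</body></html>';

// Return the text of the div with the given id as a static parser sees it,
// or null if no such div exists.
function divText(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    // Suppress warnings about imperfect real-world HTML.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(sprintf('//div[@id="%s"]', $id));
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

// No JavaScript is executed, so the div is still empty.
var_dump(divText($html, 'content')); // string(0) ""
```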

How can I get the content of the site and then select the div content AFTER the JavaScript has populated it with the correct content?

Is it even possible? Thanks!

Felipe Sabino
Tanax
  • Hi @Tanax, it is possible and it should work. No matter how the content is populated, once it is inside the HTML (even after being loaded via AJAX) it is normally selectable: you can select it with document.getElementById('divid') and it will be accessible. If you can share your code here or provide the URL, it would be a lot easier for us to point out the problem. – Hafiz Nov 06 '11 at 09:40
  • What are you trying to do? I believe the approach you have taken might be broken, so going back to what you want to do is the best way to ask the question at this point. Other than that, if you want fully normal behaviour and include, just – Morg. Sep 22 '11 at 10:06

5 Answers

Yes, it's a piece of cake if you are only interested in the particular HTML that is returned by the Ajax call.

  1. Gather information such as the URL, parameters and request type (POST/GET) from that Ajax request.
  2. Generate the same request from your PHP/cURL code and you've got it.
  3. Hope that the server logic does not check who sent the request.
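A minimal sketch of steps 1 to 3 in PHP, assuming you have already copied the URL, parameters and method of the Ajax request from the browser's developer tools. The helper names buildRequestUrl() and replayAjaxRequest() and the example URL are hypothetical. The X-Requested-With header is included because jQuery, for example, adds it to same-origin Ajax requests and some servers check for it; it will not help if the server checks cookies, tokens or referrers.

```php
<?php
// Build the final URL for a GET request by appending the parameters as a
// query string. POST requests send the parameters in the body instead.
function buildRequestUrl(string $url, array $params, string $method): string
{
    if (strtoupper($method) !== 'GET' || $params === []) {
        return $url;
    }
    $separator = strpos($url, '?') === false ? '?' : '&';
    return $url . $separator . http_build_query($params);
}

// Replay the Ajax request from PHP and return the response body.
function replayAjaxRequest(string $url, array $params = [], string $method = 'GET'): string
{
    $ch = curl_init(buildRequestUrl($url, $params, $method));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        // Mimic a browser Ajax call; some servers look for this header.
        CURLOPT_HTTPHEADER => ['X-Requested-With: XMLHttpRequest'],
    ]);
    if (strtoupper($method) === 'POST') {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($params));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (makes a real network request, so it is left commented out):
// echo replayAjaxRequest('http://www.domain.com/ajax/data.php', ['time' => 48484, 'c' => 487387]);
```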
Imran Naqvi

For this kind of screen scraping you could try phpQuery or Snoopy.

phpQuery has a web browser plugin, and Snoopy claims to simulate one.

Felipe Sabino

You can always bind to the event that is fired when the XHR returns data to the browser and do your operations there.

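 var xhReq = new XMLHttpRequest();
 xhReq.open("GET", "ur_php_url.php");
 xhReq.onreadystatechange = onResponse;
 xhReq.send(null);

 function onResponse()
 {
   // readyState 4 means the request has finished; status 200 means it succeeded
   if (xhReq.readyState === 4 && xhReq.status === 200) {
     // do the necessary with xhReq.responseText
   }
 }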
Baz1nga

Yes, it is possible.

What you need to do is the following:

  1. Make a cURL call to that webpage in order to retrieve any parameters used in the Ajax call that loads the content you are looking for.
  2. Make another cURL call to the file called by that webpage's JavaScript, using the parameters you got in step 1.

For example, say you want to get the content of http://www.domain.com/page.html, and page.html retrieves some other data using Ajax, say $("#div").load("http://www.domain.com/ajax/data.php?time=48484&c=487387").

What you would do is make a cURL request to page.html first and get the full URL of the Ajax call using PHP's preg_match() function (or an equivalent function in any other language). After that, make another cURL request to that URL (http://www.domain.com/ajax/data.php?time=48484&c=487387) and get its content.

You're all set!
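A minimal sketch of those two requests in PHP, assuming the target page uses a jQuery .load("...") call like the one in the example. The helper names extractAjaxUrl() and fetchUrl() are hypothetical, and the regular expression only covers that one call style; it would need adjusting for $.get, $.ajax and so on.

```php
<?php
// Find the URL passed to a jQuery .load("...") or .load('...') call in the
// page source. Returns null if no such call is found.
function extractAjaxUrl(string $html): ?string
{
    if (preg_match('/\.load\(\s*["\']([^"\']+)["\']/', $html, $matches)) {
        return $matches[1];
    }
    return null;
}

// Fetch a URL with cURL and return the body as a string, or false on failure.
function fetchUrl(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}

// Usage with the example page (real network requests, so commented out):
// $page = fetchUrl('http://www.domain.com/page.html');
// $ajaxUrl = $page !== false ? extractAjaxUrl($page) : null;
// $content = $ajaxUrl !== null ? fetchUrl($ajaxUrl) : null;
```

If the page builds the URL dynamically in JavaScript (for example, concatenating a timestamp), the regex will not see the final URL, and you would have to reproduce that logic in PHP.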

Unfortunately, JavaScript runs client-side, in a browser, so unless the page is loaded in a web browser there is no simple way to do it.

The only way I can think of is having a browser running in the background on a server, reloading the page and automatically saving the generated result to a file that a PHP script can then fetch. Well... I don't know of anyone who has implemented such an idea.

Better to try to get the URL the div is being populated from. If the div contents are generated through AJAX, for example, fetching that data-origin URL with cURL may give you the data as well.
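The data-origin URL found in a page's scripts is often relative (for example, a hypothetical "/ajax/data.php"), while cURL needs an absolute URL. A minimal sketch of resolving it against the page URL, assuming the page URL is absolute; resolveUrl() is a hypothetical helper that covers only the common cases and does not normalize "../" segments or handle query-only or fragment-only references.

```php
<?php
// Resolve a (possibly relative) Ajax URL against the URL of the page that
// referenced it. Handles absolute, protocol-relative, root-relative and
// path-relative references.
function resolveUrl(string $pageUrl, string $ajaxUrl): string
{
    // Already absolute, e.g. "http://www.domain.com/ajax/data.php".
    if (preg_match('~^[a-z][a-z0-9+.-]*://~i', $ajaxUrl)) {
        return $ajaxUrl;
    }

    $parts = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';

    // Protocol-relative, e.g. "//cdn.domain.com/data.php".
    if (strncmp($ajaxUrl, '//', 2) === 0) {
        return $scheme . ':' . $ajaxUrl;
    }

    $origin = $scheme . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative, e.g. "/ajax/data.php".
    if ($ajaxUrl !== '' && $ajaxUrl[0] === '/') {
        return $origin . $ajaxUrl;
    }

    // Path-relative, e.g. "ajax/data.php": resolve against the page's directory.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $ajaxUrl;
}
```

The resolved URL can then be passed to your cURL request.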