
As you know, you can see every request a website makes using Chrome Developer Tools or Firebug, like this:

[Screenshot: GET requests shown in Chrome dev tools]

I need to get this information using PHP. What method should I use? Thanks.

  • What do you mean? You need to get the information about all the requests done to your PHP server? – Tivie Feb 15 '15 at 14:12
  • No, I need the exact same information Chrome Dev tools show here on any given website, just using PHP. That means the requests made by this website, not to my server. – Michael Abramishvili Feb 15 '15 at 14:13
  • To explain further, I need something like a function to which I pass the URL and it returns the array of requests made by this URL. – Michael Abramishvili Feb 15 '15 at 14:18
  • So you want a PHP script that downloads a webpage (for instance, www.google.com) and checks which resources should be fetched by that address? – Tivie Feb 15 '15 at 14:19
  • Yes, I need the same information you get when you check the Network tab in Chrome dev tools, but I need it with PHP. It should be possible, but I can't find a similar function or question. – Michael Abramishvili Feb 15 '15 at 14:24
  • With JUST PHP you can't. Why do you need that information in PHP (if you don't mind me asking)? Because there's probably a better solution for your problem. – Tivie Feb 15 '15 at 14:55

1 Answer

Short answer:

With ONLY PHP, you can't. (Well, you can, but you would have to write a "browser engine" yourself.)

Long answer:

Requesting the address

Using PHP, you can make a request to an address and download the response using cURL or even file_get_contents (provided allow_url_fopen is enabled in your php.ini). For instance:

$body = file_get_contents('http://www.google.com');
var_dump($body);

$body contains the response body of 'http://www.google.com', which, in this case, is an HTML file.

However, URLs sometimes answer with something other than an HTML file (it can be XML, JSON, plain text, etc.).

cURL lets you fetch and check the response headers, which you can use to discover the content type of the response. Check this SO post for further details.
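As a minimal sketch of that step: the helper names fetchWithHeaders and parseHeaderBlock are made up for illustration, and the code assumes the cURL extension is installed and that redirects are not followed (so the response contains a single header block).

```php
<?php
// Hypothetical helper: turn a raw header block into a lowercase-name => values map.
function parseHeaderBlock(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        if (strpos($line, ':') === false) {
            continue; // skips the status line, e.g. "HTTP/1.1 200 OK"
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))][] = trim($value);
    }
    return $headers;
}

// Hypothetical helper: fetch a URL and return its headers and body separately.
function fetchWithHeaders(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_HEADER         => true, // prepend the header block to the returned string
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException(curl_error($ch));
    }
    // Size in bytes of the header block, so it can be split from the body.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);

    return [
        'headers' => parseHeaderBlock(substr($response, 0, $headerSize)),
        'body'    => substr($response, $headerSize),
    ];
}
```

Calling fetchWithHeaders('http://www.google.com') and reading $result['headers']['content-type'][0] would then tell you how to treat the body.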

Some headers might 'point' to other resources as well (a Location redirect or a Link header, for example), which means you will need to parse the headers properly too.

Parsing the response body

Now you would need to parse the response, respecting the Content-Type response header. If it's JSON or plain text, then you're good to go since, as far as I know, those types of files cannot make further requests.

But let's assume it's the normal, regular, plain HTML. You can use DOMDocument to parse the HTML.

$doc = new DOMDocument();
$doc->loadHTML($body);

However, you will probably need to suppress errors or validate and fix the HTML source first, since DOMDocument is very prone to choking on malformed HTML documents.
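One way to suppress those errors, sketched here with a made-up helper name (loadHtmlLeniently), is to have libxml collect them internally instead of raising warnings. This only hides the errors; it does not repair the markup beyond what libxml already does on its own.

```php
<?php
// Hypothetical helper: parse possibly malformed HTML without emitting warnings.
function loadHtmlLeniently(string $html): DOMDocument
{
    // Collect parse errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();                 // discard the collected errors
    libxml_use_internal_errors($previous); // restore the earlier setting
    return $doc;
}
```

Passing $body from the earlier snippet gives you a DOMDocument you can traverse, even when the page's HTML is sloppy.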

Traversing the response body

You will need to traverse the HTML document and look for the HTML 'tags' that request resources: for instance, image tags, script tags, object tags, etc.

This will probably involve a lot of coding.
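A rough sketch of that traversal using DOMXPath follows. The helper name extractResourceUrls and the list of tags are assumptions, and the list is far from complete: it ignores CSS url() references, srcset, inline styles, and anything loaded by JavaScript. URLs are returned as written in the page, so relative ones still need to be resolved against the page address.

```php
<?php
// Hypothetical helper: list URLs referenced by a few common resource-loading tags.
function extractResourceUrls(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep malformed HTML from raising warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Attribute nodes that usually make a browser issue a request.
    // Simplification: only <link rel="stylesheet"> is matched, with an exact rel value.
    $query = '//img/@src | //script/@src | //iframe/@src'
           . ' | //link[@rel="stylesheet"]/@href | //object/@data';

    $urls = [];
    foreach ($xpath->query($query) as $attr) {
        $urls[] = trim($attr->nodeValue);
    }
    // Drop duplicates and reindex the array.
    return array_values(array_unique($urls));
}
```

Each URL found this way would have to be fetched and inspected in turn (a stylesheet can reference images and fonts, for example), which is where most of the coding effort goes.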

AJAX, the pitfall

However, even after all this work, there's still a problem. Modern pages make extensive use of asynchronous requests (take Angular-based pages, for instance).

In order to capture those async requests, you would need to write a JavaScript parser and interpreter in PHP (which is insane) or rely on a third-party tool (for instance, you can pass the data to Node.js to run the JavaScript).

Tivie
  • This question didn't deserve the care and effort you've put into answering it. I suspect, from the question, that the OP doesn't even understand that PHP gets executed serverside and therefore can't see the requests made by a page after it is served, and that he probably still doesn't realise this after reading your answer. This is above his level. – Mark Amery Feb 15 '15 at 18:16