0

I'm developing a tool to analyze a website supplied by the user. One of the important parts is showing the loading time of the website. How can I implement this in PHP? I tried the following methods.

Method 1:

Download the HTML of the page, parse it, and find each resource (CSS files, JavaScript files, images, etc.). Then download them one by one.

Problem: Real browsers like Chrome download around 6 resources at a time, but as far as I know PHP can't make requests asynchronously. It is also difficult to implement a cache.

Method 2:

Using Apache's benchmarking tool, ab. It seems to work pretty well: I can set the number of concurrent connections, and it can even enable gzip.

ab -n 100 -c 10 http://www.google.com/

Problem: How can I enable caching? I want to test the website twice (to show the loading time with a cache). I have also heard that ab doesn't download a page's resources. Does anyone know?

Is there any other method? Or is there a way to fix the problems with the methods above?

halfer
Gijo Varghese

3 Answers

2

Instead of ab, you might try wget. It is useful for downloading an entire page, and it may send an If-Modified-Since header if caching is enabled.
- https://www.gnu.org/software/wget/

Another idea would be to use Selenium WebDriver, which allows you to control web browsers from PHP.
- https://github.com/facebook/php-webdriver

[Later Edit]

I'm afraid you cannot perform concurrent downloads (of a page and its resources) using wget.

Even if you could, it would be very difficult to make it act as a real browser.

For example, a page may load 7 JavaScript files: three on the same domain, three from another domain and the 7th from a CDN. Some of these script files, upon execution, might load other resources - CSS files, images, other JavaScript libraries. Each CSS file might also trigger loading of other resources (font files, images, other style-sheets).

Measuring the loading time of a webpage in the above scenario becomes difficult, unless you're using a browser controller/emulator.

So I would suggest taking a look at Selenium WebDriver, or at other libraries/tools such as Mink or PhantomJS, as @halfer mentioned.

tachirei
  • Will wget load resources too? Like CSS and JS files, images, etc.? – Gijo Varghese May 01 '16 at 11:01
  • Yes, it should. Please see this answer: http://stackoverflow.com/a/6510193/6277548 and its comments. Also, some examples from Wikipedia: https://en.wikipedia.org/wiki/Wget#Using_Wget In other words, you could try `wget -H -p -k -U "Firefox User Agent" "http://your.web/page.html"` – tachirei May 01 '16 at 11:29
  • But if a website/webpage requires a full browser (with CSS and JavaScript capabilities) to load all its resources, perhaps `Selenium Webdriver` is a better solution. – tachirei May 01 '16 at 11:33
  • But wget is downloading each file one by one. Can we set concurrent downloads to 6 or something? – Gijo Varghese May 01 '16 at 16:11
2

If you want to do this in PHP, you have a couple of options:

  • Hook up to PhantomJS via a queue, like Gearman or Beanstalk. Phantom contains a real browser (the WebKit engine), so it will load websites in a real-world fashion. There are drivers for PHP, such as Spiderling.
  • Parse the page using something like Goutte and then load its resources in parallel using the curl_multi functions (PHP can do this!) or a wrapper around the same, such as Guzzle. However, since this approach won't run JavaScript, extra loads that are triggered in code will not happen.
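As a rough illustration of the parsing step in the second option, here is a minimal sketch that uses PHP's built-in DOM extension instead of Goutte (a swapped-in technique, chosen only to keep the example dependency-free). The function names `extractResourceUrls` and `resolveUrl` are hypothetical, and the URL resolution is deliberately simplified: it does not handle `../` segments, `<base>` tags, `srcset`, or resources referenced from inside CSS.

```php
<?php
// Hypothetical helper: list the scripts, images and stylesheets a page references.
// Assumes ext-dom is available and $html is a non-empty string.
function extractResourceUrls(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//script[@src]/@src | //img[@src]/@src | //link[@rel="stylesheet"][@href]/@href'
    );

    $urls = [];
    foreach ($nodes as $attr) {
        $urls[] = resolveUrl(trim($attr->nodeValue), $baseUrl);
    }
    // Drop duplicates: a browser would fetch each URL only once.
    return array_values(array_unique($urls));
}

// Hypothetical helper: simplified relative-to-absolute URL resolution.
function resolveUrl(string $url, string $baseUrl): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $url)) {
        return $url; // already has a scheme (http:, https:, data:, ...)
    }
    $base   = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($base['host'] ?? '')
            . (isset($base['port']) ? ':' . $base['port'] : '');

    if (strpos($url, '//') === 0) {
        return $scheme . ':' . $url;   // protocol-relative URL
    }
    if (strpos($url, '/') === 0) {
        return $origin . $url;         // root-relative URL
    }
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $url;      // relative to the page's directory
}
```

The returned list only covers resources visible in the static HTML, which is the limitation described above: anything injected by JavaScript will be missing.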
halfer
1

PHP can make multiple requests at the same time using cURL's "multi" interface: http://php.net/manual/en/function.curl-multi-init.php
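A minimal sketch of what that could look like, assuming ext-curl and PHP 7.0.7 or later (for the `CURLMOPT_MAX_HOST_CONNECTIONS` constant). The function name `fetchAllInParallel` and the default limit of 6 connections per host are illustrative choices, not something taken from the manual page; the limit is only meant to approximate the browser behaviour mentioned in the question.

```php
<?php
// Hypothetical helper: download all URLs concurrently and report timings.
function fetchAllInParallel(array $urls, int $maxConnections = 6): array
{
    $multi = curl_multi_init();
    // Cap simultaneous connections per host (assumed value, roughly browser-like).
    curl_multi_setopt($multi, CURLMOPT_MAX_HOST_CONNECTIONS, $maxConnections);

    $handles = [];
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,  // keep the body in memory instead of printing it
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_ENCODING       => '',    // accept any compression cURL supports (gzip etc.)
            CURLOPT_TIMEOUT        => 30,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    $running = 0;
    $start = microtime(true);
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0);  // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);
    $elapsed = microtime(true) - $start;     // wall-clock time for the whole batch

    $resources = [];
    foreach ($handles as $url => $ch) {
        $resources[$url] = [
            'http_code'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'total_time' => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
            'bytes'      => strlen((string) curl_multi_getcontent($ch)),
        ];
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return ['elapsed' => $elapsed, 'resources' => $resources];
}
```

Calling `fetchAllInParallel($urls)` with the resource URLs of a page returns the overall elapsed time under `elapsed` and per-resource details under `resources`. The overall figure is still only an approximation of a browser's load time, since nothing is parsed, rendered, or executed.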

You can also send the If-Modified-Since header (see How to test for "If-Modified-Since" HTTP Header support), which answers both of your points.
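One possible way to approximate a "second visit" is sketched below, assuming ext-curl. The helper names `conditionalHeaders` and `timedFetch` are hypothetical. Using the time of the first fetch as the If-Modified-Since value is a simplification; a browser would normally send back the `Last-Modified` value it received from the server.

```php
<?php
// Hypothetical helper: build the request headers for a conditional (cache-validating) request.
function conditionalHeaders(?int $lastFetched): array
{
    if ($lastFetched === null) {
        return []; // first visit: nothing cached, so send an unconditional request
    }
    // HTTP dates are always expressed in GMT.
    return ['If-Modified-Since: ' . gmdate('D, d M Y H:i:s', $lastFetched) . ' GMT'];
}

// Hypothetical helper: fetch one URL and report the status code and transfer time.
function timedFetch(string $url, array $headers = []): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => $headers,
        CURLOPT_TIMEOUT        => 30,
    ]);
    curl_exec($ch);
    return [
        'http_code'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),  // 304 = cached copy still valid
        'total_time' => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
    ];
}
```

A run might call `timedFetch($url)` once, record `time()`, and then call `timedFetch($url, conditionalHeaders($firstFetchTime))`. If the server supports conditional requests, the second call should return a 304 with an empty body, which gives a rough lower bound for the cached load time of that resource.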

But you still have several points to consider:

How do you know whether you have to load conditional JS when using PHP alone?

How does the remote backend behave with a "fresh" cookie, or with an old cookie? Maybe the website has some special logic for returning users.

Of course, driving a real web browser is the approach that matches the real world most closely; in fact, it is not even a simulation.

But check whether controlling a web browser through an external process alters the timing.

Roy