I was implementing a very simple table of links in a MySQL database with PHP when this idea came to mind: add a "test" button that runs a check on every stored link.

For example:

http://www.somebodysite.com/somesubdir/somefile.php?id=1

This is a very basic link; even so, a lot of errors might occur:

  • www.somebodysite.com is not available anymore; they didn't pay the bill
  • somesubdir was deleted
  • somefile.php was renamed
  • id was removed from the database

Many things can't be examined remotely, I guess, but others can. How far can I go? What link elements can I verify remotely?

Gustavo
  • Do an HTTP GET and check the status code of the return. – crush Feb 17 '14 at 22:09
  • Check if it returns a 404? – Marty Feb 17 '14 at 22:09
  • @crush: _“Do an HTTP GET”_ – replace `GET` with `HEAD` … – CBroe Feb 17 '14 at 22:10
  • possible duplicate of [Easy way to test a URL for 404 in PHP?](http://stackoverflow.com/questions/408405/easy-way-to-test-a-url-for-404-in-php) – Marty Feb 17 '14 at 22:10
  • I know certain Drupal modules like `Link Checker` & `Web Links` do this, some better than others. Most only do it for links to, from, & around your site, though, so unless you're Google it's not practical. Not sure how they do it, but point is it's possible. – trysis Feb 17 '14 at 22:16
  • Not exactly, but it is good to mention. There is a redirection issue in that question. – Gustavo Feb 17 '14 at 22:16
  • So more like `Possibly related` or something like that? – trysis Feb 17 '14 at 22:21
  • For me, an answer like 302 or 307 will mean, most of the time, the link is right, it is just being redirected. – Gustavo Feb 18 '14 at 02:31

1 Answer

For the first three items you can use the get_headers() function and check whether the first header it returns, the status line, is one you accept, such as HTTP/1.1 200 OK:

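// get_headers() returns false when the request fails (for example, the host is gone)
$response = @get_headers('http://www.somebodysite.com/somesubdir/somefile.php?id=1');

// Status lines that count as a working link
$validCodes = array(
   'HTTP/1.1 200 OK',
   'HTTP/1.1 301 Moved Permanently',
   'HTTP/1.1 307 Temporary Redirect'
   // add more codes as you want
);

// $response[0] is the status line of the first response
if ($response !== false && in_array($response[0], $validCodes))
{
   // It's ok
}
else
{
   // Something is wrong
}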
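
Comparing the whole status line is brittle, because a server may answer with `HTTP/1.0` or `HTTP/2`, or with a different reason phrase. Below is a minimal sketch of a more tolerant check that extracts the numeric code and classifies it by range. The function names (`finalStatusLine`, `extractStatusCode`, `classifyStatus`, `checkLink`) are hypothetical, and treating every 2xx and 3xx code as "ok" is an assumption taken from the comments on the question, not a rule.

```php
<?php
// Hypothetical helper: get_headers() follows redirects and returns the headers
// of every hop in one flat list, so pick the last line that starts with "HTTP/".
function finalStatusLine(array $headers): ?string
{
    $last = null;
    foreach ($headers as $line) {
        if (is_string($line) && strncmp($line, 'HTTP/', 5) === 0) {
            $last = $line;
        }
    }
    return $last;
}

// Hypothetical helper: pull the numeric code out of a status line such as
// "HTTP/1.1 404 Not Found". Returns null if the line does not parse.
function extractStatusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: map a code to a coarse verdict (the ranges are an assumption).
function classifyStatus(?int $code): string
{
    if ($code === null) {
        return 'unreachable';
    }
    if ($code >= 200 && $code < 400) {
        return 'ok';
    }
    if ($code >= 400 && $code < 500) {
        return 'broken';
    }
    if ($code >= 500) {
        return 'server-error';
    }
    return 'unknown';
}

// Hypothetical wrapper: send a HEAD request with a timeout and classify the result.
function checkLink(string $url): string
{
    $context = stream_context_create(
        array('http' => array('method' => 'HEAD', 'timeout' => 5))
    );
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return 'unreachable';
    }
    $statusLine = finalStatusLine($headers);
    return classifyStatus($statusLine === null ? null : extractStatusCode($statusLine));
}
```

Calling `checkLink()` on each stored URL yields one of `ok`, `broken`, `server-error`, `unreachable` or `unknown`. Some servers reject `HEAD`, so falling back to `GET` for those may be worth considering.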

But to check whether the given id was removed from the database, you have to know how the owner of somebodysite.com signals that an item was deleted. If you know which string appears on the page for a deleted item, just load the page and look for that string (stream_get_contents() can be helpful here). A really basic example using strpos(), since I am not too familiar with regular expressions:

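$stream = fopen('http://www.somebodysite.com/somesubdir/somefile.php?id=1', 'r');
if ($stream === false)
{
   // fopen() returns false when the page cannot be loaded at all
   exit('Could not load the page');
}
$pageSource = stream_get_contents($stream);
fclose($stream);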

$isDeletedString = 'removed from database';

$isDeleted = strpos($pageSource, $isDeletedString);

if ($isDeleted === false)
{
   // Still there
}
else
{
   // Item was deleted
}
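
The same idea can be wrapped in a reusable function that accepts several marker strings and ignores case. This is only a sketch: `looksDeleted` and `isItemDeleted` are hypothetical names, and the marker text is an assumption, since the real wording depends on what the remote site prints for a deleted item.

```php
<?php
// Hypothetical helper: true if any known "deleted" marker appears in the page.
function looksDeleted(string $pageSource, array $markers): bool
{
    foreach ($markers as $marker) {
        // stripos() is case-insensitive; compare against false explicitly,
        // because a match at offset 0 is a valid result.
        if ($marker !== '' && stripos($pageSource, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical wrapper: fetch the page, then apply the marker check.
// Returns null when the page could not be fetched at all.
function isItemDeleted(string $url, array $markers): ?bool
{
    $context = stream_context_create(array('http' => array('timeout' => 5)));
    $pageSource = @file_get_contents($url, false, $context);
    if ($pageSource === false) {
        return null;
    }
    return looksDeleted($pageSource, $markers);
}
```

For example, `isItemDeleted($url, array('removed from database'))` returns `true`, `false`, or `null`, which keeps "the page says the item is gone" separate from "the page could not be loaded".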
Pavel Štěrba
  • How about statuses like 3xx, which are correct links but you're redirected, or even 5xx, which are correct but there's a server error? – trysis Feb 17 '14 at 22:17
  • Well, you can of course create array with correct status codes and use `in_array` to check it. – Pavel Štěrba Feb 17 '14 at 22:18
  • I think it is a start. I'll finish my script and begin some tests. I might divide the query and test only the root, then the subdir, etc. – Gustavo Feb 17 '14 at 22:19
  • @Pavel: I think get_headers is the answer, instead of using curl as the another similar post. However, 200 is not the only good answer, above 400 means the link has a good chance to be right. So I think you should add this in your answer. And thanks! – Gustavo Feb 18 '14 at 02:26
  • I just updated my original post to be able to check for multiple statuses. You can easily add which you want. – Pavel Štěrba Feb 18 '14 at 06:38