4

I'm trying to make a broken link checker with PHP. I modified some PHP code I found online; I'm not a PHP programmer. It lets some unbroken links into the results, but that's OK. However, I have a problem with all presentations, zips and so on. Basically, if the link is a download, the algorithm thinks it's a dead link.

<?php
    set_time_limit(0);
    //ini_set('memory_limit','512M');
    $servername = "localhost";
    $username   = "";
    $password   = "";

    try {
        $conn = new PDO("mysql:host=$servername;dbname=test", $username, $password);
        // set the PDO error mode to exception
        $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        echo "Connected successfully" . "<br />";
        echo "----------------------------------------------------<br />";
    }
    catch (PDOException $e) {
        echo "Connection failed: " . $e->getMessage();
    }

    $sql    = "SELECT object,value FROM metadata where xpath = 'lom/technical/location'";
    $result = $conn->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    //print_r($result);

    $array_length = sizeof($result); //26373
    //$array_length = 26373;
    $i            = 0;

    $myfile = fopen("Lom_Link_patikra1.csv", "w") or die("Unable to open file!");
    $menu_juosta = "Objektas;Nuoroda;Klaidos kodas;\n";
    //fwrite($myfile,$menu_juosta);

    for ($i; $i < $array_length; $i++) {
        $new_id           = $result[$i]["object"];
        $sql1             = "SELECT published from objects where id ='$new_id'";
        $result_published = $conn->query($sql1)->fetchAll(PDO::FETCH_ASSOC);
        //print_r ($result_published);                 

        if ($result_published[0]["published"] != 0) {
            $var1             = $result[$i]["value"];
            $var1             = str_replace('|experience|902', '', $var1);
            $var1             = str_replace('|packed_in|897', '', $var1);
            $var1             = str_replace('|packed_in|911', '', $var1);
            $var1             = str_replace('|packed_in|895', '', $var1);
            $request_response = check_url($var1); // page response (HTTP status code)

            if ($request_response != 200) {
                $my_object = $result[$i]["object"] . ";" . $var1 . ";" . $request_response . ";\n";
                fwrite($myfile, $my_object);
            }
        }
    }
    fclose($myfile);
    $conn = null;

    function check_url($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $data    = curl_exec($ch);
        $headers = curl_getinfo($ch);
        curl_close($ch);
        return $headers['http_code'];
    }

Link example: http://lom.emokykla.lt/MO/Matematika/pazintis_su_erdviniais%20_kunais_1.doc

Any solutions, advice?

Thank you all for the help. Now it works much faster. There still seems to be a problem with blank spaces, but that is intriguing in itself.
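One possible cause of the blank-space problem is that values read from the database contain literal spaces or surrounding whitespace, which are not valid in a URL. A minimal sketch, assuming that is what happens here; `normalize_url` is a hypothetical helper and not part of the original script:

```php
<?php
// Hypothetical helper: trim surrounding whitespace and percent-encode
// any literal spaces left inside the URL. Sequences that are already
// encoded, such as %20, are left untouched.
function normalize_url($url)
{
    return str_replace(' ', '%20', trim($url));
}

// prints http://example.com/my%20file.doc
echo normalize_url(" http://example.com/my file.doc \n"), "\n";
```

In the loop from the question, this could be applied to `$var1` just before the call to `check_url`.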

It seems the problem was my understanding of how HTTP status codes work: what they return and why. The links I had marked as bad, although they were working, returned 301 or 302, which are redirects. https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
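To illustrate the point about redirects: cURL reports the status of the first response unless it is told to follow `Location` headers. A minimal sketch, assuming a link should count as alive when its final response is 2xx; `is_alive_status` and `check_url_following_redirects` are illustrative names, and the redirect limit and timeouts are arbitrary choices:

```php
<?php
// Illustrative helper: decide whether a final HTTP status means "alive".
// curl_getinfo() reports 0 when no response was received at all.
function is_alive_status($code)
{
    return $code >= 200 && $code < 300;
}

// Sketch of a check that follows 301/302 redirects and skips the body.
function check_url_following_redirects($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no download
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // arbitrary limit
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // arbitrary, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);           // arbitrary, in seconds
    curl_exec($ch);
    // With redirects followed, this is the status of the last response.
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}
```

In the loop from the question, the test `$request_response != 200` could then become `!is_alive_status($request_response)`.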

Thank you all for help.

Exmorn

3 Answers

3

Using cURL for a remote file:

function checkRemoteFile($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Don't download the content (sends a HEAD request)
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    // Treat HTTP status codes >= 400 as failures
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    return $ok;
}

EDIT: I may have misunderstood you, but if you just want to check whether the URL actually exists, then the code below is all you need.

function url_exists($url)
{
    // Read at most one byte; @ suppresses the warning on failure
    if (@file_get_contents($url, false, null, 0, 1)) {
        return 1;
    } else {
        return 0;
    }
}
  • Thank you, checking it now. At least it works better than mine. Will see the result. – Exmorn Nov 24 '15 at 10:25
  • Works better, but it still counts links that point to a file as dead (returns false), e.g. http://lom.emokykla.lt/MO/Matematika/pazintis_su_erdviniais%20_kunais_1.doc – Exmorn Nov 24 '15 at 10:38
  • The file in the link you supplied does exist. I downloaded it to verify, but I did not open it. If the file exists but is invalid or empty, then you can still use the above code to check it and download it, and then you will have to verify its integrity with another function or script. –  Nov 24 '15 at 17:36
  • I edited the answer. I may have misunderstood what you are trying to achieve, but I hope it helps. –  Nov 24 '15 at 17:42
1

Setting CURLOPT_NOBODY to true makes cURL send an HTTP HEAD request instead of a GET request, so try using curl_setopt($ch, CURLOPT_NOBODY, true);
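A HEAD request avoids downloading the file, but some servers do not allow the method and answer with 405 or 501 even though the resource exists. A minimal sketch, assuming a fallback to a normal GET is acceptable in that case (the GET does download the body); `should_retry_with_get` and `head_then_get_status` are hypothetical names:

```php
<?php
// Hypothetical helper: statuses suggesting the server rejected HEAD itself
// (405 Method Not Allowed, 501 Not Implemented).
function should_retry_with_get($code)
{
    return in_array($code, [405, 501], true);
}

// Try HEAD first; repeat as GET only if the HEAD method was rejected.
function head_then_get_status($url)
{
    foreach ([true, false] as $headOnly) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, $headOnly);    // true = HEAD, false = GET
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the body
        curl_exec($ch);
        $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        // Return after the GET attempt, or as soon as HEAD gave a usable answer.
        if (!$headOnly || !should_retry_with_get($code)) {
            return $code;
        }
    }
}
```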

Amit Joshi
0

Try the file_exists function: http://php.net/manual/fr/function.file-exists.php (note that it only works with URL wrappers that support stat(), which the http:// wrapper does not).

Thomas Rollet