
I'm developing a PHP crawler and I can get the href of every link on a page. I don't want to save the URLs of file download links in my database, such as:

http://www.example.com/folder1/thefile.exe

http://www.example.com/folder1/download.php?id=1

http://www.example.com/folder1/thefile.zip

http://www.example.com/folder1/thefile.extension

or any other extension.

This is my validation function. I know the is_file() call is useless here.

protected function isValid($url)
{
    // Skip javascript: and mailto: links (case-insensitive match).
    $isJavascript = stripos($url, 'javascript:') !== false;
    $isEmail = stripos($url, 'mailto:') !== false;

    if ($isEmail || $isJavascript) {
        return false;
    }

    // is_file() checks the filesystem, so it does not help with remote http URLs.
    if (is_file($url)) {
        echo "is file<br>";
        return false;
    } else {
        echo "is not file<br>";
    }

    // Skip URLs on other hosts and URLs that were already crawled.
    if (strpos($url, $this->_host) === false || $this->isSeen($url)) {
        return false;
    }

    return true;
}

Now my question is: how can I detect any URL that causes a file download?

Manian Rezaee
  • The answer is simple: you can't, not until you've actually opened the url and checked whether the result is binary or text. And even then, it's not fool-proof at all. – thomasb Aug 11 '15 at 13:17
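As the comment says, the URL string alone cannot settle this, so one possible approach is to ask the server for the response headers and look at Content-Disposition and Content-Type. The sketch below is only a heuristic under stated assumptions: looksLikeDownload() and fetchHeaders() are hypothetical helper names, the list of "page" MIME types is an assumption, and it assumes the server answers HEAD requests honestly (PHP 7.1+ for the context argument of get_headers()).

```php
<?php
// Hypothetical helper: guess from response headers whether a URL triggers a download.
// Heuristic only; servers can send misleading or missing headers.
function looksLikeDownload(array $headers): bool
{
    // Normalise header names and values to lower case; skip status lines (integer keys).
    $h = [];
    foreach ($headers as $name => $value) {
        if (!is_string($name)) {
            continue;
        }
        // get_headers() yields an array when a header repeats (e.g. across redirects);
        // keep the last value, which belongs to the final response.
        if (is_array($value)) {
            $value = end($value);
        }
        $h[strtolower($name)] = strtolower(trim((string) $value));
    }

    // An explicit "attachment" disposition asks the browser to save the response.
    if (isset($h['content-disposition']) && strpos($h['content-disposition'], 'attachment') === 0) {
        return true;
    }

    // Strip parameters such as "; charset=utf-8" from the media type.
    $type = trim(explode(';', $h['content-type'] ?? '')[0]);
    if ($type === '') {
        return false; // no information: treated as "not a download" (assumption)
    }

    // Assumed list of types a crawler wants to keep as pages.
    $pageTypes = ['text/html', 'application/xhtml+xml'];
    return !in_array($type, $pageTypes, true);
}

// Hypothetical helper: fetch headers with a HEAD request so the body is not downloaded.
function fetchHeaders(string $url): array
{
    $context = stream_context_create(['http' => ['method' => 'HEAD', 'timeout' => 5]]);
    $headers = @get_headers($url, true, $context);
    return $headers === false ? [] : $headers;
}
```

Inside isValid(), this could replace the is_file() check with something like looksLikeDownload(fetchHeaders($url)). It costs one extra request per URL, and a link such as download.php?id=1 is only detected if the server actually sends a non-HTML Content-Type or an attachment disposition.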

0 Answers