
How can I check whether a given URL refers to a webpage or a raw file? Currently I check whether the whole content contains the string <html>, but that is neither efficient nor reliable.

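$content = file_get_contents($url);
if ($content !== false)
{
    // is directory (an HTML page, e.g. a directory listing)
    // strpos() returns false when not found; 0 is a valid match position
    if (strpos($content, "<html>") !== false)
    {
        echo $url . " is a folder." . "<br>";
    }
    else
    {
        // use raw file...
    }
}
else
{
    echo $url . " was not found." . "<br>";
}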
danijar

2 Answers


You could fetch the response headers and check the Content-Type header. If it contains text/html, it's an HTML page.

See Fetch HTTP response header/redirect status with PHP

This won't be 100% reliable, though: in rare cases, the server may not send a Content-Type header at all.
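A minimal sketch of the header check, assuming PHP 7.1+ (for the context argument of get_headers()). The helper names content_type_from_headers(), is_html_content_type() and url_is_html() are hypothetical; get_headers() and stream_context_create() are standard PHP. A missing header is reported as null rather than guessed, since not every server sends one.

```php
<?php
// Sketch: decide whether a URL serves HTML from the Content-Type header only.
// Helper names are hypothetical; get_headers()/stream_context_create() are core PHP.

/**
 * Returns the final Content-Type value from a get_headers($url, true)
 * result, or null if the server did not send one.
 */
function content_type_from_headers(array $headers): ?string
{
    $found = null;
    foreach ($headers as $name => $value) {
        // Status lines use numeric keys; header names keep the server's casing.
        if (is_string($name) && strcasecmp($name, 'Content-Type') === 0) {
            // After redirects the header may appear once per response;
            // the last entry belongs to the final response.
            $found = is_array($value) ? (string) end($value) : (string) $value;
        }
    }
    return $found;
}

/** True if the Content-Type value names an HTML document. */
function is_html_content_type(?string $contentType): bool
{
    if ($contentType === null) {
        return false; // header missing: the headers alone cannot tell us
    }
    // Drop parameters such as "; charset=utf-8" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return $mime === 'text/html';
}

/**
 * Sends a HEAD request (no body is downloaded) and returns
 * true (HTML), false (something else) or null (unknown).
 */
function url_is_html(string $url): ?bool
{
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = @get_headers($url, true, $context);
    if ($headers === false) {
        return null; // request failed
    }
    $type = content_type_from_headers($headers);
    return $type === null ? null : is_html_content_type($type);
}
```

Calling url_is_html($url) before file_get_contents() lets you skip downloading the body when you only need to know what kind of resource it is.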

Pekka

Data coming from a URL can be anything: a file on disk, a data stream generated on the fly, a database query result, etc. Even the Content-Type header can be set to anything if the owner of the URL is playful or evil (for example, setting the Content-Type to text/html and serving a couple of gigabytes of random text).
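Because the header cannot be trusted, one complementary heuristic is to look at the first bytes of the body yourself. The sketch below uses hypothetical helper names (looks_like_html(), url_body_looks_like_html()), reads only the first kilobyte, and checks just for a leading <!doctype html or <html tag. It is a rough guess, not a defense against a deliberately misleading server.

```php
<?php
// Heuristic sketch: guess whether a body *looks* like HTML from its first
// bytes, independently of what the Content-Type header claims.

function looks_like_html(string $prefix): bool
{
    // Skip a UTF-8 byte order mark, if present.
    if (strncmp($prefix, "\xEF\xBB\xBF", 3) === 0) {
        $prefix = substr($prefix, 3);
    }
    // Whitespace before the first tag is common.
    $prefix = ltrim($prefix);
    // Markers that typically open an HTML document (case-insensitive).
    foreach (['<!doctype html', '<html'] as $marker) {
        if (strncasecmp($prefix, $marker, strlen($marker)) === 0) {
            return true;
        }
    }
    return false;
}

/** Reads at most $maxBytes from $url; null if the read fails. */
function url_body_looks_like_html(string $url, int $maxBytes = 1024): ?bool
{
    // The $length argument stops reading after $maxBytes instead of fetching everything.
    $prefix = @file_get_contents($url, false, null, 0, $maxBytes);
    return $prefix === false ? null : looks_like_html($prefix);
}
```

Comparing this guess with the declared Content-Type is one way to spot a server whose header and body disagree.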

Besides that, your code is far from optimal: think of a URL serving a 1-gigabyte file, and your server will suffer. A better solution is to use PHP's cURL extension: send a HEAD request to the URL to discover its properties, then download the content to a file on disk and examine it afterwards (using a MIME type detector or any other method). Be warned: even using cURL does not protect you from malicious URLs (as mentioned above).
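A minimal sketch of the HEAD-then-download approach, assuming the cURL extension is installed. The function names curl_head_info(), should_download() and curl_download_capped() are hypothetical; the curl_* calls and CURLOPT_*/CURLINFO_* constants are the extension's real API. Because a malicious server can lie about its length, the download itself is capped via a progress callback rather than trusting the HEAD result alone.

```php
<?php
// Sketch of "HEAD first, then download to disk" with PHP's cURL extension.
// Function names are hypothetical; the curl_* calls are the real API.

/** HEAD request: returns ['type' => ?string, 'length' => int] or null on failure. */
function curl_head_info(string $url): ?array
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // HEAD: fetch headers only
        CURLOPT_RETURNTRANSFER => true, // don't print anything
        CURLOPT_FOLLOWLOCATION => true, // report on the final resource after redirects
        CURLOPT_TIMEOUT        => 10,
    ]);
    if (curl_exec($ch) === false) {
        return null;
    }
    return [
        // null when the server sent no valid Content-Type header
        'type'   => curl_getinfo($ch, CURLINFO_CONTENT_TYPE) ?: null,
        // -1 when the server did not announce a Content-Length
        'length' => (int) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD),
    ];
}

/** Decides from the HEAD result whether downloading is acceptable. */
function should_download(?array $info, int $maxBytes): bool
{
    // An unknown length (-1) is allowed here; curl_download_capped() still enforces the cap.
    return $info !== null && $info['length'] <= $maxBytes;
}

/** Streams $url into $path, aborting once more than $maxBytes have arrived. */
function curl_download_capped(string $url, string $path, int $maxBytes): bool
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    if ($ch === false) {
        fclose($fp);
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_FILE           => $fp,   // write the body straight to disk
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_NOPROGRESS     => false, // needed for the progress callback to fire
        // A non-zero return value from this callback aborts the transfer.
        CURLOPT_PROGRESSFUNCTION => function ($handle, $dlTotal, $dlNow, $ulTotal, $ulNow) use ($maxBytes) {
            return $dlNow > $maxBytes ? 1 : 0;
        },
    ]);
    $ok = curl_exec($ch) !== false;
    unset($ch); // release the handle (and its reference to $fp) first
    fclose($fp);
    if (!$ok) {
        @unlink($path); // drop partial or oversized downloads
    }
    return $ok;
}
```

For example, after should_download(curl_head_info($url), $max) returns true, curl_download_capped($url, $tmpPath, $max) saves the body, and the saved file could then be passed to a MIME detector such as (new finfo(FILEINFO_MIME_TYPE))->file($tmpPath) from the fileinfo extension.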

NoNamed