Subject says it all. I need to start streaming a web page and stop as soon as a given string, e.g. </head>, is found. I want to do this to save bandwidth on both ends and to cut script running time.

I don't want to download the whole page into memory; I need the content to arrive as a stream of blocks, in PHP.

Thank you community, I love you guys :)

Edi Budimilic

1 Answer

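<?php

function streamUntilStringFound($url, $string, $timeout = 30){

    // parse_url() only detects the host when a scheme is present, so add one if missing
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = "http://" . $url;
    }
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return "Invalid URL!";
    }
    $host = $parts['host'];
    $port = isset($parts['port']) ? $parts['port'] : 80; // plain HTTP only, no TLS
    $path = isset($parts['path']) ? $parts['path'] : "/";
    if (isset($parts['query'])) {
        $path .= "?" . $parts['query'];
    }

    // start the stream
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        $buffer = "Invalid URL!"; // use $errstr to show the exact error
    } else {
        $out  = "GET $path HTTP/1.1\r\n";
        $out .= "Host: $host\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);
        $buffer = ""; // note: this also collects the response headers
        while (!feof($fp)) {
            $line = fgets($fp, 128);
            if ($line === false) break; // read error or timeout
            $buffer .= $line;
            // string found (case-insensitive) - stop downloading any new content
            if (stripos($buffer, $string) !== false) break;
        }
        fclose($fp);
    }

    return $buffer;

}

// download all content until closing </head> is found
$content = streamUntilStringFound("whoapi.com", "</head>");

// show us what is found
echo "<pre>".htmlspecialchars($content)."</pre>";

?>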

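Important note (thanks to @GordonM for raising the question):

allow_url_fopen in php.ini controls the URL wrappers used by fopen() and file_get_contents(). fsockopen() opens a plain socket and is not governed by that setting, but some hosts disable it separately through disable_functions.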
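
Whichever approach is used, it can help to check what the host actually permits before relying on it. A minimal sketch, assuming a hypothetical helper name describeFetchOptions() (not part of the code above); it only reports the settings and does not pick a method:

```php
<?php

// Hypothetical helper (an assumption for illustration): reports which fetch
// mechanisms this PHP installation appears to allow.
function describeFetchOptions(): array
{
    return [
        // URL wrappers for fopen()/file_get_contents(), set in php.ini
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // false if the function is unavailable, for example disabled by the host
        'fsockopen' => function_exists('fsockopen'),
        // true only when the cURL extension is loaded
        'curl' => extension_loaded('curl'),
    ];
}

foreach (describeFetchOptions() as $name => $available) {
    echo $name, ': ', $available ? 'available' : 'unavailable', PHP_EOL;
}
```

If fsockopen shows as unavailable, the cURL route is worth trying instead.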

Edi Budimilic
  • looks reasonable, though you might want to use curl instead because I don't think this method would work if allow_url_fopen is disabled. – GordonM Sep 05 '12 at 09:11
  • cURL is also good; here's an answer I found using cURL to do that: http://stackoverflow.com/questions/1342583/php-manipulate-a-string-that-that-is-30-mil-chars-long/1342760#1342760 The only problem I found in that example is the higher CPU load (but not very noticeable on rare requests). – Edi Budimilic Sep 05 '12 at 09:18
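
For reference, the cURL route discussed in the comments can be sketched roughly as follows. This is not the code from the linked answer; the names makeStopOnNeedleCallback() and curlUntilStringFound() are made up for illustration. It relies on CURLOPT_WRITEFUNCTION: when the callback returns anything other than the length of the chunk it was given, cURL aborts the transfer.

```php
<?php

// Hypothetical helper: builds a cURL write callback that appends each chunk
// to $buffer and aborts the transfer once $needle has been seen.
function makeStopOnNeedleCallback(string $needle, string &$buffer): callable
{
    return function ($ch, string $chunk) use ($needle, &$buffer): int {
        $buffer .= $chunk;
        if (stripos($buffer, $needle) !== false) {
            return 0; // a length different from strlen($chunk) makes cURL abort
        }
        return strlen($chunk); // tell cURL the whole chunk was consumed
    };
}

// Hypothetical wrapper: fetches $url and returns the body received so far
// once $needle shows up (or the full body if it never does).
function curlUntilStringFound(string $url, string $needle, int $timeout = 30): string
{
    $buffer = '';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeStopOnNeedleCallback($needle, $buffer));
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch); // returns false when the callback aborts; expected here
    return $buffer; // the handle is released when $ch goes out of scope
}

// Example call (needs network access and the cURL extension):
// $content = curlUntilStringFound("http://whoapi.com/", "</head>");
```

Unlike the fsockopen() version, the buffer here holds only the response body, because cURL does not pass headers to the write callback unless CURLOPT_HEADER is set.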