
I'm using html_dom (the Simple HTML DOM parser) to scrape a website:

$url = $_POST["textfield"];
$html = file_get_html($url);

html_dom.php

function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    // We DO force the tags to be terminated.
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    // For SourceForge users: uncomment the next line and comment out the retrieve_url_contents line two lines down if that has not already been done.
    $contents = file_get_contents($url, $use_include_path, $context);
    // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
    //$contents = retrieve_url_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
    {
        return false;
    }
    // The second parameter can force the selectors to all be lowercase.
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}

The problem is that if the internet connection is too slow, the call to file_get_html() keeps trying, and I get a warning saying "failed to open stream" followed by a fatal error about the 30-second maximum execution time. I tried to solve it by stopping the function when a warning is detected:

function errHandle($errNo, $errStr, $errFile, $errLine) {
    $msg = "Slow Internet Connection";
    if ($errNo == E_NOTICE || $errNo == E_WARNING) {
        throw new ErrorException($msg, $errNo);
    } else {
        echo $msg;
    }
}

set_error_handler('errHandle');

But it still prints the fatal error about the execution time. Any idea how I can solve this?

  • Your title is slightly misleading; what you're really asking for is how to handle the error condition when a GET times out. I don't do enough PHP to know how to handle this without significant research, but in C# we'd simply wrap the external call in a try..catch block. A quick Bing search netted this example: https://stackoverflow.com/a/17549618/73680 . – azarc3 Feb 11 '18 at 17:06
  • Try curl or guzzle (see the sketch below these comments). – pguardiario Feb 12 '18 at 01:13
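A minimal sketch of that cURL route: fetch the page with an explicit timeout under your control, then hand the markup to Simple HTML DOM's str_get_html(). The 10 and 20 second values are placeholders; pick something comfortably below your max_execution_time.

include 'html_dom.php';

$url = $_POST["textfield"];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of echoing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // give up connecting after 10 seconds (placeholder)
curl_setopt($ch, CURLOPT_TIMEOUT, 20);          // give up on the whole transfer after 20 seconds (placeholder)
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects

$contents = curl_exec($ch);

if ($contents === false) {
    echo "Slow Internet Connection: " . curl_error($ch);
} else {
    $html = str_get_html($contents); // parse the markup we already fetched
    // ... work with $html as usual ...
}

curl_close($ch);

Because cURL enforces the timeout itself, a slow request fails cleanly well before PHP's 30-second execution limit is hit.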

1 Answer


If it takes too long, you could increase the time limit:

http://php.net/manual/en/function.set-time-limit.php
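For example (120 seconds is an arbitrary value; set_time_limit(0) removes the limit entirely):

set_time_limit(120); // restart the timer and allow up to 120 more seconds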

You can't catch a fatal error in PHP 5.6 or below. In PHP 7+ you can with:

try {
   doSomething();
} catch (\Throwable $exception) {
   // error handling
   echo $exception->getMessage();
}

Not sure if you can catch the execution time limit though.
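A minimal sketch that combines this try/catch with a stream-context timeout passed through file_get_html()'s $context parameter (the 10-second value is an assumption; keeping it below max_execution_time means the request gives up before the 30-second fatal ever happens):

include 'html_dom.php';

// Turn warnings/notices such as "failed to open stream" into catchable exceptions.
set_error_handler(function ($errNo, $errStr, $errFile, $errLine) {
    throw new ErrorException($errStr, 0, $errNo, $errFile, $errLine);
});

// Make the HTTP request itself time out after 10 seconds (assumed value).
$context = stream_context_create([
    'http' => ['timeout' => 10],
]);

try {
    $html = file_get_html($_POST["textfield"], false, $context);
    if ($html === false) {
        echo "Could not load the page.";
    }
} catch (\Throwable $exception) { // PHP 7+
    echo "Slow Internet Connection: " . $exception->getMessage();
} finally {
    restore_error_handler();
}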

John