I found a script that handles the responses of curl_multi requests asynchronously:

function rolling_curl($urls, $callback, $custom_options = null) {

    // make sure the rolling window isn't greater than the # of urls
    $rolling_window = 100;
    $rolling_window = (sizeof($urls) < $rolling_window) ? sizeof($urls) : $rolling_window;

    $master = curl_multi_init();
    $curl_arr = array();

    // add additional curl options here
    $std_options = array(CURLOPT_RETURNTRANSFER => true,
                         CURLOPT_FOLLOWLOCATION => true,
                         CURLOPT_MAXREDIRS => 5);
    $options = ($custom_options) ? ($std_options + $custom_options) : $std_options;
    $hosts = array();

    // start the first batch of requests
    for ($i = 0; $i < $rolling_window; $i++) {
        $hosts[$i] = parse_url($urls[$i], PHP_URL_HOST);

        $ch = curl_init();
        $options[CURLOPT_URL] = $urls[$i];
        curl_setopt_array($ch, $options);
        curl_multi_add_handle($master, $ch);
    }

    do {
        while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
        if ($execrun != CURLM_OK)
            break;

        // a request was just completed -- find out which one
        while ($done = curl_multi_info_read($master)) {
            $info = curl_getinfo($done['handle']);

            // find the index by matching the host of the completed request
            $id = array_search(parse_url($info['url'], PHP_URL_HOST), $hosts);

            if ($info['http_code'] == 200) {
                // request successful.  process output using the callback function.
                $output = curl_multi_getcontent($done['handle']);
                $callback($output, true, $id);
            } else {
                // request failed.  the callback still receives the index.
                $callback("", false, $id);
            }

            // start a new request (it's important to do this before removing the old one)
            if ($i < sizeof($urls)) {
                $hosts[$i] = parse_url($urls[$i], PHP_URL_HOST);
                $ch = curl_init();
                $options[CURLOPT_URL] = $urls[$i];
                curl_setopt_array($ch, $options);
                curl_multi_add_handle($master, $ch);
                $i++;
            }

            // remove the curl handle that just completed
            curl_multi_remove_handle($master, $done['handle']);
        }
    } while ($running);

    curl_multi_close($master);
    return true;
}

I have already modified it, because I want the callback function to receive the index that each URL has in the input array. For example, if the input array is

$urls = array(0=> $url0, 1=> $url1, ...);

then I want $callback to be triggered with $id = 1 when the curl request for url1 completes, so that I know the returned data belongs to url1. My first idea was to look up the curl URL in the $urls array, but curl sometimes follows redirects, so the final URL does not always match. Then I tried comparing hosts (fortunately every URL has a different host in my case), and that works: I build an array of hosts while creating the curl handles, and when a request completes I look up the index of its host in that array. This is not the best solution, because it fails for URLs that share a host, and there may be a simpler way. What is the best way to get the URL index? PS: my callback function is

function callback($output,$success,$id){...}
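
To illustrate what "keeping the index with the handle" could look like, here is a minimal sketch that stores the index on each handle with cURL's CURLOPT_PRIVATE option and reads it back with CURLINFO_PRIVATE. The helper names make_tagged_handle and handle_index and the example.com URLs are hypothetical, not part of the script above, and no request is actually sent here.

```php
<?php
// Hypothetical helper: create a handle and tag it with its index in $urls.
function make_tagged_handle(string $url, int $index, array $options = [])
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    curl_setopt($ch, CURLOPT_URL, $url);
    // CURLOPT_PRIVATE attaches arbitrary data to the handle; it is not sent
    // to the server and is unaffected by redirects.
    curl_setopt($ch, CURLOPT_PRIVATE, (string) $index);
    return $ch;
}

// Hypothetical helper: recover the index stored on a handle.
function handle_index($ch): int
{
    return (int) curl_getinfo($ch, CURLINFO_PRIVATE);
}

// Placeholder URLs that share a host, the case the host lookup cannot handle.
$urls = [0 => 'http://example.com/a', 1 => 'http://example.com/b'];

$handles = [];
foreach ($urls as $i => $url) {
    $handles[] = make_tagged_handle($url, $i, [CURLOPT_RETURNTRANSFER => true]);
}

// Each handle reports the index it was created with.
$ids = array_map('handle_index', $handles);
```

Under this assumption, the completed-request loop in rolling_curl could read the index with curl_getinfo($done['handle'], CURLINFO_PRIVATE) instead of searching $hosts.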
Franz Tesca
  • Knowledge of the index is disassociated from the curl handle, so you'll have to reapproach this to make that possible. That said, have you tried along the lines of: `$url = $urls[$i]; curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $body) use ($url, $i) { callback($body, curl_errno($ch), $i); });` – bishop Jul 20 '16 at 15:59
  • @bishop I tried, but with that approach the HTML in $body is truncated. I defined the callback as echo strlen($output)."
    ".htmlentities($output); to compare the $callback method with the CURLOPT_WRITEFUNCTION method, and the results differ: $callback returns the full page, while CURLOPT_WRITEFUNCTION returns only part of it.
    – Franz Tesca Jul 20 '16 at 17:13
  • Yes, I was hand waving a bit. The `WRITEFUNCTION` callback incrementally receives the returned data: maybe 0 bytes, maybe all of it, maybe somewhere in between. You should expect to have the callback called several times to fully accumulate the body. See http://stackoverflow.com/a/15958698/2908724 – bishop Jul 20 '16 at 18:15
  • [PR] [mpyw/co](https://github.com/mpyw/co): Asynchronous cURL executor simply based on resource and Generator. – mpyw Aug 09 '16 at 09:48
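
The incremental behaviour of the write callback described in the comments can be sketched as follows. This is only an illustration under assumptions: make_chunk_collector is a hypothetical helper, the $bodies array is an assumed place to collect output per index, and the two chunks are fed in by hand to simulate libcurl delivering the body in pieces, so no network request is made.

```php
<?php
// Hypothetical helper: build a write callback that appends every chunk it
// receives to $bodies[$index].
function make_chunk_collector(array &$bodies, int $index): Closure
{
    $bodies[$index] = '';
    return function ($ch, string $chunk) use (&$bodies, $index): int {
        $bodies[$index] .= $chunk;
        // The callback must return the number of bytes it handled;
        // any other value makes cURL abort the transfer.
        return strlen($chunk);
    };
}

$bodies = [];
$collector = make_chunk_collector($bodies, 1);

// Attaching the collector to a handle (nothing is sent in this sketch).
$ch = curl_init();
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $collector);

// Simulate the body arriving in two pieces, as it may for a real transfer.
$first = $collector($ch, '<html>');
$second = $collector($ch, '</html>');

// $bodies[1] now holds the concatenation of both chunks.
$fullBody = $bodies[1];
```

With this pattern the complete body is only available once the transfer has finished, so the user callback would be invoked from the curl_multi_info_read loop rather than from inside the write callback.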

0 Answers