I'm having a problem retrieving JSON data using multi cURL from several URLs generated from a database. If I limit the query to between 100 and 500 links, the issue does not occur, but once the link count reaches 1000+, I start getting random NULL returns from curl_multi_getcontent.
The multi cURL function:
function curlMultiExec($nodes)
{
    $node_count = count($nodes);
    $ch_arr = array();
    $master = curl_multi_init();

    for ($i = 0; $i < $node_count; $i++) {
        $url = $nodes[$i]['url'];
        $ch_arr[$i] = curl_init($url);
        curl_setopt($ch_arr[$i], CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch_arr[$i], CURLOPT_BINARYTRANSFER, TRUE);
        curl_setopt($ch_arr[$i], CURLOPT_FOLLOWLOCATION, TRUE);
        curl_setopt($ch_arr[$i], CURLOPT_AUTOREFERER, TRUE);
        curl_setopt($ch_arr[$i], CURLOPT_HEADER, FALSE);
        curl_multi_add_handle($master, $ch_arr[$i]);
    }

    // Busy-wait until all transfers are done.
    $running = null;
    do {
        curl_multi_exec($master, $running);
    } while ($running > 0);

    // Collect the response body of every handle.
    $obj = array();
    for ($i = 0; $i < $node_count; $i++) {
        $item = array(
            'url'     => $nodes[$i]['url'],
            'content' => curl_multi_getcontent($ch_arr[$i])
        );
        array_push($obj, $item);
    }

    curl_multi_close($master);
    return $obj;
}
Currently $nodes contains 1,912 URLs. The output using print_r:
Array
(
    [0] => Array
        (
            [url] => http://api.worldbank.org/countries/AFG/indicators/NY.GDP.MKTP.CD?per_page=100&date=1960:2014&format=json
            [content] => [{ /* json data */ }]
        )

    [1] => Array
        (
            [url] => http://api.worldbank.org/countries/ALB/indicators/NY.GDP.MKTP.CD?per_page=100&date=1960:2014&format=json
            [content] => // -> here's a sample null value
        )

    .
    .    // -> and some other random [content] entries also contain null values
    .
    .

    [1911] => Array
        (
            [url] => http://api.worldbank.org/countries/ZWE/indicators/NY.GDP.MKTP.CD?per_page=100&date=1960:2014&format=json
            [content] => [{ /* json data */ }]
        )
)
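For debugging, a small variant of the collection loop records curl_error() for each easy handle, which should show why a given body comes back empty (timeout, DNS failure, dropped connection). This is only a diagnostic sketch against the function above; the 'error' key is the one change from the original loop:

    // Diagnostic variant of the collection loop: record curl_error()
    // per handle so a NULL body can be traced to its cause.
    $obj = array();
    for ($i = 0; $i < $node_count; $i++) {
        $err = curl_error($ch_arr[$i]);   // empty string on success
        $obj[] = array(
            'url'     => $nodes[$i]['url'],
            'content' => curl_multi_getcontent($ch_arr[$i]),
            'error'   => $err !== '' ? $err : null
        );
    }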
Could someone enlighten me as to why it returns random null values, what causes this behavior, or whether there is a better approach than this?
UPDATE (2014-02-20): I found the solution here: curl_multi() without blocking.
The problem is that most implementations of curl_multi wait for each set of requests to complete before processing them. If there are too many requests to process at once, they usually get broken into groups that are then processed one at a time.
The solution is to process each request as soon as it completes. This eliminates the wasted CPU cycles from busy waiting.
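To illustrate the pattern, here is a minimal sketch (my own adaptation, not the linked author's exact code): it relies on curl_multi_select() to sleep until a socket is ready and curl_multi_info_read() to harvest each handle the moment it finishes. The function name, the 30-second timeout, and the 'error' key are assumptions of mine, not part of the original.

    // A minimal sketch of the "handle each request as it completes" pattern.
    // curl_multi_select() blocks until there is socket activity (no busy-wait),
    // and curl_multi_info_read() reports each finished transfer immediately,
    // so handles can be read and freed one by one instead of all at the end.
    function curlMultiExecStreaming($nodes)
    {
        $results = array();
        $map     = array();               // resource id => url
        $master  = curl_multi_init();

        foreach ($nodes as $node) {
            $ch = curl_init($node['url']);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
            curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // assumed value
            curl_multi_add_handle($master, $ch);
            $map[(int) $ch] = $node['url'];               // PHP 5/7-era resource cast
        }

        $running = null;
        do {
            // Drive the transfers; repeat while curl wants to be called again.
            while (curl_multi_exec($master, $running) === CURLM_CALL_MULTI_PERFORM);

            // Harvest every handle that finished since the last pass.
            while ($done = curl_multi_info_read($master)) {
                $ch = $done['handle'];
                $results[] = array(
                    'url'     => $map[(int) $ch],
                    'content' => curl_multi_getcontent($ch),
                    'error'   => curl_error($ch)          // empty string on success
                );
                // Free the handle right away to keep resource usage bounded.
                curl_multi_remove_handle($master, $ch);
                curl_close($ch);
            }

            // Sleep until a socket is ready instead of spinning the CPU.
            if ($running && curl_multi_select($master, 1.0) === -1) {
                usleep(100);   // known curl_multi_select() quirk on some builds
            }
        } while ($running > 0);

        curl_multi_close($master);
        return $results;
    }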
Based on that approach, I successfully solved this. Maybe this post can help, in case someone stumbles upon the same issue.
Cheers!