2

I'm using curl_multi to make asynchronous requests: http://php.net/manual/en/function.curl-multi-init.php

The script sends requests to all of the given URLs at once, which is a bit too fast for what I'm doing. Is there a way to slow down the request rate?
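
For context, a minimal curl_multi setup along the lines described (illustrative only, not the actual script) looks like the sketch below. Every handle is added up front, so all requests go out at essentially the same time, which is exactly the rate problem:

$urls = array('http://example.com/a', 'http://example.com/b'); // placeholder URLs

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// all handles run concurrently; there is no built-in per-request delay
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);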

Youss
  • Is cURL'ing one at a time too slow? [`curl_init()`](http://php.net/manual/en/function.curl-init.php) – MonkeyZeus Jan 06 '16 at 21:01
  • @MonkeyZeus Yes, too slow. I would like to have more control by adding some time interval – Youss Jan 06 '16 at 21:03
  • I see a close vote..? Please explain, don't just hit and run – Youss Jan 06 '16 at 21:06
  • 1
    I highly suggest updating your question with more detail. I would include a fake list of URLs and explain at what point you need a specific URL to be fired after the other. I was going to suggest that you multi_curl some of the requests and regular curl the "time-sensitive" ones. As it stands, [`curl_pause()`](http://php.net/manual/en/function.curl-pause.php) might be the only thing that *might* help you to achieve your goal but it's documentation is severely lacking. – MonkeyZeus Jan 06 '16 at 21:10
  • 1
    You can also look into forking your processes and giving each fork a dedicated URL + sleep time. [`pcntl_fork()`](http://php.net/manual/en/function.pcntl-fork.php) – MonkeyZeus Jan 06 '16 at 21:14
  • I was thinking about curl_pause but I don't know how to use it, as you stated, no docs. I'm also not sure about the 'details' of my question, at this point I would accept anything that points in the right direction, since I have been googling and trying out stuff for hours. But I'm starting to think there is no way to do this – Youss Jan 06 '16 at 21:15
  • 1
    Forking is suggested here: http://stackoverflow.com/questions/6987404/php-curl-multi-exec-delay-between-requests – MonkeyZeus Jan 06 '16 at 21:17
  • @MonkeyZeus Thank you very much, forking looks very promising, might be the only thing out there – Youss Jan 06 '16 at 21:23
  • Do you have control over the URLs you are requesting? – I wrestled a bear once. Jan 06 '16 at 21:25
  • @Pamblam Yes, I put the URLs in a simple array – Youss Jan 06 '16 at 21:27
  • I mean, do you have control over the code on the requested pages? – I wrestled a bear once. Jan 06 '16 at 21:29
  • @Pamblam No, I don't have control over the requested pages – Youss Jan 06 '16 at 21:30
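
For reference, a rough sketch of the forking approach suggested in the comments (it assumes the pcntl extension, which is typically only available on the CLI; the URLs, delay, and response handling are illustrative):

$urls = array('http://example.com/a', 'http://example.com/b'); // placeholder URLs
$delay = 1; // seconds between launching each child

foreach ($urls as $url) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('could not fork');
    } elseif ($pid == 0) {
        // Child process: fetch one URL, then exit.
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $response = curl_exec($ch);
        curl_close($ch);
        // ... handle $response (e.g. save it somewhere) ...
        exit(0);
    }
    // Parent process: stagger the start of the next child.
    sleep($delay);
}

// Wait for all children to finish.
while (pcntl_waitpid(0, $status) != -1);

Each child waits for its own response, so a slow site does not delay the next request; the parent only controls the launch interval.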

2 Answers

4
function asyncCurl($url){
  // Fire-and-forget: send the request, give it at most 1 second,
  // then move on without using the response.
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
  curl_setopt($ch, CURLOPT_TIMEOUT, 1);
  curl_exec($ch);
  curl_close($ch);
}

$timeout = 3; // in seconds
$urls = array(...);

foreach($urls as $url){
  asyncCurl($url); // fire the request without waiting for the response
  sleep($timeout); // throttle: pause before sending the next one
}

If you need to get the response, it can still be done by creating a "background process" type of thing on your server. This will require 2 scripts instead of one.

background.php

function curl($url){
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
  $a = curl_exec($ch);
  curl_close($ch);
  return $a;
}

$response = curl($_GET['url']);

// code here to handle the response
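// For example (illustrative only, not part of the original answer), you
// could persist it so another script or process can pick it up later:
// file_put_contents(__DIR__ . '/responses/' . md5($_GET['url']) . '.html', $response);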

doRequest.php (or whatever, this is the one you will call in your browser)

function asyncCurl($url){
  // Call our own background.php, which does the real work; the 1-second
  // timeout means this script doesn't wait around for the slow response.
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, "http://mydomain.com/background.php?url=".urlencode($url));
  curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
  curl_setopt($ch, CURLOPT_TIMEOUT, 1);
  curl_exec($ch);
  curl_close($ch);
}

$timeout = 3; // in seconds
$urls = array(...);

foreach($urls as $url){
  asyncCurl($url);
  sleep($timeout);
}

The idea here is that PHP is single-threaded, but there is no reason you can't have more than one PHP process running at the same time. The only downside is that you have to make the requests in one script and handle the responses in another.


Option 3: display the output as soon as it becomes available.

This method is exactly the same as the one above, except that it uses JavaScript to create a new PHP process. You didn't tag javascript, but this is the only way to accomplish both

  • asynchronous requests w/ a timeout

and

  • display the response as soon as it's available

    doRequest.php

    <?php
    
    $urls = array(); // fill with your urls
    $timeout = 3; // in seconds
    
    if (isset($_GET['url'])) {
        // Worker mode: fetch the requested URL and return its body
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $_GET['url']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture the body so it can be echoed below
        $a = curl_exec($ch);
        curl_close($ch);
        echo $a;
        exit;
    }
    
    ?><html>
    <body>
        <div id='results'></div>
        <script>
    
            var urls = <?php echo json_encode($urls); ?>;
    
            function doRequest(url) {
                var xhttp = new XMLHttpRequest();
                xhttp.onreadystatechange = function () {
                    if (xhttp.readyState == 4 && xhttp.status == 200) {
                        document.getElementById("results").insertAdjacentHTML("beforeend", "<hr>" + xhttp.responseText);
                    }
                };
                xhttp.open("GET", "doRequest.php?url=" + encodeURIComponent(url), true);
                xhttp.send();
            }
    
            var index=0;
            function startLoop(){
                var url = urls[index];
                doRequest(url);
                setTimeout(function(){
                    index++;
                    if (typeof urls[index] !== 'undefined') startLoop();
                }, <?php echo $timeout*1000; ?>);
            }
    
            startLoop();
        </script>
    </body>
    </html>

What's happening is your server is creating a new request for each URL and then using normal curl to get the response, but instead of using curl to create the new process, we use AJAX, which is asynchronous by nature and can create multiple PHP processes and wait for each response.

Godspeed!

I wrestled a bear once.
  • Thanks but if a website takes 20 seconds to respond, I have to wait 20 seconds. That's why I use multi_curl, so I don't have to wait – Youss Jan 06 '16 at 21:39
  • Notice how I put "async" in the function title. It's async. It doesn't wait. – I wrestled a bear once. Jan 06 '16 at 21:40
  • In other words, it makes the request then immediately closes the connection so the server can do whatever... then it sleeps, then requests another one. – I wrestled a bear once. Jan 06 '16 at 21:42
  • This assumes you don't need the response of the request... if you do, there's one other thing you can do, let me know. – I wrestled a bear once. Jan 06 '16 at 21:43
  • Well actually I do need the response (the entire content). What else can I try? – Youss Jan 06 '16 at 21:47
  • This looks like the same thing but you moved the code to another page... So background.php is curling a large amount of data (20 seconds) but doRequest.php is timing out after 1 second. – Youss Jan 06 '16 at 22:06
  • doRequest creates a curl call to background.php and then closes the connection immediately, waits 20 seconds and then does it again; each time it passes the next URL to background.php. – I wrestled a bear once. Jan 06 '16 at 22:08
  • background.php waits for the response and handles it. Doing it this way, you can fire background.php without waiting for the last response; each call to background.php essentially creates a background process. – I wrestled a bear once. Jan 06 '16 at 22:09
  • If you set CURLOPT_TIMEOUT to 1 second there is no time to retrieve the content of the background page – Youss Jan 06 '16 at 22:26
  • Exactly, that's why you create a background process to wait for it. Just try it, you'll see... make sure you put the correct URL instead of "mydomain.com" – I wrestled a bear once. Jan 06 '16 at 22:31
  • So I want to echo the response. What page should I use? – Youss Jan 06 '16 at 22:39
  • You can't echo the response, it's a background script. Maybe you could save me some time and tell me exactly what you want to do... – I wrestled a bear once. Jan 06 '16 at 22:44
  • The idea of using a background script is so you can parse the response and put it in a database or something. If you need to display the results immediately there is yet another way... – I wrestled a bear once. Jan 06 '16 at 22:46
  • Yes, I would like to echo the response right away if possible – Youss Jan 06 '16 at 22:49
  • OK, this is going to be my last edit though, so I hope it's what you're looking for. One sec... – I wrestled a bear once. Jan 06 '16 at 22:50
  • Thank you very much for sticking around, I really appreciate your work. Unfortunately I can't use javascript in my project. But I really like your PHP worker script, something I have never done before and which I definitely can use for other stuff in my project. So thanks again – Youss Jan 06 '16 at 23:30
3

I did this some time ago but I cannot find the code now.

But basically you cannot stop the curl_multi_exec() loop once it is running, so instead I wrapped that processing in another loop that gave it, say, 2 curl handles to use and 20 of the 2000 URLs to process.

Once that batch is completed, you then set the next 20 URLs for it to process and run the curl_multi_exec() function again, and you can put the sleep in this outer loop.

A bit vague, I know, but hopefully it will give you a starter for ten.

I made the number of curl handles configurable via a define, and the number of URLs to pass into the curl_multi_exec() loop configurable the same way, which made it quite easy to tune the processing to suit the situation.
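
A simplified sketch of that batching idea (the constants, chunk size, and one-handle-per-URL layout are illustrative, not the original code):

define('BATCH_SIZE', 20);  // URLs handed to curl_multi per pass
define('BATCH_SLEEP', 2);  // seconds to pause between batches

$urls = array(/* ... your 2000 urls ... */);

foreach (array_chunk($urls, BATCH_SIZE) as $batch) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($batch as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // run this batch to completion
    do {
        curl_multi_exec($mh, $running);
        if (curl_multi_select($mh) == -1) {
            usleep(100000); // avoid a busy loop if select fails
        }
    } while ($running > 0);

    foreach ($handles as $ch) {
        $response = curl_multi_getcontent($ch);
        // ... process $response ...
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    sleep(BATCH_SLEEP); // throttle before starting the next batch
}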

RiggsFolly
  • Thanks for the effort but I was hoping for time intervals between individual URL requests. I'd rather not send two or more requests at the same time; I would prefer half a second in between. I'm actually stunned there is no curl function to handle this. – Youss Jan 06 '16 at 21:22
  • Well the basic premise of curl_multi_exec() is fire and forget. I assume you are hitting the same site with lots of requests. – RiggsFolly Jan 06 '16 at 21:23
  • Yes, same website, and getting banned each time :) – Youss Jan 06 '16 at 21:25
  • I also don't like the idea of hammering down on any server – Youss Jan 06 '16 at 21:26
  • @Youss You should certainly add this little detail to your question: "Yes same website, and **getting banned each time**" – MonkeyZeus Jan 06 '16 at 21:27
  • Getting banned is not the issue, it's something I experienced while testing my code in the wild. – Youss Jan 06 '16 at 21:29
  • Well in that case you are limited to doing it one at a time and sleeping as long as you like between normal curls. There is no benefit to using multi – RiggsFolly Jan 06 '16 at 21:32
  • The benefit is that you don't have to wait for one curl process to finish before sending another one – Youss Jan 06 '16 at 21:34
  • And the problem is _that you don't have to wait for one curl process to finish before sending another one_ – RiggsFolly Jan 06 '16 at 21:35
  • @RiggsFolly Hypothetically, if he is allowed to request once per second but a single curl takes 5 seconds then he wants to send a curl every second even if the previous did not finish. curl_multi_init() sends all of the requests at once so it's an insta-ban – MonkeyZeus Jan 06 '16 at 21:36
  • @MonkeyZeus Yes, I know, it's a quandary, is it not – RiggsFolly Jan 06 '16 at 21:38
  • @Youss I think you should try forking and if you get stuck then people will be much more willing to help since forking is not often seen on SO :-) – MonkeyZeus Jan 06 '16 at 21:43