11

I've been banging my head against the wall for hours trying to understand why cURL's cookie jar file was empty when I tried to read it. I just discovered that my code works if I call curl_close() twice instead of once, and I'm wondering whether this is a bug in cURL.

Here's an example:

curl_close($chInfo['handle']);
var_dump(is_resource($chInfo['handle']));

That outputs boolean true. So, in other words, the handle isn't closed, despite the fact that I called curl_close().

My next thought was that maybe it takes some time for the handle to be closed, so I tried using sleep() for a few seconds after the curl_close() call, but there wasn't any difference.

Out of desperation, I tried copying the curl_close() line, like this:

curl_close($chInfo['handle']);
curl_close($chInfo['handle']);
var_dump(is_resource($chInfo['handle']));

That outputs boolean false, meaning the handle is closed, and I am able to read from the cookie jar file (cURL writes the cookies to the file when the handle is closed).

So what's going on here? This seems an awful lot like a bug!

EDIT: I can't post my full code (you wouldn't want to read it anyway!), but here is a simplified example (note that only one URL is fetched in this example, whereas my real code uses curl_multi to fetch many URLs simultaneously):

$curlOptions = array(
    CURLOPT_USERAGENT      => 'Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101',
    CURLOPT_CONNECTTIMEOUT => 5, // the number of seconds to wait while trying to connect.
    CURLOPT_TIMEOUT        => 5, // the maximum number of seconds to allow cURL functions to execute.
    CURLOPT_RETURNTRANSFER => 1, // TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it out directly.
    CURLOPT_FOLLOWLOCATION => 1,
    CURLOPT_MAXREDIRS      => 10,
    CURLOPT_AUTOREFERER    => 1,
    CURLOPT_REFERER        => null,
    CURLOPT_POST           => 0,  // GET request by default
    CURLOPT_POSTFIELDS     => '', // no POST data by default
    CURLINFO_HEADER_OUT    => 1, // allows the request header to be retrieved
    CURLOPT_HEADER         => 1, // returns the response header along with the page body
    CURLOPT_URL            => 'http://www.example.com/',
    CURLOPT_COOKIEJAR      => __DIR__ . '/cookie.txt',
    CURLOPT_COOKIEFILE     => __DIR__ . '/cookie.txt'
);


$ch = curl_init();
curl_setopt_array($ch, $curlOptions); // set the options for this handle

$mh = curl_multi_init();
$responses = array();
curl_multi_add_handle($mh, $ch); // add the handle to the curl_multi object

do
{
    $result   = curl_multi_exec($mh, $running);
    $activity = curl_multi_select($mh);    // blocks until there's activity on the curl_multi connection (in which case it returns a number > 0), or until 1 sec has passed

    while($chInfo = curl_multi_info_read($mh))
    {
        $chStatus = curl_getinfo($chInfo['handle']);

        if($chStatus['http_code'] == 200) // if the page was retrieved successfully
        {
            $response = curl_multi_getcontent($chInfo['handle']); // get the response

            curl_multi_remove_handle($mh, $chInfo['handle']); // remove the curl handle that was just completed
            curl_close($chInfo['handle']);                    // close the curl handle that was just completed (cookies are saved when the handle is closed?)
            curl_close($chInfo['handle']);

            var_dump(is_resource($chInfo['handle']));
        }
        else // request failed
        {
            echo 'Error: Request failed with http_code: ' . $chStatus['http_code'] . ', curl error: ' . curl_error($chInfo['handle']). PHP_EOL;
        }
    }
} while ($running > 0);

curl_multi_close($mh);

If you run the above code, the output will be

boolean false

Indicating that the handle is closed. However, if you remove the second call to curl_close(), then the output changes to

boolean true

Indicating the handle is not closed.

Nate
  • This is really strange. I never had such an issue and I'm working with cURL quite often. What is your PHP version? Are you okay with sharing the cURL execution? – sunshinejr Mar 31 '14 at 00:35
  • @ailvenge I'm using PHP version 5.4.12. I posted example code for you. Thanks for your help. – Nate Apr 06 '14 at 21:16
  • I am curious why you use curl_close instead of curl_multi_close. Everywhere else you use curl_multi_..., so I think it is logical to use curl_multi_close too. This is just a thought. – bksi Apr 06 '14 at 21:23
  • @bksi I use `curl_close()` to close a handle after it has successfully retrieved a page. I use `curl_multi_close()` after all of the handles have finished processing. The example code just fetches one URL, but my real code fetches many URLs simultaneously using `curl_multi`. – Nate Apr 06 '14 at 21:25
  • The part of the code I can see shows that you use the same $ch to do the requests. I'm not sure why you use it this way. Here is a simple example of using curl_multi_add_handle: http://se2.php.net/manual/en/function.curl-multi-close.php They use different curl instances to add handles. – bksi Apr 06 '14 at 21:29
  • @bksi This is just a very simplified example. In my real code I have an array of cURL handles; I don't use the same one for multiple requests (each request has its own handle. Once a request is completed, I close the handle. Once all the requests are completed, I call `curl_multi_close()`). – Nate Apr 06 '14 at 21:32
  • @bksi I can't call `curl_multi_close()` until *all* of the cURL requests have finished processing. I call `curl_multi_remove_handle()` and `curl_close()` for the individual cURL handles once they have finished processing, then I call `curl_multi_close()` once *all* of the handles have finished processing. – Nate Apr 06 '14 at 21:36
  • But you don't use $ch to close the instance as is in the example. curl_multi_add_handle($mh, $ch); curl_multi_remove_handle($mh, $ch); then you should use curl_close($mh, $ch); – bksi Apr 06 '14 at 21:39
  • @bksi `$chInfo['handle']`, where `$chInfo = curl_multi_info_read($mh)`, is equal to `$ch`. – Nate Apr 06 '14 at 21:42
  • But you use the same variable to init the curl. As I said in my first post, I don't think this is a good idea (unless you know that they don't use shared memory or memory pointers inside the curl stuff). I would use $ch[] instead of $ch. – bksi Apr 06 '14 at 21:44
  • @bksi In my full code I have a `for` loop that has `$ch = curl_init();` then more code and then `curl_multi_add_handle($mh, $ch)`. Is that not valid (if so, thanks for pointing it out!)? In either case, with the example I posted here, only one handle is being created and the problem still exists. – Nate Apr 06 '14 at 21:49
  • I'm wondering if this is actually a bug in PHP's curl. Looking at the PHP source code, I'm getting the impression that there's some reference counting going awry, but I can't be sure. Might be worth actually raising this as an issue with PHP, and seeing what the maintainers say. – Matt Gibson Apr 11 '14 at 07:08

4 Answers

5

This is not really a bug, just the way it works. If you look at the source code you can see what is happening.

First you open the handle with $ch = curl_init(); and, looking at the source in ext\curl\interface.c, you can see that internally it sets ch->uses = 0;

Then you call curl_multi_add_handle($mh, $ch); and, looking at ext\curl\multi.c, this function does ch->uses++;. At this point ch->uses == 1.

Now the last part, looking at curl_close($chInfo['handle']);, again in ext\curl\interface.c it has the following code:

if (ch->uses) {
    ch->uses--;
} else {
    zend_list_delete(Z_LVAL_P(zid));
}

So the first attempt to close it only decrements ch->uses, and the second attempt actually closes it.

This internal counter only increases when using curl_multi_add_handle or curl_copy_handle. So I guess the idea was for curl_multi_add_handle to use a copy of the handle and not the actual handle.
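To make the counting easier to follow, here is a toy model of the logic described above in plain PHP. It is only a sketch of the behavior as I read it from the quoted source, not the extension's real code; `ToyHandle`, `toy_multi_add_handle()` and `toy_close()` are hypothetical names made up for the illustration.

```php
<?php
// Toy model of the counter logic described above. NOT the extension's
// real code: ToyHandle and the toy_* functions are hypothetical.
final class ToyHandle
{
    public $uses = 0;       // models ch->uses = 0 set by curl_init()
    public $closed = false; // becomes true once the handle is really closed
}

function toy_multi_add_handle(ToyHandle $h)
{
    $h->uses++;             // models ch->uses++ in curl_multi_add_handle()
}

function toy_close(ToyHandle $h)
{
    if ($h->uses) {
        $h->uses--;         // counter is non-zero: only decrement it
    } else {
        $h->closed = true;  // models zend_list_delete(): actually close
    }
}

$h = new ToyHandle();
toy_multi_add_handle($h);       // uses is now 1

toy_close($h);
$afterFirstClose = $h->closed;  // still open after the first call

toy_close($h);
$afterSecondClose = $h->closed; // closed after the second call

var_dump($afterFirstClose, $afterSecondClose);
```

In this model the first `toy_close()` call only brings the counter back to 0, which matches the `boolean true` from is_resource() after a single curl_close() in the question.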

Hugo Delsing
  • Very interesting. Thank you for delving into the source code to figure out the cause behind this odd behavior! It seems strange to me that the developers of cURL would make it that way. The fact that the cURL handle **must** be closed in order for the cookie jar/file to be read makes me feel like this is a bug, because when using multi_curl that means you **have** to close the handle twice. Thanks for getting to the bottom of it! – Nate Apr 14 '14 at 01:13
  • Well, the opposite of `curl_multi_add_handle` seems to be `curl_multi_remove_handle`, and that does `--ch->uses`. It looks like they expected somebody to add the handle to multi curl to be processed, and after it was removed they could still access the handle. So instead of `close;close;` you could also use `remove;close`, and then it seems perfectly logical to me. Anyway, thanks for the rep :) – Hugo Delsing Apr 14 '14 at 06:37
0

There is no issue here. When using multi-curl you don't need to call curl_close. Instead, you have to call curl_multi_remove_handle on each used handle. So the curl_close call(s) in your code are redundant.

See examples of proper multi-curl flow here: 1, 2.

hindmost
  • According to the documentation, the cookie jar file is not updated until `curl_close()` is called. Through my own testing, this seems to be the case. As mentioned in my question, I have to call it twice before the cookie jar file can be read, so clearly there is an issue. – Nate Apr 11 '14 at 15:17
  • @Nate Are you using one cookie jar file for all curl requests? – hindmost Apr 11 '14 at 15:22
  • No, I'm using a separate file for each handle. – Nate Apr 12 '14 at 02:46
  • I have worked with `multi-curl` many times and I never experienced any problem with saving/loading cookies. See my example of `multi-curl` implementation here (sorry for the plug): https://github.com/hindmost/rolling-curl-mini – hindmost Apr 13 '14 at 19:24
-1

The 'handle' is not closed in the loop. After the loop you can remove the handles:

    curl_multi_remove_handle($mh, $ch1);
    /* this is not supposed to be required, but the remove sometimes fails to close the connection */
    curl_close($ch1); 
    curl_multi_remove_handle($mh, $ch2);
    curl_close($ch2);

If you set up your connections as an array, you can remove them in a separate loop after the main loop.

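    /* init the multi handle and add one easy handle per URL
       ($multi_urls is assumed to be an array of URLs) */
    $mh = curl_multi_init();
    $ch = array();
    foreach ($multi_urls as $i => $url)
    {
        $ch[$i] = curl_init($url);
        curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 1);
        curl_multi_add_handle($mh, $ch[$i]);
    }

    /* minimal main loop: run until every transfer has finished */
    do
    {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($running > 0);

    /* remove and close each connection */
    foreach ($ch as $conn)
    {
        curl_multi_remove_handle($mh, $conn);
        curl_close($conn);
    }
    curl_multi_close($mh);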
Baine Sumpin
-3

After looking at the code, I think there is only one mistake, namely:

while($chInfo = curl_multi_info_read($mh))

Change it to:

while($chInfo == curl_multi_info_read($mh))
Vineet1982
  • This is wrong. The OP's code is correct: he assigns the result of the call to a variable and checks whether the variable is NULL. If the call returned a non-NULL value, the loop continues. – Aleks G Apr 07 '14 at 14:33
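For anyone unsure about the assignment-in-condition idiom discussed in the comment above, here is a small sketch that runs without cURL. `next_message()` is a hypothetical stand-in for `curl_multi_info_read()`; it hands back queued messages one at a time and then returns false, and the array contents are made up for the illustration.

```php
<?php
// Hypothetical stand-in for curl_multi_info_read(): returns the next
// queued message, or false once the queue is empty.
function next_message(array &$queue)
{
    return $queue ? array_shift($queue) : false;
}

$queue = array(array('handle' => 'a'), array('handle' => 'b'));
$seen  = array();

// Assignment inside the condition: the return value is stored in $msg,
// and the loop keeps going for as long as that value is truthy.
while ($msg = next_message($queue)) {
    $seen[] = $msg['handle'];
}

var_dump($seen); // both queued messages were processed
```

With `==` in place of `=`, the return value would only be compared against the variable and never stored in it, so the loop body would have no message to work with.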