143

So some guy at some other company thought it would be awesome if instead of using soap or xml-rpc or rest or any other reasonable communication protocol he just embedded all of his response as cookies in the header.

I need to pull these cookies out as hopefully an array from this curl response. If I have to waste a bunch of my life writing a parser for this I will be very unhappy.

Does anyone know how this can simply be done, preferably without writing anything to a file?

I will be very grateful if anyone can help me out with this.

thirsty93
  • 2,602
  • 6
  • 26
  • 26

8 Answers8

196
$ch = curl_init('http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// get headers too with this line
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
// get cookie
// multi-cookie variant contributed by @Combuster in comments
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $result, $matches);
$cookies = array();
foreach($matches[1] as $item) {
    parse_str($item, $cookie);
    $cookies = array_merge($cookies, $cookie);
}
var_dump($cookies);
TML
  • 12,813
  • 3
  • 38
  • 45
  • 34
    Unfortunately I have a feeling that this is the right answer. I think its ridiculous that curl can't just hand me a mapped array though. – thirsty93 May 22 '09 at 15:26
  • 3
    I'll give it to you but the preg_match was wrong. I didn't just want the session, tho I understand why you would think that. But the genius who made their system is loading the cookie with an entire response map like with a get or post. Shit like this: Set-Cookie: price=1 Set-Cookie: status=accept I needed a preg_match_all with '/^Set-Cookie: (.*?)=(.*?)$/sm' – thirsty93 May 22 '09 at 23:08
  • No it is, I just haven't been on a computer in a bit :) – thirsty93 May 26 '09 at 17:40
  • 8
    @thirsty93 Curl cant hand you a mapped array. But shows you a way to save it `curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'callback_SaveHeaders');` – Shiplu Mokaddim Jan 14 '12 at 07:02
  • 3
    Depending on the cookie structure returned, the last line might need to be modified to something like `parse_str($m[1], $cookies)`, which will stuff the cookies into an associative array in the `$cookies` variable.... – random_user_name Mar 09 '13 at 00:00
  • 1
    This answer was close to what I needed but I ended up using: `preg_match_all('/(?<=Set-Cookie: ).[a-zA-Z0-9_=]+/', $curl_header, $matches)` – ryanpf Feb 05 '14 at 17:09
  • 8
    For the combined fixes that grabs more than one cookie: `preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $result, $matches); $cookies = array(); foreach($matches[1] as $item) { parse_str($item, $cookie); $cookies = array_merge($cookies, $cookie); }` – Combuster May 07 '15 at 10:02
  • 1
    cookie values are urlencoded. this code does not urldecode() the values. sure it works for a-zA-Z0-9, but you'll get the url encoded values of stuff like @ as "%40", space as "%20", etc :p – hanshenrik Sep 05 '15 at 17:36
  • If you do not have a semicolon at the end of your cookie, the variable will continue to the end of the headers. You can prevent that by using this: `preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $header, $matches);` Thanks to http://stackoverflow.com/a/35438727/594602 – Erik Allen Feb 16 '16 at 17:30
  • As I've seen this only gets values for Set-Cookies. What happens to cookies that are already sent in the previous request? – maddo7 Feb 04 '17 at 00:46
  • @TML: I am unable to get all the cookies using this script/code. I used 'http://www.medium.com' to get the details of the cookies, but there are only 2 records, however, if I check in the console, there all so many other cookies set by this website. How to get all the cookies? No luck so far. – Yogesh Chauhan Feb 08 '20 at 17:04
  • WARNINGs: This won't work if `CURLOPT_HEADER` is set to `0`. Regex reads html content too. – Gray Programmerz Oct 06 '21 at 11:04
45

Although this question is quite old, and the accepted response is valid, I find it a bit unconfortable because the content of the HTTP response (HTML, XML, JSON, binary or whatever) becomes mixed with the headers.

I've found a different alternative. CURL provides an option (CURLOPT_HEADERFUNCTION) to set a callback that will be called for each response header line. The function will receive the curl object and a string with the header line.

You can use a code like this (adapted from TML response):

$cookies = Array();
$ch = curl_init('http://www.google.com/');
// Ask for the callback.
curl_setopt($ch, CURLOPT_HEADERFUNCTION, "curlResponseHeaderCallback");
$result = curl_exec($ch);
var_dump($cookies);

function curlResponseHeaderCallback($ch, $headerLine) {
    global $cookies;
    if (preg_match('/^Set-Cookie:\s*([^;]*)/mi', $headerLine, $cookie) == 1)
        $cookies[] = $cookie;
    return strlen($headerLine); // Needed by curl
}

This solution has the drawback of using a global variable, but I guess this is not an issue for short scripts. And you can always use static methods and attributes if curl is being wrapped into a class.

Googol
  • 2,572
  • 20
  • 9
  • 10
    Instead of a global, you can use a closure holding a reference to `$cookies`. `$curlResponseHeaderCallback = function ($ch, $headerLine) use (&$cookies) {` then `curl_setopt($ch, CURLOPT_HEADERFUNCTION, $curlResponseHeaderCallback);`. – Seph Apr 26 '16 at 06:48
  • 1
    What happens if you have this all in a class? How do you reference the class function `$class->curlResponseHeaderCallback()`? Or do you just have `curlResponseHeaderCallback` outside of the class? – Robert Johnstone May 15 '20 at 10:28
13

This does it without regexps, but requires the PECL HTTP extension.

curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
curl_close($ch);

$headers = http_parse_headers($result);
$cookobjs = Array();
foreach($headers AS $k => $v){
    if (strtolower($k)=="set-cookie"){
        foreach($v AS $k2 => $v2){
            $cookobjs[] = http_parse_cookie($v2);
        }
    }
}

$cookies = Array();
foreach($cookobjs AS $row){
    $cookies[] = $row->cookies;
}

$tmp = Array();
// sort k=>v format
foreach($cookies AS $v){
    foreach ($v  AS $k1 => $v1){
        $tmp[$k1]=$v1;
    }
}

$cookies = $tmp;
print_r($cookies);
David Ferenczy Rogožan
  • 23,966
  • 9
  • 79
  • 68
Alex P
  • 161
  • 1
  • 5
11

If you use CURLOPT_COOKIE_FILE and CURLOPT_COOKIE_JAR curl will read/write the cookies from/to a file. You can, after curl is done with it, read and/or modify it however you want.

SoapBox
  • 20,457
  • 3
  • 51
  • 87
4

libcurl also provides CURLOPT_COOKIELIST which extracts all known cookies. All you need is to make sure the PHP/CURL binding can use it.

Daniel Stenberg
  • 54,736
  • 17
  • 146
  • 222
2

The accepted answer seems like it will search through the entire response message. This could give you false matches for cookie headers if the word "Set-Cookie" is at the beginning of a line. While it should be fine in most cases. The safer way might be to read through the message from the beginning until the first empty line which indicates the end of the message headers. This is just an alternate solution that should look for the first blank line and then use preg_grep on those lines only to find "Set-Cookie".

    curl_setopt($ch, CURLOPT_HEADER, 1);
    //Return everything
    $res = curl_exec($ch);
    //Split into lines
    $lines = explode("\n", $res);
    $headers = array();
    $body = "";
    foreach($lines as $num => $line){
        $l = str_replace("\r", "", $line);
        //Empty line indicates the start of the message body and end of headers
        if(trim($l) == ""){
            $headers = array_slice($lines, 0, $num);
            $body = $lines[$num + 1];
            //Pull only cookies out of the headers
            $cookies = preg_grep('/^Set-Cookie:/', $headers);
            break;
        }
    }
Nikolay Mihaylov
  • 3,868
  • 8
  • 27
  • 32
Rich Wandell
  • 123
  • 1
  • 6
  • 1
    The accepted answer seems like it will search through the entire response message. This could give you false matches for cookie headers if the word "Set-Cookie" is at the beginning of a line. While it should be fine in most cases. The safer way might be to read through the message from the beginning until the first empty line which indicates the end of the message headers. This is just an alternate solution that should look for the first blank line and then use preg_grep on those lines only to find "Set-Cookie". – Rich Wandell Apr 28 '16 at 18:31
1

someone here may find it useful. hhb_curl_exec2 works pretty much like curl_exec, but arg3 is an array which will be populated with the returned http headers (numeric index), and arg4 is an array which will be populated with the returned cookies ($cookies["expires"]=>"Fri, 06-May-2016 05:58:51 GMT"), and arg5 will be populated with... info about the raw request made by curl.

the downside is that it requires CURLOPT_RETURNTRANSFER to be on, else it error out, and that it will overwrite CURLOPT_STDERR and CURLOPT_VERBOSE, if you were already using them for something else.. (i might fix this later)

example of how to use it:

<?php
header("content-type: text/plain;charset=utf8");
$ch=curl_init();
$headers=array();
$cookies=array();
$debuginfo="";
$body="";
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
$body=hhb_curl_exec2($ch,'https://www.youtube.com/',$headers,$cookies,$debuginfo);
var_dump('$cookies:',$cookies,'$headers:',$headers,'$debuginfo:',$debuginfo,'$body:',$body);

and the function itself..

function hhb_curl_exec2($ch, $url, &$returnHeaders = array(), &$returnCookies = array(), &$verboseDebugInfo = "")
{
    $returnHeaders    = array();
    $returnCookies    = array();
    $verboseDebugInfo = "";
    if (!is_resource($ch) || get_resource_type($ch) !== 'curl') {
        throw new InvalidArgumentException('$ch must be a curl handle!');
    }
    if (!is_string($url)) {
        throw new InvalidArgumentException('$url must be a string!');
    }
    $verbosefileh = tmpfile();
    $verbosefile  = stream_get_meta_data($verbosefileh);
    $verbosefile  = $verbosefile['uri'];
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_STDERR, $verbosefileh);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    $html             = hhb_curl_exec($ch, $url);
    $verboseDebugInfo = file_get_contents($verbosefile);
    curl_setopt($ch, CURLOPT_STDERR, NULL);
    fclose($verbosefileh);
    unset($verbosefile, $verbosefileh);
    $headers       = array();
    $crlf          = "\x0d\x0a";
    $thepos        = strpos($html, $crlf . $crlf, 0);
    $headersString = substr($html, 0, $thepos);
    $headerArr     = explode($crlf, $headersString);
    $returnHeaders = $headerArr;
    unset($headersString, $headerArr);
    $htmlBody = substr($html, $thepos + 4); //should work on utf8/ascii headers... utf32? not so sure..
    unset($html);
    //I REALLY HOPE THERE EXIST A BETTER WAY TO GET COOKIES.. good grief this looks ugly..
    //at least it's tested and seems to work perfectly...
    $grabCookieName = function($str)
    {
        $ret = "";
        $i   = 0;
        for ($i = 0; $i < strlen($str); ++$i) {
            if ($str[$i] === ' ') {
                continue;
            }
            if ($str[$i] === '=') {
                break;
            }
            $ret .= $str[$i];
        }
        return urldecode($ret);
    };
    foreach ($returnHeaders as $header) {
        //Set-Cookie: crlfcoookielol=crlf+is%0D%0A+and+newline+is+%0D%0A+and+semicolon+is%3B+and+not+sure+what+else
        /*Set-Cookie:ci_spill=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22305d3d67b8016ca9661c3b032d4319df%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%2285.164.158.128%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A109%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F43.0.2357.132+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1436874639%3B%7Dcab1dd09f4eca466660e8a767856d013; expires=Tue, 14-Jul-2015 13:50:39 GMT; path=/
        Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT;
        //Cookie names cannot contain any of the following '=,; \t\r\n\013\014'
        //
        */
        if (stripos($header, "Set-Cookie:") !== 0) {
            continue;
            /**/
        }
        $header = trim(substr($header, strlen("Set-Cookie:")));
        while (strlen($header) > 0) {
            $cookiename                 = $grabCookieName($header);
            $returnCookies[$cookiename] = '';
            $header                     = substr($header, strlen($cookiename) + 1); //also remove the = 
            if (strlen($header) < 1) {
                break;
            }
            ;
            $thepos = strpos($header, ';');
            if ($thepos === false) { //last cookie in this Set-Cookie.
                $returnCookies[$cookiename] = urldecode($header);
                break;
            }
            $returnCookies[$cookiename] = urldecode(substr($header, 0, $thepos));
            $header                     = trim(substr($header, $thepos + 1)); //also remove the ;
        }
    }
    unset($header, $cookiename, $thepos);
    return $htmlBody;
}

function hhb_curl_exec($ch, $url)
{
    static $hhb_curl_domainCache = "";
    //$hhb_curl_domainCache=&$this->hhb_curl_domainCache;
    //$ch=&$this->curlh;
    if (!is_resource($ch) || get_resource_type($ch) !== 'curl') {
        throw new InvalidArgumentException('$ch must be a curl handle!');
    }
    if (!is_string($url)) {
        throw new InvalidArgumentException('$url must be a string!');
    }

    $tmpvar = "";
    if (parse_url($url, PHP_URL_HOST) === null) {
        if (substr($url, 0, 1) !== '/') {
            $url = $hhb_curl_domainCache . '/' . $url;
        } else {
            $url = $hhb_curl_domainCache . $url;
        }
    }
    ;

    curl_setopt($ch, CURLOPT_URL, $url);
    $html = curl_exec($ch);
    if (curl_errno($ch)) {
        throw new Exception('Curl error (curl_errno=' . curl_errno($ch) . ') on url ' . var_export($url, true) . ': ' . curl_error($ch));
        // echo 'Curl error: ' . curl_error($ch);
    }
    if ($html === '' && 203 != ($tmpvar = curl_getinfo($ch, CURLINFO_HTTP_CODE)) /*203 is "success, but no output"..*/ ) {
        throw new Exception('Curl returned nothing for ' . var_export($url, true) . ' but HTTP_RESPONSE_CODE was ' . var_export($tmpvar, true));
    }
    ;
    //remember that curl (usually) auto-follows the "Location: " http redirects..
    $hhb_curl_domainCache = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), PHP_URL_HOST);
    return $html;
}
hanshenrik
  • 19,904
  • 4
  • 43
  • 89
-1

My understanding is that cookies from curl must be written out to a file (curl -c cookie_file). If you're running curl through PHP's exec or system functions (or anything in that family), you should be able to save the cookies to a file, then open the file and read them in.

kyle
  • 1,460
  • 2
  • 15
  • 21