
I thought I had a perfect scheme: using base64 encoded data for cookies in visitor pages, to identify the visitor. (Actually the cookies hold RC4-encrypted data, re-processed with base64 to make a "cookie safe" result.) Since there are no characters output by base64 that are illegal for cookies in any browser, I was confident this would not pose a problem. I further hoped to check the cookie from a PHP script via the $_COOKIE array. All seemed to be going well until a particular cookie value ended up being base64 encoded as...

9xu3EhM5+6duW4feCL4aHuxOceo=

There was definitely no problem writing this cookie value to my browser or reading it back. If I create it using javascript and then examine it using the browser's privacy options, it is NOT corrupt. If I read the cookie via javascript and display it in an alert() or the console, it is also NOT corrupt. But upon reading that cookie from the PHP $_COOKIE array, what I got back was...

9xu3EhM5 6duW4feCL4aHuxOceo=

This is PHP 5.6 if it matters. Why is the "+" symbol missing? And sadly, the problem is not confined to the $_COOKIE array! Even writing a simple PHP program to respond back with what I send it (via a GET request), I still see the "+" sign missing in the response.

If this is a problem that is related to character encoding, I can't see how. Even if I just plug my PHP script URL into the browser's address bar, where no active page has set any character encoding, the "+" sign is lost en route to the script. And I've also verified that a simple script that does nothing but respond with the hard-coded "non corrupt" string works fine.

So clearly the problem is confined to the passage of data FROM the browser TO the PHP. And even if I could come up with some crazy scheme to compensate for strings passed manually (like via a POST request), I don't see any way to control what the PHP script sees when data is pulled from the $_COOKIE array.

What can I do? I really have been counting on the script being able to do this seemingly simple task.

---EDIT---------------

Since posting, I've found others complaining about this mysterious "+" character going missing, but I've seen no simple solution, so I decided to implement my own. Since I've been doing all my base64 (encode and decode) from within my PHP scripts anyway, and since my code is the only place where these strings must be created, stored, and recovered, I've decided to run every base64 encoded string through this routine (below) before using it to store a cookie. Likewise, I'll pass each cookie obtained (for example, via the $_COOKIE array) through it prior to base64 decoding it.

// from browser to PHP. substitute troublesome chars with
// other cookie safe chars, or vice versa.

function fix64($inp) {
    $out = $inp;
    for ($i = 0; $i < strlen($inp); $i++) {
        $c = $inp[$i];
        switch ($c) {
            case '+': $c = '*'; break;   // definitely won't transfer!
            case '*': $c = '+'; break;

            case '=': $c = ':'; break;   // = symbol seems like a bad idea
            case ':': $c = '='; break;

            case '/': $c = '_'; break;   // no good for dir name!!!
            case '_': $c = '/'; break;

            // In PHP a bare "continue" inside a switch acts like "break",
            // so "continue 2" is needed to skip to the next character.
            default: continue 2;
        }
        $out[$i] = $c;
    }
    return $out;
}

I'm simply substituting "+" (and I decided "=" as well) with other "cookie safe" characters, before returning the encoded value to the page, for use as a cookie.

EDIT----- I added and altered the above a little to also remove/replace the "/" character, which is not a problem with the $_COOKIE array, but it is a troublesome character if, for example, you wanted to write a file or create a directory with the same name as the cookie.

Note that the length of the string being processed doesn't change. When the same page (or another page on the site) runs my PHP script again and I recover the cookie, I can pass it back through the same fix64() call, knowing that from there I can decode it like normal base64.
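The same character swap can probably be written more compactly with PHP's built-in strtr(), which translates each character once in a single pass, so swapping pairs is safe. This is only a sketch: the name fix64_strtr() is made up for illustration, and the substitute characters are the ones chosen above.

```php
<?php
// Hypothetical compact variant of fix64(): swaps "+" <-> "*",
// "=" <-> ":" and "/" <-> "_" in one pass. Because each pair is
// swapped symmetrically, calling it twice restores the input.
function fix64_strtr($inp)
{
    return strtr($inp, '+*=:/_', '*+:=_/');
}

$cookieSafe = fix64_strtr('9xu3EhM5+6duW4feCL4aHuxOceo=');
$restored   = fix64_strtr($cookieSafe);

var_dump($cookieSafe); // string(28) "9xu3EhM5*6duW4feCL4aHuxOceo:"
var_dump($restored);   // string(28) "9xu3EhM5+6duW4feCL4aHuxOceo="
```

As with fix64(), the length of the string never changes, and the result of the second call can go straight into base64_decode().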

I did not answer my own question, as I was hoping there would be some simple "official" PHP setting I could invoke that would change this behavior, and am still hopeful such a thing exists. But for my case, and for now, this is a reasonable approach, which can easily be reversed if I need to someday.

Randy
  • https://www.w3schools.com/tags/ref_urlencode.asp – Iłya Bursov Jul 02 '19 at 23:19
  • @IłyaBursov I don't understand. The cookie is written to the browser. The PHP is not reading the cookie via a URL (that was just a test). I want to read the cookie using the $_COOKIE array in the PHP script, without having to "pass" it, in a GET, POST, or URL. – Randy Jul 02 '19 at 23:52
  • https://stackoverflow.com/questions/49205195/should-cookie-values-be-url-encoded – Iłya Bursov Jul 03 '19 at 00:06
  • @IłyaBursov - note how in the "accepted" answer, it is mentioned that base64 is good for storing arbitrary values in cookies. But as I commented there, the inclusion of "+" (and "=" for that matter) in a so-called safe character encoding makes it unsafe for cookies. At least if you expect to read and decode those cookies in PHP. – Randy Jul 03 '19 at 05:50
  • read one answer down `PHP: URL encode` – Iłya Bursov Jul 03 '19 at 06:46
  • Then link to the answer, not the question only. – AmigoJack Jul 03 '19 at 07:05
  • OK, appreciated. But I'm still seeing no stipulation about the specific issue I cited. The '+' character is part of the output from base64, and if a string containing that char is encoded into a cookie, browsers will accept it, and faithfully reproduce it. But if that cookie is transmitted to PHP, it will be replaced by a space. – Randy Jul 03 '19 at 13:51
  • "This is PHP 5.6 if it matters" — **Danger** that version of PHP has been [**end of life**](https://www.php.net/eol.php) for half a year. Upgrade to a supported version of PHP. – Quentin Jul 03 '19 at 14:16
  • @Quentin - yes, I've been meaning to. The reason i didn't was because when my (lousy) hosting company came along one day and upgraded everyone to 7.x, my logs started to fill with some of the weirdest errors and warnings I've ever seen. (can't recall them now, but they were database errors, and I was using no database resources at all at the time.) Funny my scripts all continued to work, but who wants logs full of junk? And since "tech support" had no useful info, I went back (if only to have clean logs). I planned on switching hosting services soon anyway, so I figure I'll wait. – Randy Jul 03 '19 at 19:57

1 Answer


setcookie() has existed since PHP 4 and produces URL-encoded values:

setcookie('a', '9xu3EhM5+6duW4feCL4aHuxOceo=');
Set-Cookie: a=9xu3EhM5%2B6duW4feCL4aHuxOceo%3D

Accordingly, $_COOKIE URL-decodes the values:

Cookie: a=9xu3EhM5%2B6duW4feCL4aHuxOceo%3D
array(1) {
  ["a"]=>
  string(28) "9xu3EhM5+6duW4feCL4aHuxOceo="
}

Since PHP 5 there's also setrawcookie(), whose only purpose is to skip URL-encoding the value:

setrawcookie('b', '9xu3EhM5+6duW4feCL4aHuxOceo=');
Set-Cookie: b=9xu3EhM5+6duW4feCL4aHuxOceo=

But $_COOKIE still assumes URL-encoded input and stuff breaks (+ is the old form-style encoding for U+0020 'SPACE', aka good old whitespace):

Cookie: b=9xu3EhM5+6duW4feCL4aHuxOceo=
array(1) {
  ["b"]=>
  string(28) "9xu3EhM5 6duW4feCL4aHuxOceo="
}
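The mismatch can be reproduced without any cookies at all. The sketch below assumes that $_COOKIE decodes values the same way urldecode() does, which matches the output shown above; the variable names are only for illustration.

```php
<?php
$original = '9xu3EhM5+6duW4feCL4aHuxOceo=';

// urldecode() applies form-style decoding: "+" becomes a space.
$formDecoded = urldecode($original);

// rawurldecode() only decodes %XX sequences and leaves "+" alone.
$rawDecoded = rawurldecode($original);

// Percent-encoding the value before sending makes the round trip lossless.
$encoded   = urlencode($original);
$roundTrip = urldecode($encoded);

// Base64 output never contains spaces, so a space seen after decoding
// can only have been a "+" and can be put back.
$repaired = str_replace(' ', '+', $formDecoded);

var_dump($formDecoded); // string(28) "9xu3EhM5 6duW4feCL4aHuxOceo="
var_dump($rawDecoded);  // string(28) "9xu3EhM5+6duW4feCL4aHuxOceo="
var_dump($encoded);     // string(32) "9xu3EhM5%2B6duW4feCL4aHuxOceo%3D"
var_dump($roundTrip);   // string(28) "9xu3EhM5+6duW4feCL4aHuxOceo="
var_dump($repaired);    // string(28) "9xu3EhM5+6duW4feCL4aHuxOceo="
```

So a value that is percent-encoded before it is written (by setcookie(), or by whatever client-side code sets the cookie) survives, and for plain base64 values the space can be turned back into "+" after the fact.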

Interestingly, I couldn't find a reading counterpart for setrawcookie(), i.e. a way to get cookie values that have not been URL-decoded. That leaves you in the situation of having to write your own parser :-! $_SERVER['HTTP_COOKIE'] contains the raw value of the HTTP header, which is a semicolon-separated list, e.g.:

a=9xu3EhM5%2B6duW4feCL4aHuxOceo%3D; b=9xu3EhM5+6duW4feCL4aHuxOceo=

For instance, the Slim microframework has a Cookies::parseHeader() method to do exactly that (not sure why, since they urldecode() everything anyway):

public static function parseHeader($header)
{
    if (is_array($header) === true) {
        $header = isset($header[0]) ? $header[0] : '';
    }
    if (is_string($header) === false) {
        throw new InvalidArgumentException('Cannot parse Cookie data. Header value must be a string.');
    }
    $header = rtrim($header, "\r\n");
    $pieces = preg_split('@[;]\s*@', $header);
    $cookies = [];
    foreach ($pieces as $cookie) {
        $cookie = explode('=', $cookie, 2);
        if (count($cookie) === 2) {
            $key = urldecode($cookie[0]);
            $value = urldecode($cookie[1]);
            if (!isset($cookies[$key])) {
                $cookies[$key] = $value;
            }
        }
    }
    return $cookies;
}

I guess you can use this code and skip the decoding part.
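A minimal sketch of that idea, stripped down from the Slim code above with the urldecode() calls removed. The function name parseRawCookieHeader() is made up for illustration, and the header string is hard-coded here; in a real request it would come from $_SERVER['HTTP_COOKIE'].

```php
<?php
// Hypothetical helper: split a raw Cookie header into name => value
// pairs WITHOUT URL-decoding, so "+" and "%2B" are returned as sent.
function parseRawCookieHeader($header)
{
    $cookies = [];
    $pieces = preg_split('/;\s*/', rtrim((string) $header, "\r\n"));
    foreach ($pieces as $piece) {
        // Limit of 2 keeps any "=" padding inside the value intact.
        $pair = explode('=', $piece, 2);
        if (count($pair) === 2 && !isset($cookies[$pair[0]])) {
            $cookies[$pair[0]] = $pair[1];
        }
    }
    return $cookies;
}

$header = 'a=9xu3EhM5%2B6duW4feCL4aHuxOceo%3D; b=9xu3EhM5+6duW4feCL4aHuxOceo=';
$raw = parseRawCookieHeader($header);

var_dump($raw['b']); // string(28) "9xu3EhM5+6duW4feCL4aHuxOceo="
```

Note that values written by setcookie() (like "a" here) come back still percent-encoded, so the caller has to know which cookies were written raw.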

Álvaro González
  • Thanks. At least now I understand WHY this is happening. And not just with the $_COOKIE array! But $_COOKIE is very convenient for me, as I'm dealing with non-PHP-parsed HTML pages where setting the cookie directly from the called script is not really an option, so I guess I'll stick with my "fix64()" add-on. It's a quick fix. To be 'punny' about it, this is a case of a %2B or NOT %2B question now. For the 1 or 2 chars, it seems to make sense. I'd hoped adding "mb_internal_encoding("UTF-8");" would help, but no such luck there either. – Randy Jul 03 '19 at 21:11