I'm trying to encode a → (right arrow, `&rarr;`, or Unicode U+2192) into an email subject line.

When I use PHP's mb_encode_mimeheader() I get a different value from the one Thunderbird or Gmail produces for the same character, and when the PHP-generated email arrives the character is not displayed properly. Also, PHP's mb_decode_mimeheader() decodes PHP's own output, but not headers from the other email sources.

Using a hex dump, I've worked out that the UTF-8 representation of the arrow is the three bytes E2 86 92:

<?php
$rarr = "\xe2\x86\x92";

echo mb_encode_mimeheader($rarr, 'UTF-8');       // =?UTF-8?B?w6LChsKS?=
// whereas Thunderbird and Gmail produce:           =?UTF-8?B?4oaS?=
// and building the header manually:
echo '=?UTF-8?B?' . base64_encode($rarr) . '?='; // =?UTF-8?B?4oaS?=

PHP's encoding comes out in Thunderbird and Gmail as: â

I am completely confused by PHP's behaviour as it does not appear to be producing standard results.

How can I get PHP to encode a UTF-8 email header value so that it will be properly decoded by mail clients?

artfulrobot

1 Answer

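It looks like a bug that ignores the second parameter, but what seems to be happening is that the function reads the input string in the internal encoding (here apparently ISO-8859-1) and converts it to the charset named in the second parameter. I get the correct result when I set the internal encoding: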

<?php
$rarr = "\xe2\x86\x92";
mb_internal_encoding( "UTF-8");
echo mb_encode_mimeheader($rarr, 'UTF-8'); //=?UTF-8?B?4oaS?=

But without setting the internal encoding:

<?php
$rarr = "\xe2\x86\x92";

mb_encode_mimeheader($rarr, 'UTF-8'); //=?UTF-8?B?w6LChsKS?=

Just setting internal encoding is enough:

<?php
$rarr = "\xe2\x86\x92";
mb_internal_encoding( "UTF-8");
echo mb_encode_mimeheader($rarr); //=?UTF-8?B?4oaS?=
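To see where the unexpected `w6LChsKS` value comes from, here is a small sketch that reproduces it by hand. It assumes the internal encoding was ISO-8859-1 when the header was encoded, so each UTF-8 byte was treated as a separate Latin-1 character and re-encoded as UTF-8 before the base64 step. The variable name `$doubleEncoded` is only illustrative.

```php
<?php
$rarr = "\xe2\x86\x92"; // UTF-8 bytes for U+2192

// Assumption: the input is misread as ISO-8859-1, then converted to UTF-8.
$doubleEncoded = mb_convert_encoding($rarr, 'UTF-8', 'ISO-8859-1');

echo bin2hex($doubleEncoded), "\n";       // c3a2c286c292
echo base64_encode($doubleEncoded), "\n"; // w6LChsKS (the value PHP produced)
echo base64_encode($rarr), "\n";          // 4oaS (the value Thunderbird and Gmail produce)
```

The first of the double-encoded characters is â, which matches what the mail clients display.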
Esailija
  • Thank you! I just got this myself. Seems backwards to have to set the internal encoding for the whole subsystem just for one string. I've resorted to storing the original value, changing it to the required value, then restoring the original value again to be safe! – artfulrobot Nov 16 '12 at 12:02
  • @artfulrobot true, but why would you ever operate on any internal encoding other than UTF-8 :P – Esailija Nov 16 '12 at 12:03
  • Well `mb_encode_mimeheader()` is useful for splitting long headers over multiple lines and adding indentation. So it's useful for plain ASCII headers too. Good point though, I should probably update my default to UTF-8 as it's stuck on Latin 1 – artfulrobot Nov 16 '12 at 12:03
  • @artfulrobot I mean the default internal encoding is probably something useless like `ISO-8859-1` (Which cannot even encode `€`), if you want ASCII, UTF-8 is perfectly compatible with ASCII. Any ASCII encoded string is UTF-8 encoded string as well. – Esailija Nov 16 '12 at 12:04
  • True. I'm in the UK, and there are often problems with the currency symbol £, which is in ISO-8859-1 and (obviously) encoded differently in UTF-8. Might need to run these strings too. – artfulrobot Nov 16 '12 at 12:06
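The save-and-restore approach mentioned in the comments could be sketched roughly as follows. `encode_mime_header_utf8()` is a hypothetical helper name, not part of mbstring, and the sketch assumes the input string is already valid UTF-8.

```php
<?php
// Hypothetical helper: encode a UTF-8 string as a MIME header value
// without permanently changing the global internal encoding.
function encode_mime_header_utf8(string $value): string
{
    // Called with no arguments, mb_internal_encoding() returns the current setting.
    $previous = mb_internal_encoding();
    mb_internal_encoding('UTF-8');
    try {
        // 'B' requests base64 transfer encoding.
        return mb_encode_mimeheader($value, 'UTF-8', 'B');
    } finally {
        // Restore the caller's setting even if encoding throws.
        mb_internal_encoding($previous);
    }
}

$subject = encode_mime_header_utf8("\xe2\x86\x92");
```

Going by the answer above, `$subject` should come out as `=?UTF-8?B?4oaS?=`. If the whole application already works in UTF-8, setting the internal encoding once at startup is simpler than wrapping each call.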