I have a PHP script that reads email and Usenet messages. I found a case where the text is a mix of Arabic and Latin words, e.g.

PHP and ARABIC_WORD

that is:

PHP and الساعة

The problem is that the text is encoded, e.g.

Some Text =?utf-8?b?RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=?=

My question is: how can I decode this =?utf-8?... part when it is mixed with plain Latin text?

I'm using PHP 5.4.15

TheDude

2 Answers


What you've got is the MIME encoded-word syntax, which email messages use for header text that is not plain US-ASCII:

The form is: "=?charset?encoding?encoded text?=".

  • charset may be any character set registered with IANA. Typically it would be the same charset as the message body.
  • encoding can be either "Q" denoting Q-encoding that is similar to the quoted-printable encoding, or "B" denoting base64 encoding.
  • encoded text is the Q-encoded or base64-encoded text.
  • An encoded-word may not be more than 75 characters long, including charset, encoding, encoded text, and delimiters. If it is desirable to encode more text than will fit in an encoded-word of 75 characters, multiple encoded-words (separated by CRLF SP) may be used.
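
Read as a recipe, the format above suggests a manual decoder: find each encoded-word, undo the B or Q encoding, then convert the bytes from the stated charset. Here is a minimal PHP sketch of that idea, assuming the iconv extension is available and that UTF-8 output is wanted. The name decode_encoded_words is a hypothetical helper, not something from the quoted text, and the sketch ignores rarer details such as language tags in the charset field.

```php
<?php
// Hypothetical helper: decodes encoded-words embedded in otherwise
// plain header text and returns the result as UTF-8.
function decode_encoded_words($header)
{
    // =?charset?encoding?encoded text?=
    $wordPattern = '=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=';

    // Whitespace that only separates two encoded-words is not part of the text.
    $header = preg_replace('/(' . $wordPattern . ')\s+(?==\?)/', '$1', $header);

    return preg_replace_callback(
        '/' . $wordPattern . '/',
        function ($m) {
            list(, $charset, $encoding, $payload) = $m;

            if (strtoupper($encoding) === 'B') {
                $bytes = base64_decode($payload);
            } else {
                // Q-encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
                $bytes = quoted_printable_decode(str_replace('_', ' ', $payload));
            }

            // Convert from the declared charset; keep the raw bytes if iconv
            // does not know that charset.
            $converted = @iconv($charset, 'UTF-8', $bytes);
            return $converted === false ? $bytes : $converted;
        },
        $header
    );
}

echo decode_encoded_words('Some Text =?utf-8?b?RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=?='), PHP_EOL;
```

Text outside the encoded-words passes through untouched, which is what makes the mixed Latin/encoded case work.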

This short excerpt from Wikipedia also tells you how to decode the string. You are surely not the first one who needs to do this, so libraries already exist. See also:
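
One ready-made option, offered here as an illustration rather than as the content of the links this answer originally pointed to, is PHP's bundled iconv extension. Its iconv_mime_decode() function takes the whole header value, leaves the plain part alone, and decodes the encoded-words into the charset you ask for. A minimal sketch, assuming the extension is enabled:

```php
<?php
$header = 'Some Text =?utf-8?b?RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=?=';

// ICONV_MIME_DECODE_CONTINUE_ON_ERROR asks iconv to carry on past
// malformed encoded-words instead of failing on the first one.
$decoded = iconv_mime_decode($header, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

if ($decoded === false) {
    // Fall back to the raw header if it could not be decoded at all.
    $decoded = $header;
}

echo $decoded, PHP_EOL;
```

The mbstring function mb_decode_mimeheader() and the IMAP function imap_mime_header_decode() cover similar ground, so which one to use mostly depends on the extensions your installation has.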

hakre
  • Thank you very much, hakre! Very well explained, and the [second link/answer](http://stackoverflow.com/a/13694392/1165880) fixed this for me. Thanks again! – TheDude Jun 27 '13 at 14:08

It seems to be base64-encoded text: try PHP's base64_decode() function.

$my_string = 'test string';
$res = base64_encode($my_string);
echo $res; //dGVzdCBzdHJpbmc=
echo base64_decode($res); // test string

In fact, decoding your string:

base64_decode("RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=")

returns something like this:

EPrints and العربية
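
Note that base64_decode() only applies to the payload inside the encoded-word, not to the full header with its plain-text prefix and =?utf-8?b? wrapper. The sketch below shows the difference. It assumes a single B-encoded word and uses a simple regular expression of my own to isolate the payload; it is not a complete header parser.

```php
<?php
$header = 'Some Text =?utf-8?b?RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=?=';

// In strict mode, characters outside the base64 alphabet ("?", "-")
// make the call fail, so the header as a whole is not valid base64.
var_dump(base64_decode($header, true)); // bool(false)

// Isolate the encoded text of the first B-encoded word, then decode only that.
$payload = null;
if (preg_match('/=\?[^?\s]+\?[bB]\?([^?\s]*)\?=/', $header, $m)) {
    $payload = $m[1];
    echo base64_decode($payload, true), PHP_EOL;
}
```

The decoded bytes are in the charset named in the encoded-word (UTF-8 here), so the page or terminal that displays them has to use that charset as well.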
girardengo
  • Sorry for not being more explicit, I indeed call base64_decode() and it indeed does give me `EPrints and العربية`, but it really should give `EPrints and العربية`, so clearly there's something else missing – TheDude Jun 21 '13 at 17:27
  • Maybe this is a problem with the page charset: try adding `header('Content-type: text/html; charset=utf-8');` to your page, and then echo the decoded string with `echo base64_decode("RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=");` – girardengo Jun 21 '13 at 17:57
  • Thank you, but as I said, since the text contains mixed data, I can't use `base64_decode("RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=")`, but rather `base64_decode("Some Text =?utf-8?b?RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=?=")`, which causes(?) the conversion to fail – TheDude Jun 22 '13 at 08:03