
I have UTF-8-encoded strings in different languages, such as Korean.

I am trying to convert them into HTML entities, something like &#number; or &auml;, wherever possible, but using this in PHP:

   $str = htmlspecialchars($str, ENT_QUOTES);

doesn't do that.

What is the right way to do that?

kloop
  • htmlspecialchars is not designed to convert Unicode characters to entities. It only translates the following: '&' (ampersand) becomes '&amp;'; '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set; "'" (single quote) becomes '&#039;' (or '&apos;') only when ENT_QUOTES is set; '<' (less than) becomes '&lt;'; '>' (greater than) becomes '&gt;'. – Lauri Orgla May 13 '16 at 15:49
  • You must make sure all your PHP files are encoded in UTF-8 and that every page you send contains an HTML header indicating UTF-8 – Martin Verjans May 13 '16 at 15:51
  • @Superpeanut it is actually not a display problem, I do need to convert it to such encoding for other reasons. The PHP pages are utf8 encoded, but I interface with other things I don't have control over. – kloop May 13 '16 at 15:52
  • @Lauri thanks. Is there a way, then, to convert all UTF-8 characters into their corresponding &...; codes? – kloop May 13 '16 at 15:53
  • try checking [this question](http://stackoverflow.com/questions/1365583/how-to-get-the-character-from-unicode-code-point-in-php), it's the opposite of what you're asking but searching on these functions might help – Martin Verjans May 13 '16 at 15:57
  • @kloop What about mb_convert_encoding? You could convert them with that, I believe. – Lauri Orgla May 13 '16 at 15:58

1 Answer


You can use mb_convert_encoding like this:

   $str = mb_convert_encoding($str, "HTML-ENTITIES", "UTF-8");

As found here: Converting Korean characters into entities

Here's the man page for reference: http://php.net/manual/en/function.mb-convert-encoding.php
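
One caveat worth knowing: the "HTML-ENTITIES" pseudo-encoding is deprecated as of PHP 8.2. If that matters for you, a possible alternative is mb_encode_numericentity, sketched below. The helper name toNumericEntities is made up for illustration, and the conversion map assumes you want every non-ASCII code point encoded. Unlike "HTML-ENTITIES", this always produces numeric entities (&#228; rather than &auml;) and leaves ASCII characters such as & and < untouched, so run htmlspecialchars first if you need those escaped as well.

```php
<?php
// Hypothetical helper (not from the original answer): encode every
// non-ASCII character in a UTF-8 string as a decimal numeric entity.
function toNumericEntities(string $str): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF covers everything outside plain ASCII.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($str, $map, 'UTF-8');
}

// "\u{D55C}\u{AE00}" is the Korean word "hangul"; \u{E4} is a-umlaut.
// Escapes are used so the example does not depend on the file's encoding.
echo toNumericEntities("\u{D55C}\u{AE00} \u{E4}"), "\n";
// prints: &#54620;&#44544; &#228;

// To escape HTML special characters too, apply htmlspecialchars first:
echo toNumericEntities(htmlspecialchars("<b>\u{D55C}</b>", ENT_QUOTES, 'UTF-8')), "\n";
// prints: &lt;b&gt;&#54620;&lt;/b&gt;
```

Calling htmlspecialchars before the numeric encoding keeps the two steps from interfering: the entities it emits are pure ASCII, which the map above leaves alone.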

Michael