
I have a strange problem...

I have the following string:

$sString = "This is my encoded string &eacute; &agrave;";

First, I remove html entities:

$sString = html_entity_decode($sString, ENT_COMPAT, 'UTF-8');
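A quick way to see what html_entity_decode() produces is to compare the byte length with the character length. This is a minimal sketch that assumes the mbstring extension is enabled; the entity string is only illustrative.

```php
<?php
// Decode the entities into real UTF-8 characters.
$sDecoded = html_entity_decode('&eacute; &agrave;', ENT_COMPAT, 'UTF-8');

// strlen() counts bytes; mb_strlen() counts characters.
echo strlen($sDecoded), "\n";             // 5: é and à take 2 bytes each in UTF-8
echo mb_strlen($sDecoded, 'UTF-8'), "\n"; // 3: é, space, à
```

So after decoding, the string is no longer one byte per character, which is what the byte-based functions below trip over.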

I want to split this string so that each character appears in its own column of the same table row.

Well, logically, I used:

$aString = str_split($sString); // Fill an array with each char

It doesn't work: the accented characters are shown as boxes, as if I hadn't called html_entity_decode...

So, I decided to try the following:

   for ($i = 0; $i < strlen($sString); $i++) {
     echo "<td>";
     echo $sString[$i]; // one byte of the string per cell
     echo "</td>";
   }

It works, BUT special characters are shown as a ? in a black box (an encoding problem).

What's really strange is that when I don't put it in <td> elements, it displays correctly and there are no encoding problems!

My HTML page declares its charset as UTF-8 and is correctly formatted (with doctype, html, body, etc.).

I have to admit that at this point I have no idea where this problem comes from...

UPDATE

I just realized that when I output the string char by char outside the <td>, it doesn't work either. An accented char only displays correctly when I output its two parts together! That's a problem for me because the string comes from a database, and special chars won't always be at the same position!

Example:

This shows the broken-encoding character:

$sString = "Paëlla";
echo $sString[2];

But this way, it shows the ë:

$sString = "Paëlla";
echo $sString[2];
echo $sString[3];
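The pairing follows from how UTF-8 stores ë: as the two bytes C3 AB, at offsets 2 and 3 of "Paëlla". In this sketch the \x escapes spell those bytes out explicitly, so it does not depend on the PHP source file being saved as UTF-8.

```php
<?php
// "Paëlla" written with explicit UTF-8 bytes: ë is the pair C3 AB.
$sString = "Pa\xC3\xABlla";

echo bin2hex($sString), "\n"; // 5061c3ab6c6c61
echo strlen($sString), "\n";  // 7 bytes for 6 visible characters

// A single offset returns one byte, i.e. only half of ë.
echo bin2hex($sString[2]), "\n"; // c3

// Concatenating both bytes rebuilds the complete character.
echo $sString[2] . $sString[3], "\n"; // ë
```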
GRosay

2 Answers


str_split splits the string into bytes. But in UTF-8, characters like é and à are encoded as a sequence of 2 bytes. You need to use the mbstring functions to be UTF-8 aware.

mb_internal_encoding('UTF-8');

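// PHP 7.4+ already provides mb_str_split(); redeclaring it there is a fatal error,
// so only define this fallback on older versions.
if (!function_exists('mb_str_split')) {
    function mb_str_split($string, $length = 1) {
        $ret = array();
        $l = mb_strlen($string);

        // Walk the string in characters (not bytes), $length at a time.
        for ($i = 0; $i < $l; $i += $length) {
            $ret[] = mb_substr($string, $i, $length);
        }

        return $ret;
    }
}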
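Applied to the question's table, usage might look like the sketch below. It assumes PHP 7.4+, where mb_str_split() is built in (on older versions the helper above provides it); htmlspecialchars() is an addition so that characters such as < cannot break the markup.

```php
<?php
mb_internal_encoding('UTF-8');

// Decoded text, as it would come out of html_entity_decode().
$sString = "Pa\u{eb}lla";

// Split into characters (not bytes), then emit one <td> per character.
$aChars = mb_str_split($sString);

$sRow = '<tr>';
foreach ($aChars as $sChar) {
    $sRow .= '<td>' . htmlspecialchars($sChar, ENT_QUOTES, 'UTF-8') . '</td>';
}
$sRow .= '</tr>';

echo $sRow, "\n";
```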

The same applies if you use [offset] on a string: you get a byte, not a character, whenever the string's charset can encode a character on more than one byte. In that case, use mb_substr.

mb_internal_encoding('UTF-8');

echo mb_substr("Paëlla", 2, 1);
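The same idea can fix the question's <td> loop: iterate over character positions instead of byte offsets, and bound the loop with mb_strlen() rather than a fixed count. A minimal sketch, passing the encoding explicitly instead of relying on mb_internal_encoding():

```php
<?php
$sString = "This is my encoded string \u{e9} \u{e0}";

$iLength = mb_strlen($sString, 'UTF-8'); // characters, not bytes
$aCells = [];
for ($i = 0; $i < $iLength; $i++) {
    // mb_substr() counts in characters, so an accented letter is never cut in half.
    $sChar = mb_substr($sString, $i, 1, 'UTF-8');
    $aCells[] = '<td>' . htmlspecialchars($sChar, ENT_QUOTES, 'UTF-8') . '</td>';
}

echo implode('', $aCells), "\n";
```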
julp

Some additions to dinesh123's answer:

  • Try removing HTML tags with strip_tags() before you build the string ($sString)
  • Check the encoding of the source file (it should be saved as UTF-8)
  • Try calling header("Content-Type: text/html; charset=UTF-8") at the start of the file, before any output
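The header and encoding checks can be sketched as follows. header() only works before any output has been sent, and mb_check_encoding() can detect text that is not valid UTF-8, for example if the database connection returns Latin-1. The helper toUtf8() is hypothetical, and treating invalid input as ISO-8859-1 is an assumption that may not match your database.

```php
<?php
// Must be sent before any output, otherwise PHP cannot add the header.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Hypothetical helper: returns the text as UTF-8, converting only when needed.
// Assumes any input that is not valid UTF-8 is ISO-8859-1 (Latin-1).
function toUtf8(string $sText): string
{
    if (mb_check_encoding($sText, 'UTF-8')) {
        return $sText;
    }
    return mb_convert_encoding($sText, 'UTF-8', 'ISO-8859-1');
}

$sFromDb = "Pa\xEBlla"; // "Paëlla" stored as Latin-1: ë is the single byte EB
echo toUtf8($sFromDb), "\n";
```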