PHP and character encoding problem with Â character

Question

I'm having a problem where PHP (5.2) cannot find the character 'Â' in a string, though it is clearly there.

I realize the underlying problem has to do with character encoding, but unfortunately I have no control over the source content. I receive it as UTF-8, with those characters already in the string.

I would simply like to remove it from the string, but strpos(), str_replace(), preg_replace(), trim(), etc. cannot correctly identify it.

My string is this:

"Â  Â  Â  A lot of couples throughout the World "

If I do this:

$string = str_replace('Â','',$string);

I get this:

"Â� Â� Â� A lot of couples throughout the World"

I even tried utf8_encode() and utf8_decode() before the str_replace, with no luck.

What's the solution? I've been throwing everything I can find at it...

For Â£ : $input = str_replace("Â£", "£", $input); – atwellpub Dec 15 '10 at 05:23

score 4 · Answer 1 · answered Aug 27 '10 at 19:23

$string = str_replace('Â','',$string);

How is this 'Â' encoded? If your script file is saved as ISO-8859-1, the string 'Â' is encoded as the single byte xC2, while its UTF-8 representation is the two-byte sequence xC3 x82. PHP's str_replace() works at the byte level: it compares raw bytes and knows nothing about character encodings, so the search string and the input must use the same encoding for a match to be found.

see http://docs.php.net/intro.mbstring
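
To illustrate the byte-level behaviour described above, here is a minimal sketch that spells the needle out as explicit bytes, so it no longer depends on how the script file is saved. The sample input is hypothetical; it assumes the string really contains a UTF-8 'Â' (bytes C3 82). If bin2hex() shows a different sequence in your data (for example C2 A0, the UTF-8 non-breaking space, which is often displayed as 'Â '), the needle would have to be those bytes instead.

```php
<?php
// 'Â' (U+00C2) spelled out as bytes, so the result does not depend on
// the encoding the script file happens to be saved in.
$utf8Needle   = "\xC3\x82"; // UTF-8 encoding of 'Â'
$latin1Needle = "\xC2";     // ISO-8859-1 encoding of 'Â'

// Hypothetical sample input: UTF-8 'Â', a space, then plain ASCII text.
$string = "\xC3\x82 A lot of couples";

// Inspect the raw bytes before guessing what to replace.
echo bin2hex(substr($string, 0, 3)), "\n"; // c38220

// The Latin-1 byte 0xC2 never occurs in this input, so nothing changes.
$unchanged = str_replace($latin1Needle, '', $string);

// The UTF-8 byte pair matches and is removed.
$cleaned = str_replace($utf8Needle, '', $string);

echo $cleaned, "\n";
```

Checking the bytes first is the main point here: the right needle depends on what is actually in the input, not on what the browser or editor displays.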

VolkerK


+1, you can therefore write the replace as: `str_replace(chr(195) . chr(130), '', $string)`... (where `195` and `130` are `xC3` and `x82` converted from hex to decimal, respectively)... Or, since PHP supports hex numbers: `str_replace(chr(0xC3) . chr(0x82), '', $string)`... – ircmaxell Aug 27 '10 at 19:39
I also found that mb_ereg_replace() didn't seem to work properly; isn't this its purpose? Your information is extremely useful and I'll be sure to read the documentation you linked. Thanks! – Travis Aug 27 '10 at 20:10
@Travis: The parameters you pass to the mbstring functions have to be encoded properly as well. If you have a string literal in your script (like 'Â') then the encoding depends on how you've saved the script file. – VolkerK Aug 27 '10 at 23:37
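
The caveat about string literals applies to regular expressions too. One way to sidestep the file-encoding question, sketched here with preg_replace() rather than the mbstring functions, is to name the character by code point instead of typing it. This assumes the input is valid UTF-8 and really contains U+00C2; the sample string is hypothetical.

```php
<?php
// Hypothetical sample input: three UTF-8 'Â' characters separated by spaces.
$string = "\xC3\x82 \xC3\x82 \xC3\x82 A lot of couples";

// The /u modifier makes PCRE treat pattern and subject as UTF-8, and
// \x{00C2} names the code point U+00C2 ('Â') without a literal in the source.
$cleaned = preg_replace('/\x{00C2}/u', '', $string);

// preg_replace() returns null on error, for example when the subject
// is not valid UTF-8, so fall back to the original string in that case.
if ($cleaned === null) {
    $cleaned = $string;
}

echo trim($cleaned), "\n";
```

Because the pattern is plain ASCII, it behaves the same whether the script is saved as UTF-8 or ISO-8859-1.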

score 3 · Accepted Answer · answered Aug 27 '10 at 19:15

I use this:

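function replaceSpecial($str){
    // Split the string into single bytes (not characters).
    $chunked = str_split($str, 1);
    $str = "";
    foreach ($chunked as $chunk) {
        $num = ord($chunk);
        // Keep only bytes 32 to 123; control bytes, every byte of a
        // multi-byte UTF-8 character, and '|', '}', '~' are dropped.
        if ($num >= 32 && $num <= 123) {
            $str .= $chunk;
        }
    }
    return $str;
}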
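
The same byte filter can be written more compactly with a regular expression. This is only a sketch of a variant, not the answer's own code: the function name is hypothetical, and it keeps the full printable ASCII range (0x20 to 0x7E), whereas the loop above stops at 123.

```php
<?php
// Hypothetical helper name; keeps only printable ASCII (0x20 to 0x7E).
function stripNonPrintableAscii($str)
{
    // No /u modifier: the pattern works byte by byte, so each byte of a
    // multi-byte UTF-8 sequence is removed individually.
    return preg_replace('/[^\x20-\x7E]/', '', $str);
}

echo stripNonPrintableAscii("\xC3\x82 A lot of couples"), "\n";
```

Like the loop version, this removes every non-ASCII character, not just 'Â', so it only suits input that is expected to be plain English text.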

KeatsKelleher


You can expand this to allow all ascii characters by changing 32 to 0 and 123 to 255. – KeatsKelleher Aug 27 '10 at 19:16
This will remove MANY more characters than just accents. – shamittomar Aug 27 '10 at 19:17
First off, UTF-8 overlaps with ASCII only in the range 0 to 127. If you allow character 128 or higher, you'll break the encoding (this is due to the multi-byte nature of UTF-8). In any case, this is quite a dirty way of doing it. What I would do if I were you is simply use the [`iconv`](http://us3.php.net/manual/en/book.iconv.php) function if you need to convert to ASCII... `$str = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string)`, especially since it'll transliterate characters for you... – ircmaxell Aug 27 '10 at 19:49
Ahh.. I think I understand the solution, but I'm still not clear why PHP doesn't recognize the characters? I think I'll use something like this, but only strip a few specific chars. Thanks! – Travis Aug 27 '10 at 20:14

score 1 · Answer 3 · edited May 23 '17 at 12:30

From the PHP Manual Comment Page:

http://www.php.net/manual/en/function.preg-replace.php#96847

And from StackOverflow:

Remove accents without using iconv

answered Aug 27 '10 at 19:14

shamittomar

