1

I found a simple function to remove some undesired characters from a string.

function strClean($input){

$input = strtolower($input);
$b = array("á","é","í","ó","ú", "ñ", " "); //etc...
$c = array("a","e","i","o","u","n", "-"); //etc...

$input = str_replace($b, $c, $input);

return $input;
}

When I run it on accented characters, such as the string 'á é ñ í', it prints question marks or other garbled characters instead. Screenshot of the output: http://img217.imageshack.us/img217/6794/59472278.jpg

Note: I'm using strclean.php (which contains this function) and index.php, both in UTF-8. index.php looks as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title></title>
</head>
<body>
    <?php
    include('strclean.php');

    echo 'óóóáà';
    echo strClean('óóóáà');


    ?>
</body>
</html>

What am I doing wrong?

tchrist
Ignacio
  • Your example works for me, so you must have an encoding problem somewhere... – Glavić Mar 03 '09 at 14:55
  • Ok, thanks. At least I know I'm doing things right. However, it doesn't alleviate my headache :S I've been having encoding issues for a while now. Before it was in phpmyadmin. – Ignacio Mar 03 '09 at 14:57
  • Please stop butchering our languages. The proper replacement for ä in German is ae, not a. Read up on transliteration! –  Mar 03 '09 at 15:03
  • Are you using Firebug? Can you see that the expected encoding is being declared in the HTTP headers? – Trevor Bramble Mar 03 '09 at 15:05
  • Write "" in first line, before any output. – Glavić Mar 03 '09 at 15:26
  • Hop, I'm not butchering anything. I'm from South America, so I speak Spanish and have some "weird" characters too. I'm just generating a URL. Anyway, the purpose of my code is of no concern to you. – Ignacio Mar 04 '09 at 12:56

6 Answers

5

Use

iconv('UTF-8', 'ASCII//TRANSLIT', $input);
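
As a rough sketch of how this call might fit into the URL-style cleanup from the question: the function name toAsciiSlug is made up for illustration, and what //TRANSLIT produces depends on the iconv implementation and the current locale (some setups emit ? for characters they cannot map), so the result is worth checking on your own server.

```php
<?php
// Hypothetical helper, not part of the original answer: transliterate UTF-8
// text to ASCII with iconv, then keep only characters that are safe in a URL.
function toAsciiSlug($input) {
    // //TRANSLIT asks iconv to approximate characters that have no ASCII
    // equivalent; the exact result varies by implementation and locale.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);
    if ($ascii === false) {
        // iconv returns false on input it cannot convert; fall back to the
        // raw bytes, which the regex below reduces to hyphens.
        $ascii = $input;
    }
    $ascii = strtolower($ascii);
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($slug, '-');
}

echo toAsciiSlug('Año Nuevo en Bogotá'), "\n";
```

Whatever iconv does with the accented letters, the final regex guarantees the output contains only lowercase ASCII letters, digits, and hyphens.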
vartec
  • That's the good way to do it. Many (spoken) languages use various accents, and also multiple accents (like in ê + ` = ề). This won't work with a replacement table, if it's not exhaustive. – Yvan Sep 28 '11 at 10:04
4

You may want to try iconv.

David Segonds
3

Does a replacement happen at all, i.e. do you get the same weird characters when you print $input beforehand? If so, the character sets of your PHP source code file and the input do not match and you might need to use iconv() on the input before replacing.
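
One way to check for such a mismatch is to look at the raw bytes of the string. The sketch below assumes the mbstring extension is available; describeBytes is a made-up helper name.

```php
<?php
// Hypothetical debugging helper: show the raw bytes of a string and whether
// they form valid UTF-8 (requires the mbstring extension).
function describeBytes($s) {
    $hex = implode(' ', str_split(bin2hex($s), 2));
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return $hex . ' (' . $valid . ')';
}

// "\xC3\xA1" is "á" encoded as UTF-8; "\xE1" is the same letter in ISO-8859-1.
echo describeBytes("\xC3\xA1"), "\n"; // c3 a1 (valid UTF-8)
echo describeBytes("\xE1"), "\n";     // e1 (NOT valid UTF-8)
```

If the input shows single bytes such as e1 while the search strings in the source file are two-byte sequences such as c3 a1, str_replace has nothing to match.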

edit: I took both of your files, uploaded them to my webserver and printing and cleaning works fine (see http://www.tag-am-meer.com/test1/). This is on PHP 4.4.9 and Firefox 3.0.6. More potential problems that come to my mind:

  • Does it work for you on Firefox? I vaguely remember that IE6 (and probably later versions as well) expects the charset in the HTML head section to be written in lowercase ("utf-8").
  • Does your editor include byte order marks (BOM) in the code files? Mine does not, maybe PHP chokes on those.
  • Can you look at the HTTP headers to see if there's something unusual going on, like a bad MIME type? The Tamper Data add-on for Firefox can help with this.
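
The BOM question from the list above can be checked from PHP itself. This is a minimal sketch; hasUtf8Bom is a made-up name, and in practice you would pass it the result of file_get_contents() for each source file.

```php
<?php
// Hypothetical helper: report whether a string (for example the contents of
// a source file) begins with the three-byte UTF-8 byte order mark.
function hasUtf8Bom($contents) {
    return strncmp($contents, "\xEF\xBB\xBF", 3) === 0;
}

// On a real file: hasUtf8Bom(file_get_contents('strclean.php'))
var_dump(hasUtf8Bom("\xEF\xBB\xBFsome file contents")); // bool(true)
var_dump(hasUtf8Bom("some file contents"));             // bool(false)
```

For the header check, calling header('Content-Type: text/html; charset=utf-8'); before any output is one way to make the HTTP header agree with the meta tag. A BOM in front of the opening PHP tag counts as output, so it can also get in the way of that call.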
Simon
  • Yes, blank spaces get replaced, as well as other characters which I haven't included, such as '.' All my files are in utf-8, and if I print ááàà I see it correctly, that's why I think this is weird... – Ignacio Mar 03 '09 at 14:52
2

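I have tested your code, and the error is in the strtolower function. It works byte by byte, so depending on the locale it can alter individual bytes of a multibyte UTF-8 character and corrupt it.

Replace it with mb_strtolower, as shown below: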

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title></title>
</head>
<body>

<?php
    function strClean($input) {
        // mb_strtolower lowercases whole UTF-8 characters instead of single bytes
        $input = mb_strtolower($input, 'UTF-8');
        $b = array("á","é","í","ó","ú", "ñ", " ");
        $c = array("a","e","i","o","u","n", "-");
        return str_replace($b, $c, $input);
    }

    $string = 'á é í ó ú ñ abcdef ghij';
    echo $string ."<br />". strClean($string);
?>

</body>
</html>
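
To see what the multibyte-aware call buys you, here is a small sketch with uppercase accented input. It assumes the mbstring extension is loaded and that the file is saved as UTF-8. The name strCleanMap is made up, and strtr with a single map is used in place of the two parallel arrays purely for readability.

```php
<?php
// Hypothetical variant of strClean using one replacement map.
function strCleanMap($input) {
    // Lowercase by character, so "Á" becomes "á" instead of having its
    // individual bytes altered.
    $input = mb_strtolower($input, 'UTF-8');
    $map = array(
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
        'ñ' => 'n', ' ' => '-',
    );
    return strtr($input, $map);
}

echo strCleanMap('Á É Í Ó Ú Ñ'), "\n"; // a-e-i-o-u-n
```

Characters that are not in the map, such as à, pass through unchanged, so the map still has to be extended for every letter you expect.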
Glavić
0

I ran into this problem before. I followed the leads in this post and in others I found along the way, but there was no simple solution, because you have to know which charset your system uses (ISO-8859-1 in my case). This is what I did:

    // Assumes this file is saved as ISO-8859-1, so each accented letter is one byte.
    function quit_accenture($str){
      // Inside a character class "|" is a literal, not alternation, so the
      // letters are listed without separators.
      $pattern = array();
      $pattern[0] = '/[ÁÂÀÅÄ]/';
      $pattern[1] = '/[ÉÊÈ]/';
      $pattern[2] = '/[ÍÎÌÏ]/';
      $pattern[3] = '/[ÓÔÒÖ]/';
      $pattern[4] = '/[ÚÛÙÜ]/';
      $pattern[5] = '/[áâàåä]/';
      $pattern[6] = '/[ðéêèë]/';
      $pattern[7] = '/[íîìï]/';
      $pattern[8] = '/[óôòøõö]/';
      $pattern[9] = '/[úûùü]/';
      $replacement = array();
      $replacement[0] = 'A';
      $replacement[1] = 'E';
      $replacement[2] = 'I';
      $replacement[3] = 'O';
      $replacement[4] = 'U';
      $replacement[5] = 'a';
      $replacement[6] = 'e';
      $replacement[7] = 'i';
      $replacement[8] = 'o';
      $replacement[9] = 'u';
      return preg_replace($pattern, $replacement, $str);
    }
    $txt = $_POST['your_htmled_text'];
    // Convert to your system's charset. I checked mine in php.ini.
    $txt = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $txt);
    // Apply the function
    $txt = quit_accenture($txt);
    // Output
    print_r($txt);

This worked for me, and I also think it is the right way :)
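
If the text and the source file are both UTF-8, a variant that skips the conversion to ISO-8859-1 might look like the sketch below. It is not the code from this answer: stripAccentsUtf8 is a made-up name, the letter lists are only a sample, and the /u modifier is what tells PCRE to treat the pattern and the subject as UTF-8.

```php
<?php
// Hypothetical UTF-8 variant; save this file as UTF-8.
function stripAccentsUtf8($str) {
    // Each character class lists accented letters that map to one base letter.
    $map = array(
        '/[ÁÂÀÅÄ]/u' => 'A', '/[ÉÊÈË]/u' => 'E', '/[ÍÎÌÏ]/u' => 'I',
        '/[ÓÔÒÖ]/u'  => 'O', '/[ÚÛÙÜ]/u' => 'U', '/[Ñ]/u'    => 'N',
        '/[áâàåä]/u' => 'a', '/[éêèë]/u' => 'e', '/[íîìï]/u' => 'i',
        '/[óôòö]/u'  => 'o', '/[úûùü]/u' => 'u', '/[ñ]/u'    => 'n',
    );
    // With /u, preg_replace returns null if $str is not valid UTF-8.
    return preg_replace(array_keys($map), array_values($map), $str);
}

echo stripAccentsUtf8('Él está aquí, señor'), "\n"; // El esta aqui, senor
```

Without /u, each accented letter in the class would be treated as its separate bytes, which can match and replace fragments of unrelated characters.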

JuanLigas
0

Why do you want to remove accents? Is it possible that you just want to ignore them? If so, this answer has a Perl solution that demonstrates how to do that. Note that the Perl is in a foreign language. :)
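
In PHP, one way to ignore accents when comparing strings, rather than stripping them, is the Collator class from the intl extension. This is a sketch that assumes intl is installed; the locale 'es' and the sample words are arbitrary choices, and it is not the Perl solution the answer refers to.

```php
<?php
// Requires the intl extension.
$collator = new Collator('es');
// PRIMARY strength compares base letters only, ignoring accents and case.
$collator->setStrength(Collator::PRIMARY);

// compare() returns 0 when the two strings are considered equal.
var_dump($collator->compare('canción', 'cancion') === 0); // bool(true)
var_dump($collator->compare('canción', 'camión') === 0);  // bool(false)
```

This keeps the original text intact and only changes how two strings are compared, which may be enough for searching or sorting but does not help when the goal is to build a URL.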

Community
tchrist