0

Possible Duplicate:
PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string

I need to strip the accents from strings, e.g. Casá should become Casa. Is there an easy way to do this in PHP? Thanks

lisovaccaro
  • And another good answer here: http://stackoverflow.com/questions/1890854/how-to-replace-special-characters-with-the-ones-theyre-based-on-in-php – Pekka Jan 04 '11 at 22:49

4 Answers

5
$text = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text); 
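
The result of `//TRANSLIT` can vary with the iconv implementation PHP was built against and, on some systems, with the current locale, and some builds reportedly reject the suffix altogether. A minimal sketch of a wrapper that surfaces failure instead of passing `false` along; the helper name `toAscii` is hypothetical and not part of the answer:

```php
<?php
// Hypothetical helper around the iconv() call above.
// Returns null when iconv() cannot perform the conversion.
function toAscii(string $text): ?string
{
    // //TRANSLIT asks iconv to approximate characters that have no
    // ASCII equivalent. "@" suppresses the notice raised on failure.
    $ascii = @iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);

    return $ascii === false ? null : $ascii;
}

$result = toAscii('Casá');
if ($result === null) {
    echo "iconv could not transliterate on this system\n";
} else {
    // The exact replacement characters depend on the platform.
    echo $result, "\n";
}
```

Checking for `null` lets the caller fall back to an explicit mapping when transliteration is unavailable.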
ceejayoz
1

If iconv doesn't work well for your purposes, strtr will replace characters with the replacements that you assign.

http://php.net/manual/en/function.strtr.php

<?php
//In this form, strtr() does byte-by-byte translation
//Therefore, we are assuming a single-byte encoding here:
$addr = strtr($addr, "äåö", "aao");
?>
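
The byte-by-byte caveat matters for UTF-8 text, where each of these letters takes two bytes. A small sketch contrasting the three-argument form with the array form of `strtr()`, assuming the source file and the input are UTF-8 encoded; the variable names are illustrative only:

```php
<?php
// Assumes this source file and the input string are UTF-8 encoded.
$input = "äåö";

// Three-argument form: translates byte by byte, so the two-byte UTF-8
// sequences are split apart and the result is not the intended "aao".
$bytewise = strtr($input, "äåö", "aao");

// Array form: each key is matched as a whole substring, so multi-byte
// characters are replaced intact.
$pairs = ['ä' => 'a', 'å' => 'a', 'ö' => 'o'];
$mapped = strtr($input, $pairs);

var_dump($bytewise === 'aao'); // bool(false)
echo $mapped, "\n";            // aao
```

With a single-byte encoding such as ISO-8859-1 the three-argument form behaves as the manual example intends.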

A user-contributed example from the PHP manual page:

// Array form of strtr(): each key is matched as a whole string,
// so this also works when the input is UTF-8.
$GLOBALS['normalizeChars'] = array(
    'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
    'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
    'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
    'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
    'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
    'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
    // The original list had 'ý' twice and no 'ü'; the duplicate is replaced here.
    'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'
);

// Wrapped in a function so that the return statement is valid.
function normalizeChars($toClean) {
    return strtr($toClean, $GLOBALS['normalizeChars']);
}
dqhendricks
  • `'ß'=>'Ss'` Doesn't make sense to me. It should be `'ß'=>'ss'`. – treeface Jan 04 '11 at 22:58
  • that is some user example from the php documentation for the function. i didn't write it. – dqhendricks Jan 04 '11 at 22:59
  • don't get me wrong, I gave you your upvote, so I like your post. It's an alternative solution to the problem which is worth something at least. Another thing I should mention (for posterity), is that `'Æ'=>'A'` should be `'Æ'=>'AE'` and `'æ'=>'a'` should be `'æ'=>'ae'`. http://en.wikipedia.org/wiki/%C3%86 – treeface Jan 05 '11 at 00:49
0

"á" and "a" are different letters; one is not simply "a with an accent". You need to define the mapping that you want between characters. Then perhaps make an array of suitable character replacements, and apply them in a loop over the input string.

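A minimal sketch of that approach: an explicit mapping applied character by character, which also reports any non-ASCII character the mapping does not cover. The map contents and the helper name `replaceMapped` are hypothetical, and the code assumes UTF-8 input:

```php
<?php
// Hypothetical mapping; extend it with the replacements you decide on.
$map = ['á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ñ' => 'n'];

// Hypothetical helper: returns the converted string together with the
// non-ASCII characters that had no entry in the map.
function replaceMapped(string $text, array $map): array
{
    // The "u" modifier makes PCRE treat the subject as UTF-8, so the
    // string is split into characters rather than bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    $unmapped = [];
    foreach ($chars as $ch) {
        if (isset($map[$ch])) {
            $out .= $map[$ch];
        } else {
            $out .= $ch;
            // A multi-byte character is non-ASCII; record it.
            if (strlen($ch) > 1) {
                $unmapped[] = $ch;
            }
        }
    }

    return [$out, $unmapped];
}

[$clean, $missing] = replaceMapped('Casá Æ', $map);
echo $clean, "\n";                 // Casa Æ
echo implode(',', $missing), "\n"; // Æ
```

Returning the unmapped characters makes gaps in the mapping visible instead of letting them pass silently.
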
Lightness Races in Orbit
-2

You can use htmlentities() and preg_replace() (to keep only the base letter of each named entity). Example:

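// Convert accented characters to named entities such as &eacute;,
// then keep only the letter that follows the ampersand.
echo preg_replace(
    '/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|ring);/',
    '$1',
    // State the encoding explicitly; older PHP versions did not default to UTF-8.
    htmlentities("Hélicoïdal", ENT_QUOTES, 'UTF-8')
);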
greg0ire
  • `htmlentities()` for normalizing accented characters into their base versions? – Pekka Jan 04 '11 at 22:50
  • This is a horrible idea. – Peter Bailey Jan 04 '11 at 22:52
  • @Pekka: yes, see example – greg0ire Jan 04 '11 at 23:02
  • Ohh! Interesting. This isn't bad at all. – Pekka Jan 04 '11 at 23:02
  • @Pekka: I vaguely remembered it at first, that's why it was so unclear. I'm trying to find where I read this. – greg0ire Jan 04 '11 at 23:07
  • @greg0ire yeah. But there might be Unicode Umlaut characters that don't translate into a word entity, but a number - I find it interesting that HTML entities have this kind of regularity to them, but the `iconv()` solution is probably more reliable – Pekka Jan 04 '11 at 23:10
  • @Pekka: I agree, but since I found this solution interesting too (and I didn't know the `iconv()` solution), I decided to post it. I don't understand all these downvotes. – greg0ire Jan 04 '11 at 23:16
  • I downvoted you and I'll tell you why: it's a hack and nowhere near a complete solution. Characters like Ð and Ž fail, and certainly every ligature would fail. And they will fail in the worst way: silently. No error would be generated from your algorithm missing these characters, and then you'd end up with crap like `&AElig;` and `&ETH;` in your output and never know it w/o manual inspection. Not to mention false positives - what if the input is already HTML with legitimate entities that are to be preserved? – Peter Bailey Jan 04 '11 at 23:34
  • @Peter Bailey: this is much more interesting than your previous comment, thanks. – greg0ire Jan 04 '11 at 23:44
  • Yeah, sorry about that. Too many years of doing this have made me crusty and wary of "clever" solutions, because they are *very rarely* "good" solutions. – Peter Bailey Jan 04 '11 at 23:46
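
The silent failure described in these comments can be reproduced with a short sketch. It applies the regex from the answer to input containing a ligature, then runs a simple check for leftovers. The function name `stripViaEntities` and the guard pattern are hypothetical additions, and the code assumes UTF-8 input:

```php
<?php
// The answer's approach, wrapped in a hypothetical function.
function stripViaEntities(string $text): string
{
    return preg_replace(
        '/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|ring);/',
        '$1',
        htmlentities($text, ENT_QUOTES, 'UTF-8')
    );
}

$result = stripViaEntities('Æther café');
echo $result, "\n"; // &AElig;ther cafe

// Hypothetical guard: flag output that still contains an entity or a
// non-ASCII byte, so the failure is no longer silent.
$suspect = preg_match('/&[#\w]+;|[^\x00-\x7F]/', $result) === 1;
var_dump($suspect); // bool(true)
```

The guard does not address the false-positive case raised above: input that already contains legitimate entities would still be altered or flagged.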