How to turn to US characters

Question

Possible Duplicate:
PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string

I need to convert strings that contain accented characters, e.g. Casá should become Casa. Is there an easy way to do this in PHP? Thanks.

And another good answer here: http://stackoverflow.com/questions/1890854/how-to-replace-special-characters-with-the-ones-theyre-based-on-in-php — Pekka, Jan 04 '11 at 22:49

score 5 · Answer 1 · answered Jan 04 '11 at 22:48

$text = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
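
A minimal sketch of how the one-liner above might be wrapped, assuming the iconv extension is available. The helper name `toAscii` and the fallback behaviour are illustrative assumptions, not part of the answer. `iconv()` returns `false` when conversion fails, and the exact transliteration depends on the iconv implementation and the current locale: an accented letter may come out as the plain letter, as a `?`, or with extra punctuation.

```php
<?php
// Hypothetical helper: transliterate UTF-8 text to ASCII via iconv().
function toAscii(string $text): string
{
    // //TRANSLIT asks iconv to approximate characters it cannot represent.
    // The result depends on the iconv implementation and the current locale.
    $converted = @iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);

    // iconv() returns false on failure; fall back to the input in that case.
    return $converted === false ? $text : $converted;
}

echo toAscii('Casá'), "\n";
```

Checking the return value matters because a failed conversion would otherwise silently turn the string into `false`.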

answered Jan 04 '11 at 22:48

ceejayoz

This is really the best way to do it. Lots of hackish ways out there, make sure to try this first. – mfonda Jan 04 '11 at 22:52
This works quite well. See here: http://ideone.com/2ZPVE – treeface Jan 04 '11 at 22:54

score 1 · Accepted Answer · answered Jan 04 '11 at 22:55

If iconv doesn't work well for your purposes, strtr lets you replace characters with replacements that you specify.

http://php.net/manual/en/function.strtr.php

<?php
//In this form, strtr() does byte-by-byte translation
//Therefore, we are assuming a single-byte encoding here:
$addr = strtr($addr, "äåö", "aao");
?>
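
The byte-by-byte caveat in the comments above can be seen with a small sketch. It assumes the source file is saved as UTF-8, so "é" is the two bytes C3 A9; the variable names are illustrative. The three-argument form then maps only the first byte, while the array form replaces whole substrings.

```php
<?php
// Assumes this source file is saved as UTF-8, where "é" is the two bytes C3 A9.
$input = 'é';

// Three-argument form: byte-by-byte. The longer argument is truncated, so only
// the first byte (C3) is mapped to "e" and the stray A9 byte is left behind.
$byteWise = strtr($input, 'é', 'e');
echo bin2hex($byteWise), "\n"; // 65a9 (not valid UTF-8)

// Array form: whole substrings are replaced, so multi-byte keys are safe.
$arrayForm = strtr($input, ['é' => 'e']);
echo bin2hex($arrayForm), "\n"; // 65
```

For UTF-8 input, the array form is therefore the safer of the two.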

An example from a user note in the PHP documentation:

$GLOBALS['normalizeChars'] = array(
    'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 
    'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 
    'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 
    'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 
    'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 
    'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 
    'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'
);

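// Wrapped in a function so the return statement is valid; the function name is illustrative.
function normalizeChars($toClean) {
    return strtr($toClean, $GLOBALS['normalizeChars']);
}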

`'ß'=>'Ss'` Doesn't make sense to me. It should be `'ß'=>'ss'`. — treeface, Jan 04 '11 at 22:58
that is some user example from the php documentation for the function. i didn't write it. — dqhendricks, Jan 04 '11 at 22:59
don't get me wrong, I gave you your upvote, so I like your post. It's an alternative solution to the problem which is worth something at least. Another thing I should mention (for posterity), is that `'Æ'=>'A'` should be `'Æ'=>'AE'` and `'æ'=>'a'` should be `'æ'=>'ae'`. http://en.wikipedia.org/wiki/%C3%86 — treeface, Jan 05 '11 at 00:49
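
The corrections suggested in these comments could be applied along the following lines. This is a sketch with a deliberately shortened, hypothetical map (`$map` is not the full table from the answer), assuming UTF-8 source and input.

```php
<?php
// Hypothetical, shortened map that applies the corrections from the comments:
// ß becomes "ss", and the Æ/æ ligatures expand to two letters.
$map = [
    'ß' => 'ss',
    'Æ' => 'AE',
    'æ' => 'ae',
    'ä' => 'a',
    'ö' => 'o',
    'ü' => 'u',
];

// The array form of strtr() replaces whole substrings, so a single
// character may expand into several.
echo strtr('Straße', $map), "\n"; // Strasse
echo strtr('Æsop', $map), "\n";   // AEsop
```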

score 0 · Answer 3 · answered Jan 04 '11 at 22:48

"á" and "a" are different letters; one is not simply "a with an accent". You need to define the mapping that you want between characters. Then perhaps make an array of suitable character replacements, and apply them in a loop over the input string.
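
A minimal sketch of the approach described above: an explicit mapping applied in a loop over the characters of the input. The mapping contents and the function name `replaceMapped` are hypothetical, and the code assumes UTF-8 source and input.

```php
<?php
// Hypothetical, deliberately small mapping; extend it for the characters you expect.
$replacements = ['á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ñ' => 'n'];

function replaceMapped(string $input, array $map): string
{
    // Split into UTF-8 characters (not bytes) so multi-byte letters stay whole.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // Input was not valid UTF-8; return it unchanged.
        return $input;
    }

    $output = '';
    foreach ($chars as $char) {
        // Use the mapped replacement if one is defined, otherwise keep the character.
        $output .= $map[$char] ?? $char;
    }
    return $output;
}

echo replaceMapped('Casá', $replacements), "\n"; // Casa
```

Characters without an entry in the map pass through untouched, so the mapping stays an explicit decision rather than a guess.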

answered Jan 04 '11 at 22:48

Lightness Races in Orbit

score -2 · Answer 4 · answered Jan 04 '11 at 22:50

You can use htmlentities() and preg_replace(): the regular expression keeps only the base letter that follows the & in each entity (its second character). Example:

echo preg_replace(
    '/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|ring);/',
    '$1',
    htmlentities("Hélicoïdal")
);

edited Jan 04 '11 at 22:59

answered Jan 04 '11 at 22:50

greg0ire

`htmlentities()` for normalizing accented characters into their base versions? – Pekka Jan 04 '11 at 22:50
This is a horrible idea. – Peter Bailey Jan 04 '11 at 22:52
@Pekka: yes, see example – greg0ire Jan 04 '11 at 23:02
Ohh! Interesting. This isn't bad at all. – Pekka Jan 04 '11 at 23:02
@Pekka: I vaguely remembered it at first, that's why it was so unclear. I'm trying to find where I read this. – greg0ire Jan 04 '11 at 23:07
@greg0ire yeah. But there might be Unicode Umlaut characters that don't translate into a word entity, but a number - I find it interesting that HTML entities have this kind of regularity to them, but the `iconv()` solution is probably more reliable – Pekka Jan 04 '11 at 23:10
@Pekka: I agree, but since I found this solution interesting too (and I didn't know the `iconv()` solution), I decided to post it. I don't understand all these downvotes. – greg0ire Jan 04 '11 at 23:16
I downvoted you and I'll tell you why: it's a hack and nowhere near a complete solution. Characters like Ð and Ž fail, and certainly every ligature would fail. And they will fail in the worst way: silently. No error would be generated from your algorithm missing these characters, and then you'd end up with crap like `&AElig;` and `&ETH;` in your output and never know it w/o manual inspection. Not to mention false positives - what if the input is already HTML with legitimate entities that are to be preserved? – Peter Bailey Jan 04 '11 at 23:34
@Peter Bailey: this is much more interesting than your previous comment, thanks. – greg0ire Jan 04 '11 at 23:44
Yeah, sorry about that. Too many years of doing this have made me crusty and wary of "clever" solutions, because they are *very rarely* "good" solutions. – Peter Bailey Jan 04 '11 at 23:46
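
The silent-failure concern raised in this thread can be reproduced with a short sketch. The wrapper name `stripAccentsViaEntities` is hypothetical, and the encoding is passed to `htmlentities()` explicitly as an assumption about the input. Characters whose entity names do not fit the letter-plus-suffix pattern are left in the output as entities, with no error.

```php
<?php
// Hypothetical helper wrapping the htmlentities()/preg_replace() approach.
function stripAccentsViaEntities(string $text): string
{
    // Encoding is passed explicitly rather than relying on the default.
    $entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

    // Keep only the base letter of entities such as &eacute; or &iuml;.
    return preg_replace(
        '/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|ring);/',
        '$1',
        $entities
    );
}

echo stripAccentsViaEntities('Hélicoïdal'), "\n"; // Helicoidal
echo stripAccentsViaEntities('Æther'), "\n";      // &AElig;ther (entity left in place)
echo stripAccentsViaEntities('Ðuro'), "\n";       // &ETH;uro (entity left in place)
```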
