4

Possible Duplicate:
Regular Expression To Anglicize String Characters?

What would be the best way to convert foreign-language characters to English ones? For example, ü to u.

el_pup_le
  • Also related: http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net – Chris Fulstow May 21 '11 at 05:56
  • And: http://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string – Chris Fulstow May 21 '11 at 05:56
  • If it's the right character in the context, it should be kept, not 'converted' –  May 21 '11 at 05:58
  • @Chris Fulstow - Considering the solution is PHP, not .Net, the 2nd two questions aren't relevant. And considering that a regex is *not* the right way to do this... ;) – John Green May 21 '11 at 06:06
  • @John: But the method described within particularly the second link _is_ the right one in general – decompose into base Latin characters and diacriticals, then strip the diacriticals. Anything else has tough edge cases and requires a very large list of things to change. – Donal Fellows May 21 '11 at 07:13
  • @Donal - Possibly... except that PHP doesn't have a string.Normalize. You'd need to utilize the full mapping table. While this may be the 'right' answer, such a table is large and unwieldy. I think it depends on the author's intent... to which I have no insight. Generally, I am not a fan of doing this at all, but know that there are certain circumstances where it is needed and appropriate. The solution I propose below is a 'quick' solution, which I would recommend for things like URL replacement or filename setting... and I would not suggest anybody do this for most any other reason. – John Green May 21 '11 at 07:46
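
The decompose-then-strip approach discussed in the comments above can be sketched in PHP where the intl extension is available. This is only an illustration under that assumption: the function name strip_diacritics is hypothetical, and Normalizer comes from intl rather than from core PHP.

```php
<?php
// Hypothetical helper. Assumes the intl extension is loaded,
// since it provides the Normalizer class.
function strip_diacritics(string $text): string
{
    // NFD splits a character such as "ü" into "u" followed by
    // U+0308 (combining diaeresis).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // input was not valid UTF-8; leave it untouched
    }
    // \p{Mn} matches nonspacing marks, i.e. the combining accents.
    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}
```

Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so a small mapping table is still needed for those.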

2 Answers

2

There are only a couple of reasons to do this (URL friendliness, mostly). You want strtr.

It basically works like this:

$addr = strtr($addr, array("ä" => "a", "å" => "a", "ö" => "o")); // array form: multibyte UTF-8 characters are replaced as whole units

The 2nd comment in the manual has a nice translation table for you.
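
As a rough sketch of how such a translation table might be applied, the snippet below wraps strtr in a helper. The name anglicize and the handful of entries in the map are illustrative assumptions, not the table from the manual comment, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper; the map is a small illustrative subset,
// not a complete transliteration table.
function anglicize(string $text): string
{
    $map = [
        'ä' => 'a', 'å' => 'a', 'ö' => 'o', 'ü' => 'u',
        'Ä' => 'A', 'Å' => 'A', 'Ö' => 'O', 'Ü' => 'U',
        'é' => 'e', 'è' => 'e', 'ñ' => 'n', 'ß' => 'ss',
    ];
    // The array form of strtr() replaces whole substrings, so each
    // multibyte UTF-8 key is matched as a unit, not byte by byte.
    return strtr($text, $map);
}

echo anglicize('Müller in Malmö'), "\n"; // Muller in Malmo
```

The array form also allows one-to-many replacements such as ß to ss, which the three-argument form of strtr cannot express.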

John Green
0
 $text = str_replace('ü', 'u', $text);

To find all non-English characters, use:

 preg_match_all('#[^A-Za-z0-9\-\.\,\:\;]#u', $text, $characters);
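
A minimal sketch of collecting every such character in a string, rather than only the first one, might look like this. The helper name find_non_english is hypothetical, and the character class is an assumption about what counts as "English" here: ASCII letters, digits, whitespace, and a few punctuation marks.

```php
<?php
// Hypothetical helper: returns each distinct character that falls
// outside the assumed "English" set, in order of first appearance.
function find_non_english(string $text): array
{
    // The u modifier makes PCRE treat the subject as UTF-8, so a
    // multibyte character is reported whole instead of as single bytes.
    preg_match_all('#[^A-Za-z0-9\s\-.,:;]#u', $text, $matches);
    // Drop duplicates and reindex from zero.
    return array_values(array_unique($matches[0]));
}

echo implode(' ', find_non_english('Müller, Zoë: ok')), "\n"; // ü ë
```

The resulting list can be used to check which characters a replacement table still needs to cover.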
Danzan