Is there a way to convert umlauts from the representations ae, Ae, oe, Oe, ue, Ue and ss back to the original umlauts? It is important that correct spelling is preserved. For example, the word "teuer" must not be changed to "teür". Thanks!
-
No, I don't think so. You would need a dictionary with all exceptions. – Sascha Galley Jul 21 '11 at 13:39
-
You would need a dictionary of all the acceptable words to convert, and be sure not to convert partials (a la 'clbuttic'). – Andrew Jul 21 '11 at 13:40
-
"back to the original umlauts" - are you converting them in the first place? Sounds like you could just retain the information instead. – Dave Jul 21 '11 at 13:43
-
My German is a bit rusty, but from memory there are well-defined rules for most cases ([vowel][vowel]e doesn't change, [consonant][vowel]e does, [start of word][vowel]e does, etc.), so something like (in regex) `/^[^aeiou]?[aeiou]e/` would match most cases. You'd still need a dictionary with specific exceptions, but the general rule would pick up most words. – Sysyphus Jul 21 '11 at 13:47
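The rule of thumb from the last comment could be sketched in PHP roughly as follows. This is only a heuristic, not a complete solution: the function name `restoreUmlautsHeuristic` is made up for illustration, the extra `q` in the lookbehind (to leave "qu" + e alone) is an added assumption, and "ss" is deliberately left untouched because no simple rule decides between "ss" and "ß".

```php
<?php
// Heuristic sketch (hypothetical helper): restore an umlaut only when the
// digraph is NOT preceded by another vowel. The source file must be UTF-8.
function restoreUmlautsHeuristic(string $text): string
{
    $map = [
        'ae' => 'ä', 'oe' => 'ö', 'ue' => 'ü',
        'Ae' => 'Ä', 'Oe' => 'Ö', 'Ue' => 'Ü',
    ];

    // (?<![...]) is a negative lookbehind: the previous character must not
    // be a vowel (or q, an assumption added here to skip words like "Quelle").
    return preg_replace_callback(
        '/(?<![aeiouäöüqAEIOUÄÖÜQ])(ae|oe|ue|Ae|Oe|Ue)/u',
        fn(array $m) => $map[$m[1]],
        $text
    );
}
```

With this sketch "Muell" becomes "Müll" and "teuer" is left alone, but "Duell" is wrongly turned into "Düll", which is why a list of exceptions is still needed.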
3 Answers
iconv("utf-8","ascii//TRANSLIT",$input);
OR
echo strtr(utf8_decode($input),
utf8_decode('ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ'),
'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
Refer to this question.

I suggest you try every combination of replacements for the occurrences of "ue", "oe" and so on. By every combination I mean: if there are 3 occurrences, first replace only the first, then only the second, then only the third, then the first and second, and so on.
Next, check whether the results are contained in a standard spell-checking dictionary. This way you do not have to create your own dictionary of exceptions.
A word list can be found, for example, at ftp://ftp.ox.ac.uk/pub/wordlists/german/words.german.Z
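A minimal sketch of this idea in PHP might look like the following. The function name `restoreUmlautsWithDictionary` is hypothetical, and `$dictionary` is assumed to be an array keyed by known words in UTF-8 (for example built with `array_flip()` from whatever word list you load). Loading and decoding the word list is left out.

```php
<?php
// Sketch: build every combination of digraph replacements for one word and
// return the candidates that appear in the dictionary.
// $dictionary is assumed to be keyed by word, e.g. ['teuer' => 0, ...].
function restoreUmlautsWithDictionary(string $word, array $dictionary): array
{
    $map = [
        'ae' => 'ä', 'oe' => 'ö', 'ue' => 'ü',
        'Ae' => 'Ä', 'Oe' => 'Ö', 'Ue' => 'Ü', 'ss' => 'ß',
    ];

    // Split the word around the digraphs and keep them in the result.
    // With PREG_SPLIT_DELIM_CAPTURE the digraphs sit at the odd indices.
    $parts = preg_split('/(ae|oe|ue|Ae|Oe|Ue|ss)/', $word, -1, PREG_SPLIT_DELIM_CAPTURE);
    $n = intdiv(count($parts), 2); // number of digraphs found

    $found = [];
    // Each bit of $mask decides whether the corresponding digraph is replaced,
    // so the loop visits all 2^n combinations (mask 0 is the unchanged word).
    for ($mask = 0; $mask < (1 << $n); $mask++) {
        $candidate = '';
        foreach ($parts as $i => $part) {
            $isDigraph = ($i % 2 === 1);
            $replace = $isDigraph && (($mask >> intdiv($i, 2)) & 1);
            $candidate .= $replace ? $map[$part] : $part;
        }
        if (isset($dictionary[$candidate])) {
            $found[] = $candidate;
        }
    }
    return $found;
}
```

The function returns an array because a word can have no match or several (for instance when both the "ss" and the "ß" spelling exist as different words), and the caller has to decide what to do in those cases. The number of candidates doubles with every digraph, which is fine for single words but worth keeping in mind for long compounds.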

This is going to be pretty tricky to get right. There certainly isn't any built-in function to do it.
Most of the examples I've seen for this kind of thing work in the opposite direction (i.e. taking a string with accented characters and replacing them with their ASCII equivalents). Where I have seen it done, it has always been a case of providing a map of characters and their equivalents, and scanning the string doing replacements.
The PHP manual page for the strtr() function has some good examples of the kind of thing you'd need to do, but your requirement to avoid specific exceptions is going to complicate the whole process enormously.
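As a rough illustration of the map-based approach, here is a sketch that uses the array form of `strtr()`. It relies on two documented properties of `strtr()`: the longest keys are tried first, and text that has already been replaced is not searched again. Mapping each exception to itself therefore protects it. The function name `restoreUmlautsWithMap` and the `$exceptions` list are assumptions for illustration, and "ss" is left out on purpose because a blanket replacement would break words like "Wasser".

```php
<?php
// Sketch: reverse mapping with strtr(), protecting a caller-supplied list of
// exception words by mapping each of them to itself.
function restoreUmlautsWithMap(string $text, array $exceptions): string
{
    $map = [
        'ae' => 'ä', 'oe' => 'ö', 'ue' => 'ü',
        'Ae' => 'Ä', 'Oe' => 'Ö', 'Ue' => 'Ü',
    ];

    // An exception such as "teuer" is longer than "ue", so strtr() consumes
    // it as a whole before the shorter key can match inside it.
    foreach ($exceptions as $word) {
        $map[$word] = $word;
    }

    return strtr($text, $map);
}

echo restoreUmlautsWithMap('Das Muesli ist teuer', ['teuer']);
// prints "Das Müsli ist teuer"
```

The matching is case-sensitive and works on substrings, so "Teuer" would need its own entry, while "Abenteuer" is already covered by the "teuer" exception. The hard part remains building a complete exception list.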
