I'm working on a way to handle special characters in an automated PHP script that creates accounts. Special characters are unwanted in email addresses and a few other places, so I'm trying to get rid of them there. I can't strip them before feeding the names to the script, though, because the user's name has to be displayed properly to other users.

Example: Jörgen Götz should get the email address jorgen.gotz@domain.com, but in the user database his first name should still be Jörgen and his last name Götz. I hope I'm not too unclear about what I want to achieve.

I've been experimenting with iconv() but I'm having some trouble with it. See code below.

$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';

setlocale(LC_ALL, 'en_GB');

echo $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

The code above should return

Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz

but instead it gives me

Weiss, Goldmann, G"obel, Weiss, G"othe, Goethe und G"otz

I can't understand what the quotations are doing there.

Both Chrome and IE give me the same result, and the page is using charset="utf-8".

Before using iconv() I tried strtr() together with an array of "unwanted" characters, but I don't like the solution of having to define an array of special characters every time I need to convert strings back and forth.

Can anyone offer an explanation or solution?

user1571510
  • That's what `ASCII//TRANSLIT` does. Try a different target encoding. – tripleee Aug 19 '12 at 17:13
  • Not according to the user providing the example at [php.net](http://www.php.net/manual/en/function.iconv.php). He claims that the transliteration output changes depending on the current language settings. – user1571510 Aug 27 '12 at 20:34

2 Answers

Try adding this to your system (terminal in Ubuntu):

sudo locale-gen de_DE.UTF-8

Then change the locale in your PHP script:

setlocale(LC_ALL, 'de_DE.UTF-8');

Edit (Windows setup)

On Windows Server, you have to install the German Language Pack and change the call above to:

setlocale(LC_ALL, 'germany');
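
Since locale names differ between systems, it can help to check what setlocale() actually returns before judging the iconv() output. The following is only a minimal sketch: the helper name `pick_locale` is hypothetical, the candidate list is an assumption, and using `LC_CTYPE` instead of `LC_ALL` is a choice made here, not something from the original answer.

```php
<?php
// Hypothetical helper: returns the first locale name the system accepts,
// or false if none of the candidates is available.
function pick_locale(array $candidates)
{
    // setlocale() accepts an array and tries each name in order.
    return setlocale(LC_CTYPE, $candidates);
}

// The candidate names are assumptions; adjust them to your system.
$active = pick_locale(['de_DE.UTF-8', 'de_DE', 'german', 'germany']);
var_dump($active); // false means none of the names is installed

if ($active !== false) {
    // The result still depends on the platform's iconv implementation.
    var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'Götz'));
}
```

If the first var_dump() prints bool(false), the locale was never switched, so any difference in the iconv() result cannot come from the setlocale() call.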
Niloct
  • I forgot to mention that I'm using Apache with Windows Server 2003. I tried the second alternative but it returned the same result. – user1571510 Aug 12 '12 at 14:24
  • Well, more work to do then (it works in above configuration). – Niloct Aug 12 '12 at 15:12
  • `setlocale(LC_ALL, 'germany');` (got that here: http://msdn.microsoft.com/en-us/library/cdax410z(v=VS.71).aspx). – Niloct Aug 12 '12 at 15:18
  • That didn't work either. setlocale() doesn't seem to affect the output; I've tried removing it. I can't understand why it isn't working. It doesn't seem like anyone else is having this problem, and I can't find any tutorials or documents that explain this error either. – user1571510 Aug 12 '12 at 17:17
  • You should `var_dump` the return value of `setlocale` too. – Niloct Aug 13 '12 at 15:20
  • It seems the languages installed on my server are Swedish and English. The var_dump(setlocale(LC_ALL, 'en_GB')) returns bool(false). Does this help you? – user1571510 Aug 18 '12 at 15:56
  • I experimented with setlocale() and found out that "german", "english", "swedish" and such works and makes var_dump() return values, but I still get the same result from iconv(). – user1571510 Aug 18 '12 at 16:09
  • If it returns `bool(false)`, then you need to install the German Language Pack. – Niloct Aug 18 '12 at 21:45
  • I found out that I had a variety of languages installed and I have changed between a couple of them. So far, no luck. – user1571510 Aug 19 '12 at 05:33
  • This is getting complicated. I thought my comments were clear, but it seems not. Let me ask again: what does `var_dump(setlocale(LC_ALL, 'germany'));` return (copy and paste the code there, and the result here)? If it is `bool(false)`, **you don't have the German language pack installed**. Your text seems to be in German, correct? If so, you **need to install** the **German Language Pack** via **Windows Update**. – Niloct Aug 19 '12 at 16:31
  • -1 I don't think this approach is helping the OP solve this particular problem. – tripleee Aug 19 '12 at 17:14
  • @Niloct: No, you're not being unclear. `var_dump(setlocale(LC_ALL, 'germany'));` returns `bool(false)`, but I've looked at the languages installed in the server's regional settings and they're all there. `setlocale()` doesn't work with any other country name either, which would mean that I don't have any language installed on my Windows 2003 machine, and that isn't the case. I've also tried replacing the **German** letters with **Swedish** ones. I appreciate all the guidance, but `ö` results in `"o` and `é` in `'e`, which doesn't feel like it is related to setlocale(). – user1571510 Aug 21 '12 at 23:24
  • Ok, so we are progressing. Can you please list the languages installed ? – Niloct Aug 21 '12 at 23:32
  • @Niloct: Okay. I sent you a screencap of the list of languages to your email. It starts with African, Albanian, Azeri followed by a range of languages and ends with Welsh, Xhosa, Zulu. – user1571510 Aug 22 '12 at 00:19

TRANSLIT tries to find characters that look similar to the requested character. Since the letter ö is not in ASCII, it is replaced with a pair of characters: one standing in for the umlaut (here the double quote) followed by the base letter.
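
One way to avoid depending on what iconv() chooses as a stand-in is to decompose the characters yourself and drop the combining marks. The sketch below is not from the original answer and rests on several assumptions: the intl extension (for `Normalizer`) is available, the source file is saved as UTF-8, and the function name `to_ascii_local_part` and the small fallback map are made up for illustration. Letters such as ß have no canonical decomposition, so they still need an explicit mapping.

```php
<?php
// Hypothetical helper; requires the intl extension for Normalizer.
function to_ascii_local_part(string $name): string
{
    // Letters without a canonical decomposition need an explicit mapping
    // (this short list is an assumption, extend it as needed).
    $name = strtr($name, ['ß' => 'ss', 'æ' => 'ae', 'Æ' => 'Ae', 'ø' => 'o', 'Ø' => 'O']);

    // NFD splits "ö" into "o" followed by U+0308 (combining diaeresis).
    $decomposed = Normalizer::normalize($name, Normalizer::FORM_D);

    // Remove all combining marks (Unicode category Mn).
    $ascii = preg_replace('/\p{Mn}+/u', '', $decomposed);

    // Lowercase, then turn every run of other characters into a single dot.
    $ascii = preg_replace('/[^a-z0-9]+/', '.', strtolower($ascii));

    return trim($ascii, '.');
}

echo to_ascii_local_part('Jörgen Götz'), "@domain.com\n"; // jorgen.gotz@domain.com
```

The original name stays untouched in the database; only the derived address goes through the helper.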

David