1

How can I convert incorrectly encoded umlauts like these:

 ä <- Ã¤
 Ä <- Ã„
 ö <- Ã¶
 Ö <- Ã–
 ü <- Ã¼
 Ü <- Ãœ
 ß <- ÃŸ
…

This is my current code, but it is not working:

echo iconv("ISO-8859-1", "UTF-8", "Ãœ");
Bharata
Maro

2 Answers

3

Try this. It outputs: äÄöÖüÜß

<?php

$inputs = [ 'Ã¤', 'Ã„', 'Ã¶', 'Ã–', 'Ã¼', 'Ãœ', 'ÃŸ' ];

foreach ($inputs as $input)
{
    echo iconv('UTF-8', 'WINDOWS-1252//TRANSLIT', $input);
}
Ro Achterberg
2

Your mojibake comes from repeated mis-encoding between UTF-8 and cp1252, which is Windows' awful version of ISO-8859-1. If you apply the same mis-encoding in reverse, you can un-corrupt your data in most cases, if you're lucky.

$in = 'Ãœ'; // This is copied from Stack Overflow, where it is UTF-8 encoded, which
            // may or may not match the actual encoding you pasted in.
$p1 = iconv('utf-8', 'cp1252', $in);
$p2 = iconv('utf-8', 'cp1252', $p1);

var_dump(
    $in, bin2hex($in),
    $p1, bin2hex($p1),
    $p2, bin2hex($p2)
);

Output:

string(4) "Ãœ"
string(8) "c383c593"
string(2) "Ü"
string(4) "c39c"
string(1) "�"
string(2) "dc"

And if you look up the cp1252 encoding table you'll see that 0xDC is Ü.
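
To see where the extra bytes come from, here is a minimal sketch that recreates the corruption on purpose and then undoes it. It assumes your iconv build accepts the name cp1252 (glibc does), and the variable names are only for illustration:

```php
<?php

// UTF-8 bytes of "Ü" (U+00DC), written as escapes so the source file's
// own encoding cannot interfere.
$original = "\xC3\x9C";

// Corrupt it: treat each UTF-8 byte as a cp1252 character and re-encode.
// 0xC3 becomes "Ã" (c3 83) and 0x9C becomes "œ" (c5 93).
$broken = iconv('cp1252', 'utf-8', $original);
echo bin2hex($broken), PHP_EOL; // c383c593

// Repair it: map each character back to its single cp1252 byte.
$fixed = iconv('utf-8', 'cp1252', $broken);
echo bin2hex($fixed), PHP_EOL;  // c39c
```

The repaired bytes c3 9c are valid UTF-8 for Ü again, which matches $p1 in the dump above.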

But honestly you should:

  1. Fix this broken data at the source.
  2. Just standardize on UTF-8 if you can.

One or both of these will make your life easier.
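
If you end up doing a one-off cleanup of data that is already stored, a guarded version of the reverse conversion is safer than converting everything blindly. The sketch below is only an illustration: repairMojibake is a hypothetical helper name, it assumes the mbstring extension is loaded, and it only handles a single round of UTF-8/cp1252 corruption:

```php
<?php

// Hypothetical helper: undo one round of UTF-8-read-as-cp1252 mojibake,
// but leave strings alone when the result would not be valid UTF-8.
function repairMojibake(string $text): string
{
    // Map each character back to its cp1252 byte. iconv() returns false
    // (and raises a notice, silenced here) if a character has no cp1252 byte.
    $candidate = @iconv('UTF-8', 'CP1252', $text);

    if ($candidate === false || $candidate === $text) {
        return $text; // not convertible, or plain ASCII: nothing to repair
    }

    // Healthy text such as "Über" turns into invalid UTF-8 here, so it is kept as-is.
    return mb_check_encoding($candidate, 'UTF-8') ? $candidate : $text;
}

// "\xC3\x83\xC5\x93" is the UTF-8 encoding of the mojibake "Ãœ".
echo repairMojibake("\xC3\x83\xC5\x93ber"), PHP_EOL; // Über
// Already-correct input passes through unchanged.
echo repairMojibake("\xC3\x9Cber"), PHP_EOL;         // Über
```

The validity check is a heuristic, not a guarantee, so run it on a copy of the data and spot-check the results.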

Edit: Switched out mb_ for iconv for consistency with the question. The mb_ equivalent is:

$in = 'Ãœ';
$p1 = mb_convert_encoding($in, 'cp1252', 'utf-8');
$p2 = mb_convert_encoding($p1, 'cp1252', 'utf-8');
Sammitch