I want to transform a string with special characters into a sanitized string. I am using the following code for it.

function sanitize($str) {
    $value = strtolower(trim($str));

    $find = ["?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}", "%", "+", "“", "„", " ", chr(0)];
    $value = str_replace($find, '-', $value);

    $find = ['ä', 'ö', 'ü', 'ß', 'Ä', 'Ö', 'Ü'];
    $replace = ['ae', 'oe', 'ue', 'ss', 'Ae', 'Oe', 'Ue'];

    return str_replace($find, $replace, $value);
}

This works well for most strings, but not for those containing German umlauts. Two examples:

  1. Sinas sagt: „Wenn jemand es wagt, dann bin ich es“ -> sinas-sagt---wenn-jemand-es-wagt--dann-bin-ich-es-
  2. Maier warnt Müller vor „Harter Debatte“ -> maier-warnt-mÒ╝ller-vor--harter-debatte-

If I show the encoding of the input string with mb_detect_encoding I get UTF-8 in both cases. What do I need to change to replace the umlauts?
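
A minimal sketch of one possible approach, not a confirmed fix: it assumes the input is valid UTF-8, the source file is saved as UTF-8, PHP 7 or later, and that the mbstring extension is available. The name `sanitize_utf8` is hypothetical. It swaps `strtolower` for `mb_strtolower`, which lowercases by code point rather than byte by byte, and transliterates the umlauts before stripping other characters. Unlike the original function, it collapses runs of special characters into a single hyphen.

```php
<?php
// Hypothetical variant of sanitize(); assumes UTF-8 input and the mbstring extension.
function sanitize_utf8(string $str): string
{
    // mb_strtolower() works on code points, so the two bytes of "ü" stay intact.
    $value = mb_strtolower(trim($str), 'UTF-8');

    // Transliterate German umlauts; after lowercasing only the lowercase forms remain.
    $value = str_replace(['ä', 'ö', 'ü', 'ß'], ['ae', 'oe', 'ue', 'ss'], $value);

    // Replace every run of bytes other than a-z and 0-9 with a single hyphen.
    $value = preg_replace('/[^a-z0-9]+/', '-', $value);

    // Drop leading and trailing hyphens.
    return trim($value, '-');
}

echo sanitize_utf8('Maier warnt Müller vor „Harter Debatte“'), "\n";
// prints: maier-warnt-mueller-vor-harter-debatte
```

Doing the transliteration before the character filter matters here: once the umlauts are plain ASCII, the remaining steps never have to deal with multibyte sequences.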

[UPDATE]

I have done some more investigation (thanks for the hints in the comments). My previous examples were copied from the Windows console, so I think that might be an encoding problem of the output channel.

But I still have one problem with the umlauts on my website. If I include the output in my website I get: maier-warnt-m%EF%BF%BDller-vor--harter-debatte- which decodes to: http://psa-portal.test/news/2017/02/11/maier-warnt-m�ller-vor--harter-debatte-
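
For reference, `%EF%BF%BD` is the percent-encoded UTF-8 form of U+FFFD, the Unicode replacement character, which typically appears when some layer has replaced bytes that were not valid UTF-8. The sketch below only illustrates that fact. The variable names are hypothetical, it assumes PHP 7 or later (for the `\u{...}` escape) and the mbstring extension, and it does not identify where in the pipeline the replacement happens.

```php
<?php
// U+FFFD REPLACEMENT CHARACTER is the byte sequence EF BF BD in UTF-8.
$replacement = "\u{FFFD}";
$encodedReplacement = rawurlencode($replacement);
echo $encodedReplacement, "\n"; // prints: %EF%BF%BD

// A correctly encoded UTF-8 "ü" (U+00FC) is the two bytes C3 BC.
$encodedUmlaut = rawurlencode("\u{00FC}");
echo $encodedUmlaut, "\n"; // prints: %C3%BC

// The ISO-8859-1 byte for "ü" (FC) on its own is not valid UTF-8.
$latin1IsValidUtf8 = mb_check_encoding("m\xFCller", 'UTF-8');
var_dump($latin1IsValidUtf8); // prints: bool(false)
```

Calling `mb_check_encoding($value, 'UTF-8')` on the slug right before it is placed in the URL is one way to check whether the string is still valid UTF-8 at that point.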

Georg Leber
  • http://stackoverflow.com/questions/158241/php-replace-umlauts-with-closest-7-bit-ascii-equivalent-in-an-utf-8-string – bxN5 Feb 12 '17 at 11:48
  • When using `iconv("utf-8","ascii//TRANSLIT",$str)` I get: Maier warnt M"uller vor "Harter Debatte". Why is **ü** replaced with **"u**? – Georg Leber Feb 12 '17 at 12:04
  • Cannot confirm, see https://3v4l.org/PkVig. Maybe an output encoding issue. – Olaf Dietsche Feb 12 '17 at 12:06
  • "If I include the output in my website", which output? Maybe this question http://stackoverflow.com/q/25222973/1741542 helps with `%EF%BF%BD`. – Olaf Dietsche Feb 12 '17 at 13:42

0 Answers