I want to transform a string with special characters into a sanitized string. I am using the following code:
function sanitize($str) {
    $value = strtolower(trim($str));
    $find = ["?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}", "%", "+", "“", "„", " ", chr(0)];
    $value = str_replace($find, '-', $value);
    $find = ['ä', 'ö', 'ü', 'ß', 'Ä', 'Ö', 'Ü'];
    $replace = ['ae', 'oe', 'ue', 'ss', 'Ae', 'Oe', 'Ue'];
    return str_replace($find, $replace, $value);
}
This works well for most strings, but not for those containing German umlauts. Two examples:
- Sinas sagt: „Wenn jemand es wagt, dann bin ich es“ -> sinas-sagt---wenn-jemand-es-wagt--dann-bin-ich-es-
- Maier warnt Müller vor „Harter Debatte“ -> maier-warnt-mÒ╝ller-vor--harter-debatte-
If I check the encoding of the input string with mb_detect_encoding, I get UTF-8 in both cases. What do I need to change so that the umlauts are replaced?
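As far as I understand, mb_detect_encoding only reports the most likely encoding, so a stricter check may be more telling. The following is a minimal sketch, assuming the mbstring extension is loaded; describe_bytes is a hypothetical helper name, not part of the code above. It reports whether a string is valid UTF-8 and dumps its raw bytes as hex.

```php
<?php
// Hypothetical debugging helper (the name is an assumption).
// Reports whether $s is valid UTF-8 and shows its raw bytes as hex.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid' : 'INVALID';
    return $valid . ' UTF-8: ' . bin2hex($s);
}

// "ü" encoded as UTF-8 is the two bytes 0xC3 0xBC.
echo describe_bytes("\xC3\xBC"), PHP_EOL; // valid UTF-8: c3bc

// If the lead byte is changed to 0xE3, the sequence is no longer valid.
echo describe_bytes("\xE3\xBC"), PHP_EOL; // INVALID UTF-8: e3bc
```

Calling such a helper on the input and on the value after each step of sanitize() should show at which step the bytes of the umlaut change.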
[UPDATE]
I have investigated further (thanks for the hints in the comments). My previous examples were copied from the Windows console, so I think the garbled characters there might be an encoding problem of the output channel.
But I still have one problem with the umlauts on my website. If I include the output in my website, I get: maier-warnt-m%EF%BF%BDller-vor--harter-debatte-
which decodes to: http://psa-portal.test/news/2017/02/11/maier-warnt-m�ller-vor--harter-debatte-
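One possible cause is that strtolower works byte by byte and, depending on the PHP version and locale, may alter single bytes inside a multibyte UTF-8 character. The console output mÒ╝ller would be consistent with the bytes 0xE3 0xBC (shown in code page 850), that is, "ü" (0xC3 0xBC) with its first byte lowercased as if it were Latin-1. The 'ü' in the replacement list would then no longer match, and the invalid bytes could later be turned into the replacement character (%EF%BF%BD). A minimal sketch of a variant that avoids this, assuming the mbstring extension is available and the source file is saved as UTF-8; the name sanitize_utf8 is hypothetical:

```php
<?php
// Hypothetical variant of sanitize(); the name sanitize_utf8 is an assumption.
// Requires the mbstring extension and a source file saved as UTF-8.
function sanitize_utf8(string $str): string
{
    // Lowercase per character instead of per byte, so multibyte
    // sequences such as "ü" (0xC3 0xBC) stay intact.
    $value = mb_strtolower(trim($str), 'UTF-8');

    // Transliterate German umlauts; only lowercase forms remain here.
    $value = str_replace(['ä', 'ö', 'ü', 'ß'], ['ae', 'oe', 'ue', 'ss'], $value);

    // Same separator list as in the original function.
    $find = ["?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}", "%", "+", "“", "„", " ", chr(0)];
    return str_replace($find, '-', $value);
}

echo sanitize_utf8('Maier warnt Müller vor „Harter Debatte“'), PHP_EOL;
// maier-warnt-mueller-vor--harter-debatte-
```

Doing the umlaut replacement before the byte-based strtolower call would be another option if mbstring is not available.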