There's no built-in for this because you only have this problem if you're doing other, more important things incorrectly and this just papers over them.
See: UTF-8 all the way through
But if you're committed to not actually fixing that and making your application more difficult to maintain, you could use the following to encode UTF-8 codepoints above 127 as HTML entities:
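```php
// Replace every multi-byte UTF-8 character (code point above 127) with a
// hexadecimal HTML entity; single-byte ASCII characters pass through unchanged.
// Requires PHP 7.4+ with mbstring (mb_str_split, mb_ord).
function force_utf8_entities($input) {
    return implode('', array_map(
        function ($a) {
            // In UTF-8, any character longer than one byte is above U+007F.
            if (strlen($a) > 1) {
                return sprintf("&#x%X;", mb_ord($a, 'UTF-8'));
            }
            return $a;
        },
        mb_str_split($input, 1, 'UTF-8')
    ));
}

$input = "Hügelkultur";
var_dump(
    force_utf8_entities($input)
); // string(16) "H&#xFC;gelkultur"
```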
It's also worth noting that there's no such thing as "non-lower ASCII", because every byte with an ordinal value above 127 is entirely at the mercy of the declared encoding. UTF-8, ISO8859-X, and MS cpXXXX encodings will all hotly disagree about what those bytes represent on the screen.
This is where the term "7-bit safe" comes from: no matter how badly you muck up your encodings in transit, you can be reasonably sure that bytes in the range 0 through 127 will make it through.
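As a rough illustration of "7-bit safe", here is a minimal sketch that checks whether every byte of a string falls in 0x00 to 0x7F. The helper name `is_7bit_safe()` is hypothetical; it relies only on PCRE, which treats the pattern as raw bytes when the `/u` modifier is absent.

```php
<?php
// Sketch: a string is "7-bit safe" if no byte has the high bit set.
// is_7bit_safe() is a hypothetical helper written for illustration.
function is_7bit_safe(string $s): bool {
    // Without the /u modifier, PCRE matches individual bytes, so this
    // finds any byte in the range 0x80-0xFF.
    return preg_match('/[\x80-\xFF]/', $s) === 0;
}

var_dump(is_7bit_safe("Hugelkultur"));      // bool(true)
var_dump(is_7bit_safe("Hügelkultur"));      // bool(false): ü is stored as one or more high bytes
var_dump(is_7bit_safe("H&#xFC;gelkultur")); // bool(true): the entity is plain ASCII
```

The last line shows why the entity-encoding trick works at all: the output contains only bytes that every ASCII-compatible encoding agrees on.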
Edit:
"Extended ASCII" is still not a thing.
If you display a byte above 127, the symbol presented on the screen will depend on the encoding it is interpreted as. People with Western European alphabets are somewhat coddled because our funny accented letters tend to be the defaults [ISO8859-1 and cp1252], but when you switch to Eastern European (Cyrillic) charsets [ISO8859-5 and cp1251], the byte `FC` shows up as `ќ` or `ь` respectively instead of `ü`.
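To see this directly, the sketch below decodes the single byte `0xFC` under four legacy charsets and converts each result to UTF-8 for display. It assumes the mbstring extension is available and that these encoding names are accepted by `mb_convert_encoding()`; `decode_byte()` is a hypothetical helper.

```php
<?php
// Sketch: the same byte means different characters depending on the
// declared charset. Requires mbstring; decode_byte() is hypothetical.
function decode_byte(string $byte, string $encoding): string {
    // Interpret $byte as $encoding and re-encode it as UTF-8 for output.
    return mb_convert_encoding($byte, 'UTF-8', $encoding);
}

foreach (['ISO-8859-1', 'Windows-1252', 'ISO-8859-5', 'Windows-1251'] as $enc) {
    printf("%-12s 0xFC => %s\n", $enc, decode_byte("\xFC", $enc));
}
// ISO-8859-1 and Windows-1252 give "ü", ISO-8859-5 gives "ќ",
// and Windows-1251 gives "ь".
```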
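It's worth noting that the `FC` in `&#xFC;` is not a byte value; it is the unencoded Unicode code point. Again, users of Western European alphabets are spoiled, and frequently confused, by the overlap in the code point space: the first 256 Unicode code points (U+0000 to U+00FF) line up with the byte values of ISO8859-1, so `ü` is both byte `FC` in ISO8859-1 and code point U+00FC. Encoded as UTF-8, however, U+00FC is the literal two-byte sequence `C3 BC`, hence the `%C3%BC` in your `urlencode()` output.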
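The difference between a code point and its encoded bytes can be made concrete by encoding a code point to UTF-8 by hand. This is a sketch for illustration only: `utf8_bytes()` is a hypothetical helper following the standard UTF-8 bit layout, and in real code `mb_chr()` does the same job.

```php
<?php
// Sketch: hand-rolled UTF-8 encoder showing that code point U+00FC
// becomes the bytes C3 BC. utf8_bytes() is hypothetical; prefer mb_chr().
function utf8_bytes(int $cp): string {
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException(sprintf('Invalid code point U+%04X', $cp));
    }
    if ($cp < 0x80) {
        return chr($cp);                          // 1 byte:  0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 2 bytes: 110xxxxx 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))                // 4 bytes: 11110xxx + three continuation bytes
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

$bytes = utf8_bytes(0xFC);     // code point U+00FC, "ü"
echo bin2hex($bytes), "\n";    // c3bc
echo urlencode($bytes), "\n";  // %C3%BC
var_dump($bytes === "ü");      // bool(true), assuming this file is saved as UTF-8
```

For U+00FC the two-byte branch applies: `0xFC >> 6` is 3, giving `0xC3`, and `0xFC & 0x3F` is `0x3C`, giving `0xBC`.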
Really, the truth is that there's no such thing as "ASCII" at all. It's just that most non-Asian encodings tend to agree that it's easier to leave the traditional first 128 byte values (0 through 127) the same everywhere so as not to freak out the English speakers.
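A quick way to check the "first 128 bytes are the same everywhere" claim for a given set of charsets is to decode every byte from 0x00 to 0x7F and confirm each one maps to itself. This sketch assumes the mbstring extension; the encoding list (the four legacy charsets mentioned above plus UTF-8) is an assumption, and `ascii_range_agrees()` is a hypothetical helper.

```php
<?php
// Sketch: verify that bytes 0x00-0x7F decode identically (to the same
// ASCII character) under each listed encoding. Requires mbstring;
// ascii_range_agrees() is a hypothetical helper.
function ascii_range_agrees(array $encodings): bool {
    for ($b = 0x00; $b <= 0x7F; $b++) {
        foreach ($encodings as $enc) {
            // Decode the byte as $enc; in UTF-8 an ASCII character is the same byte.
            if (mb_convert_encoding(chr($b), 'UTF-8', $enc) !== chr($b)) {
                return false;
            }
        }
    }
    return true;
}

var_dump(ascii_range_agrees(['UTF-8', 'ISO-8859-1', 'Windows-1252', 'ISO-8859-5', 'Windows-1251']));
```

Running the same loop over 0x80 to 0xFF would stop at the first byte where the charsets diverge, which is exactly the disagreement described above.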