my app is handling delivery addresses of people's orders in a webshop / connected marketplace like ebay. I already accounted for UTF-8 encoding meaning it handles kyrillic, chinese etc characters correctly. However, from time to time I have entries with an unknown character � which already appears for example in the delivery address as viewed at ebay. So there's nothing going wrong along the way - the string is delivered like that.
Now at some point I am performing an address check against an official (german) address DB like so:
$query = "SELECT DISTINCT * FROM adrCheck WHERE zip='".$zip."' AND street='".$street." AND city='".$city."'";
In case there is at least one result, I know the address must be correct.
Anyhow, when those incorrect characters appear I get a SQL error MYSQLi Error (#1267): Illegal mix of collations (cp850_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
which I can react to.
BUT I want to be able to check beforehand and include only those parameters into the query which are correctly encoded.
I have tried
print_r(mb_detect_encoding("K�ln")); // gives me UTF-8
print_r(mb_check_encoding("K�ln", "UTF-8")); // gives me 1 / true
and the preg_match method which also tells me that it's valid UTF-8.
What am I overlooking? Any suggestions on how to handle this occasional snafu user input?