I'm receiving e-mails via IMAP with PHP.
Before trying to INSERT each new incoming e-mail message into my database, I run a basic check that the body text (both the plaintext and HTML versions, if both exist) is valid UTF-8. If it isn't, I drop the message and skip any further processing. I do so with the following code, which I settled on after years of searching online and experimenting myself:
function string_is_valid_UTF8($string)
{
    // mb_check_encoding() already returns a boolean, so return it directly.
    return mb_check_encoding($string, 'UTF-8');
}
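For clarity, here is a minimal sketch of how such a check could be applied to both body parts before the INSERT. The helper `body_parts_are_valid_UTF8` and the parameters `$plain` and `$html` are hypothetical names for illustration, not taken from my actual code. A missing part is assumed to be `null`.

```php
<?php
// Same check as in the question, reduced to a single return.
function string_is_valid_UTF8($string)
{
    return mb_check_encoding($string, 'UTF-8');
}

// Hypothetical helper: true only if every body part that exists is valid UTF-8.
// $plain and $html are assumed to be null when the message lacks that part.
function body_parts_are_valid_UTF8($plain, $html)
{
    foreach ([$plain, $html] as $part) {
        if ($part !== null && !string_is_valid_UTF8($part)) {
            return false;
        }
    }
    return true;
}
```

A message would then be dropped whenever `body_parts_are_valid_UTF8($plain, $html)` returns `false`.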
Occasionally, this check is not enough: an e-mail slips through to the PHP code that INSERTs it into the PostgreSQL database table, and this error is logged:
pg_query_params(): Query failed: ERROR: invalid byte sequence for encoding "UTF8": 0xa0:
No matter what checks I make beforehand, some always slip through, logging that error. Again and again...
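To narrow down which value PostgreSQL is rejecting, a diagnostic along the following lines could check every string passed to pg_query_params() and log the offending one as hex. This is only a sketch: the helper `find_invalid_UTF8_params` and the parameter keys are hypothetical, and the sample values are made up.

```php
<?php
// Hypothetical diagnostic helper: returns the keys of all string parameters
// that are not valid UTF-8, each mapped to a hex dump of its bytes.
function find_invalid_UTF8_params(array $params)
{
    $invalid = [];
    foreach ($params as $key => $value) {
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $invalid[$key] = bin2hex($value);
        }
    }
    return $invalid;
}

// Made-up example values; a lone 0xA0 byte is not a valid UTF-8 sequence.
$params = ['subject' => "Hello\xA0world", 'body_html' => '<p>ok</p>'];
$bad = find_invalid_UTF8_params($params);
if ($bad !== []) {
    // Log which parameters failed instead of sending them to the database.
    error_log('Skipping message, invalid UTF-8 in: ' . implode(', ', array_keys($bad)));
}
```

Running this on the exact array that goes into the query would show whether the invalid bytes are in a body part or in some other value.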
What exactly do I need to do to make sure this never happens? What is wrong with the code I have? Why does PHP say the text is valid UTF-8 when PostgreSQL says it isn't? How is that even possible?
The latest e-mail, which prompted me to ask about this again, was a garbled spam message that only had an HTML part. It contains malformed UTF-8 somewhere. Of course, it doesn't matter what it contains, or what parsed it out like that. What matters is that PHP sees it as "OK" and PG sees it as "wrong", so that damn error is logged instead of the whole e-mail being silently ignored, as I want.
What am I doing wrong? This has been torturing me for a very long time now and I need to get it resolved once and for all!