I have a $string in PHP. It doesn't matter where this comes from (it's from incoming e-mails); the important part is that sometimes, it's not valid UTF-8 according to PostgreSQL, but is valid according to PHP.
I explicitly set both mb_internal_encoding('UTF-8') and mb_regex_encoding('UTF-8'). I explicitly set client_encoding to 'UTF8' (yes, it wants it without the '-') when making the PostgreSQL database connection. I have verified over and over again that the PG database itself uses UTF8. In short: everything on my system uses UTF-8 encoding.
Details: PHP 7.4.1. PG 11.5. Windows 10. (The same thing has happened for years and years for many versions of PHP/PG/Windows.)
Before trying to INSERT a record containing $string, I make the following integrity/safety check to avoid errors:

    function string_is_valid_UTF8($string)
    {
        // mb_check_encoding() already returns a boolean.
        return mb_check_encoding($string, 'UTF-8');
    }

    if (string_is_valid_UTF8($string)) {
        // Proceed to INSERT it into the database since PHP says it's valid UTF-8 data.
    }

Occasionally -- NOT every time! -- PostgreSQL barks at this, even though it has been checked by PHP to be valid UTF-8. It spits out/logs this error:

    pg_query_params(): Query failed: ERROR: invalid byte sequence for encoding "UTF8"

I don't get it. The only explanation I can see is that PostgreSQL and PHP have different ideas of what is valid UTF-8. This problem has bugged me for years and I just never seem to get it resolved. Again and again, sometimes with weeks or months in between, some external data coming into my system causes this issue. In spite of my check!
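To narrow down where the two disagree, the raw bytes of a failing $string could be logged whenever the check passes but the query fails. The following is only a minimal sketch: `describe_utf8_problem()` is a hypothetical helper name, not part of my existing code. It also reports the position of any NUL byte, because a NUL byte is valid UTF-8 as far as `mb_check_encoding()` is concerned, while PostgreSQL does not allow NUL bytes in text values. Whether that is the actual cause here is an assumption that the logged hex dump would confirm or rule out.

```php
<?php
// Hypothetical diagnostic helper (illustration only): collects the facts
// needed to compare PHP's verdict with what PostgreSQL will accept.
function describe_utf8_problem(string $string): array
{
    return [
        // PHP's own verdict, using the same call as string_is_valid_UTF8().
        'php_valid_utf8'   => mb_check_encoding($string, 'UTF-8'),
        // Byte offset of the first NUL byte, or false if there is none.
        // PostgreSQL text values cannot contain NUL bytes.
        'first_nul_offset' => strpos($string, "\0"),
        // Hex dump of the raw bytes, suitable for writing to a log.
        'hex'              => bin2hex($string),
    ];
}

// Example input: plain ASCII with an embedded NUL byte.
$report = describe_utf8_problem("abc\0def");
```

Logging `$report` next to the failed query would show the exact bytes PostgreSQL received, which makes it possible to tell a genuine encoding disagreement apart from a stray NUL byte.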
Is there something I can tell PostgreSQL to make it handle this differently? I don't want that error to be logged. It's really, really annoying.
At this point, I'm utterly baffled as to how this can happen. Is PHP or PostgreSQL in the wrong? Considering how many times I've dealt with this and tried to solve it by a zillion different methods, it doesn't seem reasonable that I'm the one doing something wrong.