I'm using PayPal IPN and inserting the serialized IPN message into our database. I noticed that the stored value is truncated. I assume that serialize() is failing rather than the insert. No error is reported by the DB or by the server.

For example, this is a partial serialization. First part is left off by me:

... s:1:"4";s:12:"address_city";s:23:"COACALCO DE BERRIOZ

It's stopping directly after BERRIOZ. No closing quote, etc.

That value is address_city=COACALCO DE BERRIOZÁBAL. So it stopped at the accent character.

The character encoding is UTF-8. I verified encoding with:

echo mb_internal_encoding();

And it reports UTF-8. I also ensure that the mysqli charset is UTF-8 with:

mysqli_set_charset($connect, "utf8");

As I said, neither the DB nor the error handler reports any error. The IPN object serializes fine when the values contain no accented characters. I discovered the issue when I tried to view the record and unserialize() reported a problem.

hanji
  • Presumably `mb_check_encoding()` on the PayPal response itself (from cURL I assume) returns true? You've not got a situation where your application is all UTF-8, the database is UTF-8, but the response from PayPal isn't? – CD001 Mar 08 '18 at 16:50
  • Hmmm. What would be the best way to force UTF-8 from the POST response coming from PayPal? – hanji Mar 08 '18 at 17:01
  • How are you inserting the data? Are you properly escaping (with prepared statements)? I've personally not had `serialize` fail on any content in strings... regardless if its utf8, or iso, or even binary. – IncredibleHat Mar 08 '18 at 17:06
  • For some reason, PHP isn't reading your multibyte strings correctly. It states that the length is `23`, but for UTF-8 it should be `24` (`serialize()` counts bytes, and Á takes two bytes in UTF-8): `s:24:"COACALCO DE BERRIOZÁBAL"`. Whichever script is responsible for the serialization, the encoding is not UTF-8. – Napoli Mar 08 '18 at 17:29
  • Also, this wouldn't happen on a DB insert, you'd just get a weird character like �, etc. in its place, this is happening during `serialize()` – Napoli Mar 08 '18 at 17:36
  • I agree.. this is happening with serialize() and not with the DB. Whats the best process for ensuring UTF-8 after post and before serialize call? – hanji Mar 08 '18 at 20:45
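
One possible way to approach the question in the last comment, as a minimal sketch rather than a confirmed fix: validate each incoming value and convert it to UTF-8 before calling `serialize()`. The helper name `to_utf8()` is hypothetical, and the fallback source encoding (Windows-1252) is an assumption; the thread does not state which encoding PayPal actually sent.

```php
<?php
// Hypothetical helper (not from the thread): return a UTF-8 copy of $value.
// ASSUMPTION: bytes that are not valid UTF-8 are Windows-1252. Confirm this
// against the encoding your PayPal account is configured to send.
function to_utf8(string $value, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

// Simulated IPN field in a single-byte encoding: "Á" is the one byte 0xC1.
$raw   = "COACALCO DE BERRIOZ\xC1BAL";
$fixed = to_utf8($raw);

echo strlen($raw), "\n";   // 23 bytes, matching the s:23 in the question
echo strlen($fixed), "\n"; // 24 bytes, because Á is two bytes in UTF-8
echo serialize(['address_city' => $fixed]), "\n";
```

For a flat IPN post, something like `array_map('to_utf8', $_POST)` before `serialize()` would apply the same check to every field. If the message carries its own charset field, that value is probably a better source encoding than the hard-coded assumption above.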

1 Answer

This smells like the "truncation" problem in MySQL's utf8/utf8mb4. See "truncated" in "Trouble with UTF-8 characters; what I see is not what I stored".

Probably Á is not encoded in utf8 (hex C381), but instead in latin1 (hex C1).

Plan A: Have the client use utf8 instead of latin1.

Plan B: Declare that the client is using latin1 by saying

mysqli_set_charset($connect, "latin1");
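
To decide between the two plans, it may help to look at the actual bytes the client holds. The sketch below uses `bin2hex()` to show the two encodings of Á mentioned above and a hypothetical helper, `client_charset_for()`, that picks the name to pass to `mysqli_set_charset()`. It assumes only these two encodings are in play, which is a simplification.

```php
<?php
// "Á" in UTF-8 is two bytes; converted to ISO-8859-1 (latin1) it is one byte.
$utf8   = "\xC3\x81";
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";   // c381
echo bin2hex($latin1), "\n"; // c1

// Hypothetical helper (not part of mysqli): choose the charset name that
// matches the bytes the client actually has.
// ASSUMPTION: the data is either valid UTF-8 or latin1, nothing else.
function client_charset_for(string $bytes): string
{
    return mb_check_encoding($bytes, 'UTF-8') ? 'utf8' : 'latin1';
}

echo client_charset_for("BERRIOZ" . $utf8 . "BAL"), "\n";   // utf8
echo client_charset_for("BERRIOZ" . $latin1 . "BAL"), "\n"; // latin1
```

Pure ASCII strings are valid UTF-8 as well, so the helper only reports `latin1` when a sample contains bytes that cannot be UTF-8.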
Rick James
  • Hmmm.. this is interesting. I looked at the DB table, and the collation is latin1_swedish_ci.. odd. Do you think setting it to utf8_general_ci will fix this? I am setting the charset in the code via mysqli_set_charset($connect, "utf8"); – hanji Mar 09 '18 at 06:31
  • @hanji - That command announces the encoding in the _client_. It _must_ match what the client has. The table does _not_ need to match it; a conversion will be done on the fly during `INSERT` and `SELECT`. – Rick James Mar 09 '18 at 13:55
  • @hanji `mysqli_set_charset` sets the character set on the *connector*; the table itself [*should* also use a utf8 collation](https://stackoverflow.com/questions/279170/utf-8-all-the-way-through/279279) (e.g. `utf8_general_ci`) - yes. n.b. MySQL defaults to Swedish because Monty Widenius is a Finn ... errr - yeah :) – CD001 Mar 09 '18 at 14:04