
I recently switched from ISO-8859-1 to UTF-8, both in my SQL database and throughout all my PHP files. All my PHP script does is take whatever was entered in a form (X) via GET, search for it in the SQL database, and present the data, along with the message "X returned Y results."

Now I have a question to ask regarding the use of mb_check_encoding. I read the following in this thread:

Unfortunately, you should verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.
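
For reference, the kind of check that quote describes could look roughly like the sketch below. The helper name `validUtf8OrNull` is made up for illustration, and rejecting bad input with a 400 response is just one possible policy, not something the quoted thread prescribes.

```php
<?php
// Hypothetical helper: returns the value unchanged if it is a valid UTF-8
// string, otherwise null.
function validUtf8OrNull($value): ?string
{
    // Query-string values can arrive as arrays (?name[]=x), so reject
    // anything that is not a string before checking the encoding.
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        return null;
    }
    return $value;
}

// Possible use at the top of the script (the 400 response is an assumption):
// $name = validUtf8OrNull($_GET['name'] ?? null);
// if ($name === null) { http_response_code(400); exit('Invalid input.'); }
```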

As you can tell, I'm quite worried. I have done the following:

  • Switched my SQL-database to utf8mb4.
  • Used $mysqli->set_charset('utf8mb4'); for the connection between the database and the PHP-file.
  • Set my charset in my HTML/PHP-file through <meta http-equiv="content-type" content="text/html; charset=utf-8" />.
  • Saved all my files in UTF-8 (no BOM).
  • Used htmlspecialchars($_GET['name'], ENT_COMPAT | ENT_HTML401, 'UTF-8') for the "X returned Y results."-message.
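
One detail about the last item: per the PHP manual, htmlspecialchars returns an empty string when its input is not valid in the given encoding, unless ENT_IGNORE or ENT_SUBSTITUTE is set. A small sketch of that behaviour follows; the sample string is made up, and whether ENT_SUBSTITUTE suits this page is a judgment call.

```php
<?php
// "café" encoded as ISO-8859-1: the lone byte \xE9 is not valid UTF-8.
$latin1 = "caf\xE9";

// With the flags from the list above, invalid input produces an empty string.
$strict = htmlspecialchars($latin1, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// Adding ENT_SUBSTITUTE replaces the invalid sequence with U+FFFD instead.
$lenient = htmlspecialchars(
    $latin1,
    ENT_COMPAT | ENT_HTML401 | ENT_SUBSTITUTE,
    'UTF-8'
);

var_dump($strict === '');
var_dump($lenient === "caf\u{FFFD}");
```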

My question is this: Should I still use mb_check_encoding, even if I have done all of the above? And how would I check whether I'm vulnerable to this "malicious" attack?

AnonymousJ
  • @hanshenrik According to the PHP-manual in regard to htmlspecialchars, http://php.net/manual/en/function.htmlspecialchars.php, the ENT_COMPAT is just a flag that converts double-quotes. I don't see the harm or the wrong in using it? All I'm using it for is display a text-message to the user specifying that they searched for X and received Y number of results. – AnonymousJ Mar 23 '15 at 21:36
  • oh sorry, i was thinking about double encode, not ENT_COMPAT, derp. nevermind – hanshenrik Mar 24 '15 at 12:29
  • @hanshenrik Do you think it would have any negative effect if I changed the htmlspecialchars flag to ENT_QUOTES? – AnonymousJ Mar 24 '15 at 20:56
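
For what it's worth, the difference between the two flags is easy to see side by side. This is only a sketch with a made-up input string: ENT_COMPAT leaves single quotes alone, while ENT_QUOTES also converts them, which matters mainly if the value ever ends up inside a single-quoted HTML attribute.

```php
<?php
$input = "O'Reilly \"quoted\" <b>";

// ENT_COMPAT: converts double quotes, leaves single quotes untouched.
$compat = htmlspecialchars($input, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// ENT_QUOTES: converts both double and single quotes.
$quotes = htmlspecialchars($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $compat, "\n";
echo $quotes, "\n";
```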

1 Answer

The word "attack" sounds alarming, but in reality we are talking about "giving X to someone that expects Y and waiting to see what happens". It's far from a given that something bad will actually happen.

In this case MySQL has the exact same worry as you do: what if the client sends input that does not conform to the agreed encoding? MySQL is not the old roommate's weekend project; it clearly has to step up and deal with the problem in a sane manner. And indeed it does: it emits error code 1366 ("Incorrect string value") when you feed it this kind of input.

In conclusion: as long as you follow established best practices (prepared statements with parameters) to prevent SQL injection attacks, there is probably no real attack vector here. The worst thing that can happen is that the attacker causes one of your SQL queries to fail; in the reasonable scenario where this failure does not cause a cascade of tragedy due to bad defaults and zero error handling, they will just earn themselves an error message. Of course, MySQL being "immune" to this doesn't mean that your application as a whole is also immune, but it does mean that you don't have to worry about the database component.
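
To make that concrete, here is a minimal sketch of a parameterized query whose failure is handled instead of leaking out. The function name `countMatches`, the table `people`, and the column `name` are hypothetical, and `$db` is assumed to be an open mysqli connection on which set_charset('utf8mb4') has already been called.

```php
<?php
// Hypothetical helper: counts rows matching $name, or returns null if the
// query fails for any reason.
function countMatches(mysqli $db, string $name): ?int
{
    // Make mysqli throw exceptions instead of returning false on errors.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    try {
        // The value is bound as a parameter, never concatenated into the SQL.
        $stmt = $db->prepare('SELECT COUNT(*) FROM people WHERE name = ?');
        $stmt->bind_param('s', $name);
        $stmt->execute();
        $stmt->bind_result($count);
        $stmt->fetch();
        $stmt->close();
        return (int) $count;
    } catch (mysqli_sql_exception $e) {
        // A string the server rejects ends up here: log the details and
        // report "no result" rather than showing the raw error to the client.
        error_log('Search query failed: ' . $e->getMessage());
        return null;
    }
}
```

A caller would then treat a null return as "show a generic error message", so a malformed string costs the sender nothing more than a failed search.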

Jon
  • I have tried, to the best of my ability, to use prepared statements with parameters. Does that mean I don't have to use the `mb_check_encoding` function, since there's no way for an SQL injection to happen and all my PHP script does is take the GET value, put it through prepared statements, and then output the received data? Great. Should I be doing anything else regarding UTF-8 "attacks"? If not, I thank you for your answer! – AnonymousJ Mar 23 '15 at 21:18
  • @AnonymousJ: You are most likely fine, but especially for security the devil is in the details. I cannot say that your chance of being exploited is exactly zero from a general description, you understand. – Jon Apr 14 '15 at 19:17