4

I don't understand this behavior and am hoping someone can enlighten me...

mysql> CREATE TABLE test (id INT AUTO_INCREMENT, data JSON, PRIMARY KEY(id));
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO test(data) VALUES ('["\\u0000\"]'), ('["\\u0001"]'), ('["\\u0081"]'), ('["\\u0091"]');
Query OK, 4 rows affected (0.09 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select *,char_length(data),hex(data) from test;
+----+------------+-------------------+----------------------+
| id | data       | char_length(data) | hex(data)            |
+----+------------+-------------------+----------------------+
|  1 | ["\u0000"] |                10 | 5B225C7530303030225D |
|  2 | ["\u0001"] |                10 | 5B225C7530303031225D |
|  3 | [""]      |                 5 | 5B22C281225D         |
|  4 | [""]      |                 5 | 5B22C291225D         |
+----+------------+-------------------+----------------------+
4 rows in set (0.00 sec)

Why does MySQL choose to parse \\u0081 as a code point, but leave \\u0001 as a series of simple characters?

Or taken the other way, why does MySQL parse the "\\" in the latter case as "It's a literal backslash character," but parse the "\\" in the former case as a reason to interpret the following characters? I can see arguments for either approach, but I'm confounded by the change in behavior between \u0001 and \u0081.

This is on "mysql Ver 14.14 Distrib 5.7.22, for Linux (x86_64) using EditLine wrapper" as well as "mysql Ver 8.0.12 for macos10.13 on x86_64 (MySQL Community Server - GPL)". It shows up on the MySQL command line, as shown here, as well as via PDO.

As always, my apologies if this question is addressed elsewhere. I found many related issues but none that addressed this inconsistency (or for Bug 87722, it claims to have been fixed, but doesn't seem to be).

Sammitch
  • 30,782
  • 7
  • 50
  • 77
  • What's the character encoding for `data`? This depends on your server settings. You can find out with `SHOW CREATE TABLE test` Anything above `\u0080` in UTF-8 is invalid unless properly constructed. A single byte in that range is always invalid. – tadman Feb 01 '19 at 04:41
  • I was looking for that command! Thanks. It says "DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci". But the weirdness here is that I'm just trying to put straight text in these fields: an actual backslash (0x5C), a lowercase u (0x75), some numerals (0x30-39). Which MySQL does correctly for backslash, u, zero, zero, zero (5C75303030); but it for some reason interprets differently for backslash, u, zero, zero, eight. – Scott McDermott Feb 01 '19 at 15:41

1 Answers1

2

Due to the two layers of escaping here, SQL and JSON, you actually need to double up on backslashes for it to work:

INSERT INTO test(data) VALUES ('["\\\\u0000\"]'), ('["\\\\u0001"]'), ('["\\\\u0081"]'), ('["\\\\u0091"]');

Note that this is not necessary if these are simple VARCHAR fields. JSON treats \ as a special character.

tadman
  • 208,517
  • 23
  • 234
  • 262
  • Interesting! Indeed I had tried the four-backslashes as an experiment, and ended up with two when I read it back out, which was at least consistent but still wasn't what I was looking for (a single backslash in the recorded data). But you're saying that a single backslash in the recorded data simply can't be consistently created *because it's a JSON column*, and that JSON-colum-ness causes it to sometimes parse \uXXXX and sometimes not --- is that correct? – Scott McDermott Feb 01 '19 at 18:16
  • I admit the "sometimes" is still rankling, as I'm pretty sure \uXXXX should be parsed consistently in JSON (https://stackoverflow.com/questions/19176024/how-to-escape-special-characters-in-building-a-json-string). But if you can confirm that you mean a single backslash as a standalone character in a JSON column simply isn't a good thing to aim for, then I'll aim elsewhere. – Scott McDermott Feb 01 '19 at 18:16
  • 1
    It's not because it's a JSON column, it's because it's a JSON string value encoded within a SQL string value. When you `SELECT` it you only have one layer, the `mysql` interactive shell does not show escaping necessary to insert the value, just what the value (approximately) is. So inserting a VARCHAR: `\\u0000` but a JSON string: `'["\\\\u0000"]'`. – tadman Feb 01 '19 at 18:21