2

I've got a node.js script running that pulls data out of a public "database" (it's a 'blockchain') and then performs some operations on it and then inserts it into a MySQL database. I've got the MySQL database using UTF8_general_ci encoding. The vast majority of data parses fine, but every so often it hits something it can't insert. I get this error:

code: 'ER_TRUNCATED_WRONG_VALUE_FOR_FIELD',
  errno: 1366,
  sqlMessage: 'Incorrect string value: \'\\xF0\\x9F\\x8D\\x95 N...\' for column \'body\' at row 1',
  sqlState: 'HY000',
  index: 0,

It shows the bit of string that seems to be causing the error, and it always has this distinct question mark character:

This looks like Paradise for me! ����\

I'm guessing this is an encoding issue? Is there a way I can convert these before it throws an error? I'm not sure what encoding this blockchain uses, and I'm not even sure how I'd find out.

edit: here's what another example (of the error) shows on the web interface to this blockchain:

And your very welcome !

another edit: I should point out that I am using mysql.format(sql, inserts) to handle inadvertent sql problems with the data -https://github.com/mysqljs/mysql#preparing-queries

Rick James
  • 135,179
  • 13
  • 127
  • 222
erv
  • 593
  • 8
  • 27

2 Answers2

12

The likely answer is that the MySQL connection did not specify utf8mb4. (Specifically, MySQL's utf8 will not suffice for Emoji.) Can you provide the connection code? Here is a fallback: Execute this after connecting to MySQL:

SET NAMES utf8mb4

Another possible answer is that the web server is not treating the page as UTF-8. One way to solve this is with this in the <head>:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 

And the column needs to be CHARACTER SET utf8mb4, not ut8.

node.js connection pooling

var connection = mysql.createConnection({ ... , charset : 'utf8mb4'});

See stackoverflow

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • The connection is based on the examples in the mysql nodejs docs: var mysql = require('mysql'); var pool = mysql.createPool({ connectionLimit : 10, host : 'example.org', user : 'bob', password : 'secret', database : 'my_db' }); I'm not displaying the characters, so I don't need to worry about things on the webserver side. – erv Oct 16 '17 at 15:20
  • But you do have to worry about the "character set" -- The error is complaining that you are trying to stuff a 4-byte Emoji into a `CHARACTER SET utf8` hole in MySQL. That is limited to the 3-byte subset of `UTF-8`. – Rick James Oct 16 '17 at 16:54
  • Thanks -- "node.js" was the clue I needed. See what I added. – Rick James Oct 16 '17 at 16:59
  • I've made the change to the connection charset and am just running some data through now. No errors so far! – erv Oct 17 '17 at 02:44
1

I ended up solving this by using iconv-lite from npm.

So basically:

const iconv = require("iconv-lite");
var buf = iconv.encode("weird characters", "utf8");
erv
  • 593
  • 8
  • 27
  • This is making things worse. To demonstrate, let's see `SELECT HEX(...) FROM ...`. If you don't see 4-byte chunks beginning with `F0`, it is a mess. The character you first mentioned should show `F09F8D95`. If you get `C3B0C5B8C28DE280A2`, you have "double encoding". `` is hex `F09F918D F09F9295` – Rick James Oct 16 '17 at 15:03
  • It's literally stopped the error, so I don't see how it can be making things worse. – erv Oct 16 '17 at 15:06
  • Another possibility is that it converted to `U+1F32A` or `\u1F32A` type "Unicode" encoding. That is "worse" for anything other than a web page. Let's see the hex, so I am not guessing. – Rick James Oct 16 '17 at 15:12
  • The data looks fine in the database tables. It's not for displaying, only searching. – erv Oct 16 '17 at 15:25
  • Searching is even worse -- You must have _either_ `F09F918D` in both the search string and the data _or_ `\u1F32A` in both. Else, the tornado will not match. – Rick James Oct 16 '17 at 16:55
  • I'm only doing searches on english text. – erv Oct 17 '17 at 02:52