2

Dear nodejs experts and database gurus,

We face issues with storing emojis and other special characters in our MySQL database. We get an error by Prisma, which is the "ORM" we use:

Conversion from collation utf8_general_ci into utf8mb4_unicode_520_ci impossible for parameter

prisma:query INSERT INTO `app`.`show` [...]
PrismaClientUnknownRequestError: Error occurred during query execution:
ConnectorError(ConnectorError { user_facing_error: None, kind: QueryError(Server(ServerError {
   code: 3988,
   message: "Conversion from collation utf8_general_ci into utf8mb4_unicode_520_ci impossible for parameter",
   state: "HY000" })) })
at cb (/app/node_modules/@prisma/client/runtime/index.js:38695:17) { clientVersion: '3.8.1' }
prisma:query ROLLBACK

We tried the following solutions already:

  1. Using different collations on the database (db level, table level, field level): utf8mb4_unicode_520_ci, utf8mb4_unicode_ci, but the goal is to use a set which is most modern like utf8mb4_0900_ai_ci
  2. Changing it to utf8 instead of utf8mb4 resulted in an error like Incorrect string value: '\\xF0\\x9F\\x92\\x95' and is not our goal as we want to be able to store all emojis
  3. Dumping, changing all encoding statements in the sql dump and re-importing with parameters. We used a bunch of parameters in hope it will encode it differently like --default-character-set=utf8mb4 or --hex-blob
  4. Defining a charset in our backend as we believe that the database is not the issue, but more the input we send to it like the old PHP way we knew ini_set("default_charset", "UTF-8"), but as we understood Nodejs does not care about the encoding/charset, but maybe there is a way
  5. Defining a charset at the MySQL client, but we did not find a way to do so as we use prisma.io and maybe do not have access to the mysql client underhood. Prisma by itself seems to support utf8mb4 as this is in the docs https://www.prisma.io/docs/concepts/components/prisma-client/case-sensitivity#database-collation-and-case-sensitivity and we found a recent discussion about defining it on a more granular level here https://github.com/prisma/prisma/discussions/4743

Maybe two more valuable insights:

  • The data comes from a React Native app, but before the database migration, we had no issues, so we hope it is not a problem on the client-side, but for sure, we can specify something there as well, eg. setting HEADER information on the API call, which we do with axios
  • Before we used the db-as-a-service tool Airtable, which was quite a hassle to "break free" from them. We exported everything in CSV format and imported it in a local MySQL db powered with XAMPP (which had different options for encoding) and then imported the dump to our new MySQL cluster

Our goal is to

  • Set the right encoding character set in our new MySQL cluster
  • Send the data in the right format to the DB (setting the same set in NodeJS and React)

Hope you can help us! Thanks!

Florian
  • 81
  • 6

2 Answers2

1

\\xF0\\x9F\\x92\\x95 () needs the CHARACTER SET utf8mb4, not utf8. The COLLATION is irrelevant, other than that it must be consistent (that is, utf8mb4_...). (Note the 4 byte hex, and the 4 in utf8mb4.)

There are several places to specify the charset. It sounds like you missed a place.

I don't have any notes on Prisma or React, but for node.js:

var connection = mysql.createConnection({ ... , charset : 'utf8mb4'});

See Emoji are not inserting in database node js mysql

That's the connection; the table also needs it. Please provide SHOW CREATE TABLE so we can check that the column has the charset specified or, if not, then the table has the appropriate default.

Yes, the COLLATION utf8mb4_0900_ai_ci is probably best for MySQL 8.0. (It is not available in older versions of MySQL nor in any version of MariaDB.)

Note: Specifying just a COLLATION causes the CHARACTER SET to be set to the beginning part of the collation.

PHP's ini_set("default_charset", "UTF-8") should be fine, but is probably not relevant to the problem).

Please provide SHOW VARIABLES LIKE 'char%'; The _client, _connection and _results should all be utf8mb4.

Character set refers to 'encoding', as in what the bits represent. Collation deals with comparisons and sorting -- "A" < "a" or not, plus many other variants on that.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks Rick! I investigated a bit more and it seems to be a prisma.io issue. I am not able to pass the argument `?charset=UTF8MB4` to prisma/mysql client. I tried `SQL_PRISMA_URL='mysql://user:pass@host:port/db?charset=UTF8MB4&sslmode=require&sslcert=ssl.crt&sslaccept=accept_invalid_certs'` but it ignores the charset argument. This is the docs about how to pass arguments to the underlying mysql connection: https://www.prisma.io/docs/concepts/database-connectors/mysql#connection-details – Florian Feb 16 '22 at 08:49
  • `await prisma.$queryRaw(SHOW VARIABLES LIKE 'character_set_%')` still returns 'character_set_client': 'utf8mb3', 'character_set_connection': 'utf8mb3', 'character_set_database': 'utf8mb4', 'character_set_results': 'utf8mb3', 'character_set_server': 'utf8mb4', 'character_set_system': 'utf8mb3' – Florian Feb 16 '22 at 08:49
  • charset : 'utf8mb4' this part saves my day – mercury Jun 01 '22 at 20:16
1

I had the exact same issue. I could solve it for the current connection by running the query SET NAMES utf8mb4 - but this doesn't solve it for the connection bool Prisma uses. I'm currently experimenting with running SET NAMES utf8mb4 via init_connect which is applied to all new connections to mysql. It does seem to work but feels a bit like a workaround instead of a solution. Still not sure what causes prisma to use the "wrong" encoding.

PanMan
  • 1,305
  • 1
  • 10
  • 16
  • I did pretty much the same, added this command before start the server: `await prisma.$executeRaw`SET NAMES latin1;`` – Marco Arruda Apr 18 '23 at 02:41
  • 1
    @MarcoArruda you need to apply it to all connections, I think your command only applies it to the current connection. – PanMan Apr 19 '23 at 14:12