1

I am trying to query a comments table from mysql database by language. Whenever I query by language to fetch chinese comments it displays encoded gibberish characters. but whenever I use python to query, it works.

Cloud Platform: Google Cloud SQL Database location: Google Cloud SQL Programming Language: Nodejs

Below is my code

// Require process, so we can mock environment variables
  const process = require('process');
  const Knex = require('knex');
  const express = require('express');
  const app = express();

  const config = {
     user: process.env.SQL_USER,
     password: process.env.SQL_PASSWORD,
     database: process.env.SQL_DATABASE,
     socketPath: `/cloudsql/${process.env.INSTANCE_CONNECTION_NAME}`
   };


    var knex = Knex({
       client: 'mysql',
       connection: config
    });



  app.get('/', (req, res) => {
      knex.select('post')
       .from('comment')
         .where({
             'language': 'zh'
          }).limit(1).then((rows) => {
               res.send(rows);
          }).catch((err) => {
            res.send(err);
           });
     });

This is my query result:

"post": "æœ€ç™½ç—´çš„éƒ¨é•¿ï¼æœ€åŸºæœ¬çš„常识和逻辑都没有。真丢人ï¼"

please help.....

Isaac
  • 296
  • 1
  • 4
  • 14

1 Answers1

2

The text "æœ€ç™½ç—´çš„éƒ¨é•¿ï¼æœ€åŸºæœ¬çš„常识和逻辑都没有。真丢人ï¼" is what you get if "最白痴的部长基本的常识和逻辑都没有。真丢人" is sent encoded as UTF-8, but is then read and decoded as windows-1252 character set.

There are several different places this mis-decoding could happen:

  1. From the client to the application writing to the database when the data was first added
  2. Between the application and MySQL when adding the data
  3. Across a configuration change in MySQL that wasn't applied correctly.
  4. Between MySQL and the application reading the data.
  5. Between the application and the end client displaying the data to you.

To investigate, I suggest being systematic. Start by accessing the data using other tools, e.g. PHPMyAdmin or the mysql command line in Cloud Shell. If you see the right data, you know the issue is (4) or (5). If the database definitly has the wrong data in it, then it's (1), (2) or (3).

The most common place for this error to happen is (5), so I'll go into that a bit more. This is because often websites set the character set to something wrong, or not at all. To fix this, we must make the character set explicit. You can do this in express.js by adding:

res.set('Content-Type', 'text/plain; charset=utf-8')
David
  • 9,288
  • 1
  • 20
  • 52