0

I have an old mysql database (mysql 5.0.2) in latin1 and I want to get data from it. For non-ascii characters I'm getting always the same output (eg., Â, À and Á are being presented as something like 'ef bf bd' in hex), that's to say different chars being presented the same way.

I just need to get these chars differently so I can map each one to the right corresponding utf-8 char.

I've already been trying to change the charset but it's not working for me!

Would someone please help me to get some data making sense?

var mysql = require('mysql')

var con = mysql.createConnection({
    host: "localhost",
    user: "root",
    //charset: "utf8mb4",
    //charset: "utf8",
    charset: "latin1",
    database : 'my_db'
})

con.connect()

var query = con.query("SELECT data from my_table where id='07'", function 
(error, results, fields) {
    var b = Buffer.from (results[0].data)
    console.log ('Retrieved data in hex -> ', b)
})

con.end()

When I go to the db and update the data to some ascii-only string, I can get the data in js without any problem, but when I replace that data to something like 'á' or 'à', I get always 'ef bf bd' in hex (-17 -65 -67 in decimal).

Maf
  • 696
  • 1
  • 8
  • 23
  • Have you tried, buffer.toString('utf-8') ? – FrV Jul 31 '19 at 11:22
  • Please, do you mean b.toString('utf-8') ? – Maf Jul 31 '19 at 11:26
  • Cause my buffer variable is named b. – Maf Jul 31 '19 at 11:27
  • MySQL should be converting the encoding from how the data is stored in its columns to the connection encoding (I assume that's `charset: '...'`) on the fly. So if you set that connection encoding to UTF-8, your Javascript client should receive UTF-8 encoded text, even if that's stored as Latin1 in the database. – deceze Jul 31 '19 at 11:29
  • But it's not happening. I forget to mention that it's an old version of mySQL (4.0). – Maf Jul 31 '19 at 11:33
  • If you really get latin1 encoding, try `const b = Buffer.from(results[0].data, 'latin1').toString('utf8');` – FrV Jul 31 '19 at 11:34
  • The problem is that `Buffer.from(results[0].data, 'latin1')` is always presenting `fd` in hex (`-3` in decimal), no matter what data is. So `toString('utf8')` outputs always the same trash. – Maf Jul 31 '19 at 11:43
  • Sorry, actually it's not mysql 4.0, but mysql 5.0.2 – Maf Aug 01 '19 at 10:07
  • @MoisésFerreira - The "trash" could be informative; let's see it. Is there no way to get hex without specifying "latin1"? – Rick James Aug 01 '19 at 14:58
  • Sorry, I wasn't quite sure about the version, but I'm now. I'm accessing the db in a remote server. I could see now in mySQL Front the following: running MySQL-Version 4.0.16-nt. So it's the right version. – Maf Aug 02 '19 at 08:36
  • About your last question, I cannot get different bytes in hex (even in decimal) specifying or not latin1. In PHP I could convert that "trash" to some informative utf8 chars, but I could not do this in nodejs. – Maf Aug 02 '19 at 08:39
  • @RickJames, in PHP the following line provides bytes that make sense: `iconv('CP1256', 'UTF-8', $result["data"])`. I would like `iconv from require('iconv')` to do the same in nodejs. – Maf Aug 02 '19 at 08:41
  • Is windows Arabic, but CP1252 (Windows-1252) presents all chars that I need, working in PHP! – Maf Aug 02 '19 at 10:20
  • @MoisésFerreira - `SELECT @@version;` `SELECT HEX(col) FROM ...` – Rick James Aug 02 '19 at 19:23
  • Which version supports select hex (col) from ??? – Maf Aug 05 '19 at 09:12

1 Answers1

1

Latin1 hex EF BF BD translates to �. Perhaps you are referring to a BOM (Byte-Order-Mark), which is EF BB BF, which looks like . See http://mysql.rjweb.org/doc.php/charcoll#bom_byte_order_mark .

Â, À and Á often show up when improperly mixing latin1 and utf8.

For further discussion, please provide a snippet of text in hex and characters. And/or see Trouble with UTF-8 characters; what I see is not what I stored for troubleshooting common problems.

OH! MySQL 4.0 had no concept of character sets. You could store and fetch strings without anything being tested or anything happening to them. However, things like LENGTH() provided byte counts, not character counts. And inequality string comparisons (col_a < col_b) would not provide reasonable results for non-ascii characters.

It might be safe to do nothing in PHP with your characters. Just feed them in and out of VARCHAR/TEXT (or BLOB) columns.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Let me try all you're saying. Sorry, actually it's not mysql 4.0, but mysql 5.0. Let me update this info. – Maf Aug 01 '19 at 10:05
  • @MoisésFerreira - Character sets and Collations were added in 4.1, so your 5.0 should understand most of what is exists in current versions. – Rick James Aug 01 '19 at 14:57
  • Sorry, I wasn't quite sure about the version, but I'm now. I'm accessing the db in a remote server. I could see now in mySQL Front the following: running MySQL-Version 4.0.16-nt. So it's the right version. I'm 100% sure the db does not support utf encoding. – Maf Aug 02 '19 at 08:35