I'm getting special characters from a latin1_swedish_ci database. It contains a huge amount of data, and migration is not an option :(. All the files in the new app are encoded as UTF-8, and we are looking for a way to convert from latin1 to utf8. I tried setting the charset on mysql2, plus SET NAMES, etc. I also tried other suggestions from the internet using iconv (version dependency), but I could not make them work, so I ended up developing some code that seems to work and fixes the problem.

However, it looks almost too obvious. Do you see anything wrong with the code?

let data = JSON.stringify(rows); // serialize the MySQL rows (latin1_swedish_ci data) to a string
data = Buffer.from(data, "latin1"); // string to raw bytes, one byte per character
data = data.toString("utf8"); // decode those bytes as UTF-8
rows = JSON.parse(data); // parse back into objects

Example string before applying the code above:

Distributeurs: N° 5/6
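
To make the effect concrete, here is a minimal sketch of the conversion above applied to just this sample string. The variable names are illustrative, and it assumes the garbled text reaches Node as an ordinary JavaScript string:

```javascript
// Sample value as it arrives from the database ("Â" is U+00C2, "°" is U+00B0).
const garbled = "Distributeurs: N\u00C2\u00B0 5/6";

// Map each character to a single byte (Latin-1), recovering the raw bytes.
const bytes = Buffer.from(garbled, "latin1");

// Decode the same bytes as UTF-8: c2 b0 is the UTF-8 encoding of "°".
const fixed = bytes.toString("utf8");

console.log(bytes.subarray(16, 18)); // the two bytes behind "Â°"
console.log(fixed);
```

The two characters "Â°" collapse into the single character "°", which suggests the stored bytes were UTF-8 that got read as latin1.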

Thanks!

MikeSouto
  • Can you post a minimal reproducible example? Please provide some data that causes the error. BTW: isn't JSON always UTF-8 encoded? – Luuk Feb 18 '22 at 09:26
  • An example of text without that parser: Distributeurs: N° 5/6. UTF-8 is everywhere: file encoding, JSON requests, React HTML headers, etc. Thanks!! – MikeSouto Feb 18 '22 at 09:34
  • [This](https://stackoverflow.com/a/67159348/724039) answer shows how to convert from UTF-8 to latin1; switching the parameters might work. – Luuk Feb 18 '22 at 09:49

1 Answer

OK (warning: my Node skills are limited), but this code converts the word ångström first from UTF-8 to latin1, and then from latin1 back to UTF-8:


const buffer = require('buffer');

// Start from a UTF-8 buffer and transcode it to latin1 (one byte per character).
const latin1Buffer = buffer.transcode(Buffer.from("ångström"), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
const rows = latin1String;

console.log("Buffer latin1 encoding: ", latin1Buffer);
console.log("String in latin1:", rows);

console.log("");
// Now go the other way: latin1 bytes back to UTF-8 bytes.
const data2 = buffer.transcode(Buffer.from(rows, "latin1"), "latin1", "utf8");
console.log("Buffer in UTF8:", data2);
console.log("String in UTF8: ", data2.toString());

output:

Buffer latin1 encoding:  <Buffer e5 6e 67 73 74 72 f6 6d>
String in latin1: ångström

Buffer in UTF8: <Buffer c3 a5 6e 67 73 74 72 c3 b6 6d>
String in UTF8:  ångström
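
One thing that may be worth checking before applying this to the question's data: a JavaScript string is already Unicode, so encoding it to latin1 bytes, transcoding those to UTF-8 and decoding again should give back the same string. The sketch below compares that round trip with decoding the latin1 bytes directly as UTF-8, which is what the question's snippet does. Both helper names are hypothetical, and the sample strings are the ones used on this page:

```javascript
const { transcode } = require("buffer");

// Round trip as in the code above: string -> latin1 bytes -> UTF-8 bytes -> string.
function transcodeRoundTrip(str) {
  return transcode(Buffer.from(str, "latin1"), "latin1", "utf8").toString("utf8");
}

// Question's approach: string -> latin1 bytes, then decode those bytes as UTF-8.
function reinterpretAsUtf8(str) {
  return Buffer.from(str, "latin1").toString("utf8");
}

const clean = "\u00E5ngstr\u00F6m"; // "ångström", written with escapes
const garbled = "Distributeurs: N\u00C2\u00B0 5/6"; // sample string from the question

console.log(transcodeRoundTrip(clean) === clean);     // true: string unchanged
console.log(transcodeRoundTrip(garbled) === garbled); // true: still garbled
console.log(reinterpretAsUtf8(garbled));              // Distributeurs: N° 5/6
console.log(reinterpretAsUtf8(clean));                // correct text gets damaged
```

If this holds for the real data, the direct reinterpretation is the step that repairs text stored as UTF-8 bytes in a latin1 column. It also damages values that were already correct, so it seems safest to apply it only to columns known to be garbled.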
Luuk