
I have a string which contains Unicode escape sequences like "\u00c3\u00a7", which should be displayed as "ç", but I get "Ã§" instead.

The data comes from a Facebook export in JSON format.

There is a related post on this subject (Facebook JSON badly encoded), and I tried to encode/decode with iconv, but without success.

Thank you!

Encoding to latin1 and decoding as UTF-8:

iconv.decode(iconv.encode(str, 'latin1'), 'utf8');
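One possible reason the round trip above appears to fail is that `str` still holds the literal text `\u00c3` rather than the character U+00C3, for example when the raw file text is processed before `JSON.parse`. The sketch below assumes Node.js and uses the built-in `Buffer` in place of iconv; `fixMojibake` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: reinterpret each code unit (0x00-0xFF) as one byte,
// then decode the resulting bytes as UTF-8.
function fixMojibake(str) {
  return Buffer.from(str, 'latin1').toString('utf8');
}

// JSON.parse turns the \uXXXX escapes into real characters first.
const parsedExport = JSON.parse('{"text": "Ok voyons \\u00c3\\u00a7a"}');
const fixedText = fixMojibake(parsedExport.text);
console.log(fixedText); // "Ok voyons ça"
```

Note that this conversion discards the high byte of any character above U+00FF, so it is only suitable for fields known to be mis-encoded.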

Replace \u...:

 // Convert each literal \uXXXX escape into the character with that code unit.
 str.replace(/\\u([0-9a-f]{4})/gi, function (match, grp) {
     return String.fromCharCode(parseInt(grp, 16));
 });
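Replacing each escape on its own yields one character per escape, which reproduces the "Ã§" result. A sketch of an alternative, assuming the text still contains literal `\u00XX` escapes: treat each run of consecutive escapes as a byte sequence and decode the whole run as UTF-8. `decodeEscapedBytes` is a hypothetical name, and `TextDecoder` is assumed to be available as a global (modern browsers and Node.js 11+).

```javascript
// Hypothetical helper: decode runs of literal \u00XX escapes as UTF-8 bytes.
function decodeEscapedBytes(rawText) {
  const utf8 = new TextDecoder('utf-8');
  // Match one or more consecutive escapes whose value fits in a single byte.
  return rawText.replace(/(?:\\u00[0-9a-f]{2})+/gi, function (run) {
    // Each escape is 6 characters long; the last two are the hex byte value.
    const bytes = run.match(/\\u00[0-9a-f]{2}/gi)
      .map(function (esc) { return parseInt(esc.slice(4), 16); });
    return utf8.decode(Uint8Array.from(bytes));
  });
}

console.log(decodeEscapedBytes('Ok voyons \\u00c3\\u00a7a')); // "Ok voyons ça"
```

If this is applied to raw JSON text before parsing, escapes for characters such as `"` or `\` would also be unescaped and could break the JSON, so applying it to individual string values is the safer assumption. Byte runs that are not valid UTF-8 come out as U+FFFD replacement characters.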

I also tried with encodeURIComponent:

 encodeURIComponent(stringWithUnicode);
Off Axis
  • See [this answer](https://stackoverflow.com/a/5396742/2711488); it’s `fixed = decodeURIComponent(escape(str));` – Holger May 14 '19 at 15:56
  • It doesn't work. The data is contained in JSON, as I said, and the Unicode sequence may have one or more elements, like `\u00c3\u00a0` or `\u00e2\u0080\u0099`. When I convert the string, I get one character per Unicode escape, not the single character for the combined sequence. – Off Axis May 15 '19 at 08:55
  • Don’t know what you mean. `"ç" == decodeURIComponent(escape("\u00c3\u00a7"))` gives me `true`. Same for `"’" == decodeURIComponent(escape("\u00e2\u0080\u0099"))` – Holger May 15 '19 at 10:30
  • You're right! But when I apply the escape function to a string containing the Unicode sequence, like `"Ok voyons \u00c3\u00a7a"`, I get `"Ok voyons Ã§a"`. Why? – Off Axis May 16 '19 at 08:24
  • I tried this: ``` return str.replace(/(\\u[\d\w]{4})+/gi, function (match, grp) { // var bytes = match.split("\\u").filter(v => v.length).map(v => parseInt(v, 16)); // return String.fromCharCode(bytes[0] | bytes[1]); return decodeURIComponent(escape(match)); }); ``` Here `match` contains the combined Unicode sequence ("\u00c3", "\u00c3\u00a7", "\u00e2\u0080\u0099", ...), but `decodeURIComponent(escape(match))` returns the same value as `match`, not decoded. – Off Axis May 16 '19 at 08:57
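The difference seen in the comments above may come down to whether the string contains real characters or literal backslash text. A short sketch, assuming the deprecated global `escape` is available (it is in browsers and Node.js): `decodeURIComponent(escape(s))` works once the `\uXXXX` escapes have been turned into characters, for example by `JSON.parse`, and leaves literal backslash text unchanged. The variable names are illustrative only.

```javascript
// In JavaScript source, "\u00c3\u00a7" is two characters: U+00C3 and U+00A7.
const asCharacters = "Ok voyons \u00c3\u00a7a";
// With doubled backslashes, the string holds the literal text \u00c3\u00a7.
const asLiteralText = "Ok voyons \\u00c3\\u00a7a";

// escape() percent-encodes the two characters as %C3%A7, which
// decodeURIComponent() then reads as the UTF-8 bytes of "ç".
console.log(decodeURIComponent(escape(asCharacters))); // "Ok voyons ça"

// Here escape() only sees ASCII characters, so nothing is decoded.
console.log(decodeURIComponent(escape(asLiteralText))); // "Ok voyons \u00c3\u00a7a"

// Parsing the literal text as a JSON string produces the characters first.
const viaJsonParse = JSON.parse('"' + asLiteralText + '"');
console.log(decodeURIComponent(escape(viaJsonParse))); // "Ok voyons ça"
```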

1 Answer


Honestly, I know nothing about this, but I did notice that a bitwise OR (|) of the two bytes produces the correct character for this example. If you pass that Unicode string to the function below, you'll get the correct result:

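function getExtended(uc){
    // Split the literal \uXXXX\uXXXX text into its hex parts and parse each as a byte.
    var bytes = uc.split("\\u").filter(v => v.length).map(v => parseInt(v, 16));
    // OR the two bytes together; for the literal text \u00c3\u00a7 this yields "ç".
    return String.fromCharCode(bytes[0] | bytes[1]);
}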
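The OR happens to give the right code point for `\u00c3\u00a7`, but it does not match UTF-8 decoding in general: for `\u00c3\u00a0` (which should be "à") it gives 0xE3, which is "ã". A sketch of the standard two-byte UTF-8 rule, with `decodeTwoByteUtf8` as a hypothetical name, combines the low 5 bits of the first byte with the low 6 bits of the second. It handles only two-byte sequences; three-byte ones such as `\u00e2\u0080\u0099` need a full decoder.

```javascript
// Hypothetical helper: decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy).
function decodeTwoByteUtf8(lead, continuation) {
  // Code point = xxxxx (low 5 bits of lead) followed by yyyyyy (low 6 bits).
  return String.fromCharCode(((lead & 0x1f) << 6) | (continuation & 0x3f));
}

console.log(decodeTwoByteUtf8(0xc3, 0xa7));    // "ç"
console.log(decodeTwoByteUtf8(0xc3, 0xa0));    // "à"
console.log(String.fromCharCode(0xc3 | 0xa0)); // "ã" (plain OR gives the wrong character)
```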
Trey