
I'm trying to make a Unicode converter from UTF-8 to UTF-16, but I only want to convert the strings inside the JSON (keys and values), not the JSON structure itself, because that would make it unreadable to programs.

Entered UTF-8:

{
    "hi":{
        "this":["is","just","some","example"]
    }
}

Expected UTF-16:

{
    "\u0068\u0069":{
        "\u0074\u0068\u0069\u0073":["\u0069\u0073","\u006a\u0075\u0073\u0074","\u0073\u006f\u006d\u0065","\u0065\u0078\u0061\u006d\u0070\u006c\u0065"]
    }
}

Can anyone help? Also, I'm pretty new to JavaScript, so don't be too harsh.

Remy Lebeau
Anonymous
  • Use a tokeniser to find all strings, which per [the JSON spec](https://json.org) can only be "things that start and end with double quotes". Then replace each found entry with the reencoded value. – Mike 'Pomax' Kamermans Oct 05 '21 at 14:58
  • @Mike'Pomax'Kamermans but how do I reencode? That's my big question – Anonymous Oct 05 '21 at 15:02
  • If that's your question, it has nothing to do with JSON, and you should update your post, after first searching for "how do I convert JS string to UTF16" to see if there's already an answer for that on the internet (which there is, [even here on SO](https://stackoverflow.com/questions/37596748/how-do-i-encode-a-javascript-string-in-utf-16)). – Mike 'Pomax' Kamermans Oct 05 '21 at 15:03

1 Answer


One way to do this is to find every quoted string with a regular expression and replace each of its characters with the matching \uXXXX escape:

let json_string = '{ "hi": {"this": ["is", "just", "some", "example"]}}';

// Escape every UTF-16 code unit of x as \uXXXX
function encode_to_utf16(x) {
  let res = "";
  for (let i = 0; i < x.length; i++) {
    res += "\\u" + x.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return res;
}

// Find every run of text between double quotes and re-encode it in place.
// The callback replaces each match at its own position, so a short string
// such as "is" can never be replaced inside a longer one such as "this".
json_string = json_string.replace(/"([^"]*)"/g, (match, str) => {
  return '"' + encode_to_utf16(str) + '"';
});
console.log(json_string);

However, this breaks if a string contains an escaped double quote (\"), because the regular expression treats that quote as the end of the string. You would need to extend the pattern to handle that case.
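If escaped quotes matter for your data, here is a rough sketch of one way to handle them. It assumes the input is already valid JSON, and the names escapeCodeUnits, escapeJsonStrings and JSON_STRING_LITERAL are made up for this example. The pattern skips over backslash escapes when looking for the closing quote, and JSON.parse decodes each string literal before it is re-encoded, so \" and \n keep their meaning.

```javascript
// Sketch only: assumes jsonText is valid JSON. All names here are
// hypothetical helpers, not part of any library.

// Escape every UTF-16 code unit of text as \uXXXX.
function escapeCodeUnits(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Matches one JSON string literal: an opening quote, then any mix of
// ordinary characters and backslash escapes, then the closing quote.
const JSON_STRING_LITERAL = /"(?:[^"\\]|\\.)*"/g;

function escapeJsonStrings(jsonText) {
  return jsonText.replace(JSON_STRING_LITERAL, (literal) => {
    // JSON.parse resolves escapes such as \" or \n into real characters,
    // so each one is re-encoded as a single code unit.
    const decoded = JSON.parse(literal);
    return '"' + escapeCodeUnits(decoded) + '"';
  });
}

const input = '{ "say": "he said \\"hi\\"" }';
const output = escapeJsonStrings(input);
console.log(output);
```

Parsing the output with JSON.parse should give back the same data as parsing the input, which is a quick way to check that only the spelling of the strings changed and not the structure.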

Keshav Bajaj