JSON.parse() keep string encoded

Question

// string in valid JSON format

{"reaction":"\ud83d\udc4d","user":{"id":"xyz"}}

//after JSON.parse()

{ reaction: '👍', user: [Object] }

I want to keep the reaction value encoded (as the \ud83d\udc4d escape sequence), but JSON.parse() decodes it.
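JSON.parse() is behaving correctly here. \ud83d\udc4d is the UTF-16 surrogate pair for U+1F44D (THUMBS UP SIGN), and parsing turns it into that one character. A quick check in Node.js, using the JSON from the question, shows the decoded value. It also hints at one likely reason for question marks in a database: the character needs 4 bytes in UTF-8, while MySQL's legacy `utf8` character set stores at most 3 bytes per character.

```javascript
// JSON text from the question; String.raw keeps the backslashes as real characters.
const json = String.raw`{"reaction":"\ud83d\udc4d","user":{"id":"xyz"}}`;
const parsed = JSON.parse(json);

// The surrogate pair is decoded into one astral character, U+1F44D.
const reaction = parsed.reaction;
console.log(reaction === '\u{1F44D}');             // true
console.log(reaction.length);                       // 2 (UTF-16 code units)
console.log([...reaction].length);                  // 1 (code point)
console.log(reaction.codePointAt(0).toString(16));  // 1f44d
console.log(Buffer.byteLength(reaction, 'utf8'));   // 4 (bytes in UTF-8)
```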

Update

In the end I decided to leave JSON.parse() alone and fix the database issue, as @Brad suggested. I changed the database format, but that alone did not fix the problem, so I kept looking and found that every statement must now start with SET NAMES utf8mb4; followed by the query, and that the connection needs the options {charset : 'utf8mb4', multipleStatements: true}. Without proper node-mysql documentation it was quite hard to find the best answer, but I learned a lot along the way. Thank you.
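A minimal sketch of that setup for the node-mysql (`mysql`) package. The two connection options and the `SET NAMES utf8mb4;` prefix come from the update; the host, credentials, database name, the `reactions` table with its columns, and the `insertReaction` helper are all hypothetical. The helper takes the connection as a parameter so it does not depend on how the connection is created.

```javascript
// Connection options: charset and multipleStatements are the two settings
// from the update. Everything else is a placeholder.
const connectionOptions = {
  host: 'localhost',        // hypothetical
  user: 'app_user',         // hypothetical
  password: 'app_password', // hypothetical
  database: 'app_db',       // hypothetical
  charset: 'utf8mb4',       // 4-byte UTF-8, so characters such as U+1F44D fit
  multipleStatements: true, // allows "SET NAMES ...; INSERT ..." in one call
};

// Hypothetical table reactions(reaction, user_id). The ? placeholders are
// filled in order by the driver, so the decoded emoji is sent as-is.
const INSERT_REACTION_SQL =
  'SET NAMES utf8mb4; INSERT INTO reactions (reaction, user_id) VALUES (?, ?)';

// Hypothetical helper: insert one parsed message ({ reaction, user: { id } }).
function insertReaction(connection, data, callback) {
  connection.query(INSERT_REACTION_SQL, [data.reaction, data.user.id], callback);
}
```

With the real driver this would be wired up roughly as `const mysql = require('mysql'); const connection = mysql.createConnection(connectionOptions); insertReaction(connection, JSON.parse(json), (err) => { if (err) throw err; });`. The table or column itself also has to use a utf8mb4 character set; the connection settings cannot help if the column is still `utf8`.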

@PatrickRoberts I'm running under Node.js and it works, but as you can see it decodes the string value into the thumbs-up emoji; I want to keep it encoded. — Got To Figure, Feb 23 '18 at 23:07
`JSON.parse` won’t be encoding anything; it looks to me like it’s the console, or whatever you’re using to log the data. What’s the intended use of the parsed data? — James, Feb 23 '18 at 23:09
@James Possibly. I console.log it, but then I store it in the database, where it shows up as a bunch of question marks, so I would like to keep it encoded no matter where it's stored or output. — Got To Figure, Feb 23 '18 at 23:13
So when you parse the data, presumably you manipulate it in some way and then `JSON.stringify` it again for storing in the DB? I can’t see how this is happening unless you are expecting `toString` on the object to serialize it (which it won’t). — James, Feb 23 '18 at 23:19
@James I use MySQL and just pass the JSON object's values into the fields of a MySQL query. It sounds a little too easy, but the encoded UTF-8 is still a string in the database, which I can decode later on. — Got To Figure, Feb 23 '18 at 23:22
@Adminy What are you viewing your database with and why don't you want it to show up as a bunch of question marks? It seems best to store the actual characters as-is, even if the tool you're viewing your DB with doesn't know how to display them. — Paul, Feb 23 '18 at 23:24
@Paulpro why don't I try pulling data from the database to see if the question marks actually show something in a static HTML page? I'll get back to you in a minute. — Got To Figure, Feb 23 '18 at 23:26
@Paulpro here is the database outputting it to HTML https://i.imgur.com/4TQ23j7.png — Got To Figure, Feb 23 '18 at 23:38
@Adminy Fix your character encoding. You're addressing this problem in completely the wrong way. Leave JSON alone. — Brad, Feb 24 '18 at 00:44

Raith · Accepted Answer · 2018-02-24T00:41:30.197


If you don't want JSON.parse to decode that string, you could escape the backslashes, e.g. "\\ud83d\\udc4d".
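For example, doubling every backslash in the raw JSON text before parsing (the same replace the asker reports using in the comments) makes JSON.parse read each `\\` as one literal backslash, so the escape survives as plain text. This sketch assumes the JSON arrives as a string; `String.raw` is only used to write it with real backslashes. Note that it rewrites every backslash in the payload, including those of other escapes such as `\"`, which can change their meaning or break the JSON.

```javascript
// Raw JSON text as received, with real backslashes.
const incoming = String.raw`{"reaction":"\ud83d\udc4d","user":{"id":"xyz"}}`;

// Plain parse: the \uXXXX escapes are decoded into the emoji.
const decoded = JSON.parse(incoming);

// Double every backslash first: JSON then sees "\\", i.e. one literal backslash,
// followed by ordinary characters, so nothing is decoded.
const kept = JSON.parse(incoming.replace(/\\/g, '\\\\'));

console.log(decoded.reaction);     // 👍
console.log(kept.reaction);        // \ud83d\udc4d
console.log(kept.reaction.length); // 12
```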

Do you control where that data comes from? Perhaps you want to provide a "replacer" in JSON.stringify to escape those, or a "reviver" in JSON.parse.
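If the producer can be changed, one way to escape at stringify time is a replacer that rewrites the reaction value as `\uXXXX` text before serialization; JSON.stringify then escapes those backslashes itself, and a plain JSON.parse on the other side returns the escape text. `toEscapes` is a hypothetical helper, and limiting it to the `reaction` key is an assumption taken from the question.

```javascript
// Hypothetical helper: rewrite each non-ASCII UTF-16 code unit as "\uXXXX" text.
// Without the u flag, the character class matches surrogate halves one at a time.
const toEscapes = (s) =>
  s.replace(/[\u0080-\uffff]/g, (c) =>
    '\\u' + c.charCodeAt(0).toString(16).padStart(4, '0'));

// Replacer: only touch string values stored under the "reaction" key.
const replacer = (key, val) =>
  key === 'reaction' && typeof val === 'string' ? toEscapes(val) : val;

const json = JSON.stringify({ reaction: '\u{1F44D}', user: { id: 'xyz' } }, replacer);
console.log(json);                      // {"reaction":"\\ud83d\\udc4d","user":{"id":"xyz"}}
console.log(JSON.parse(json).reaction); // \ud83d\udc4d
```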

What options do you have for exercising control over the stringify or parse?

Apply a reviver

// Escape backslashes in the "reaction" value only
const myReviver = (key, val) => key === "reaction" ? val.replace(/\\/g, "\\\\") : val;

const safeObj = JSON.parse(myJson, myReviver);

CAUTION: This doesn't work, in a browser or in Node.js. JSON.parse decodes each \uxxxx escape while it parses the string, before the reviver is called, so by the time the reviver sees the value there are no backslashes left to escape!
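To confirm the caution in Node.js, one can record what the reviver actually receives. This is a sketch; `seen` is a hypothetical array used only for inspection.

```javascript
const myJson = String.raw`{"reaction":"\ud83d\udc4d","user":{"id":"xyz"}}`;
const seen = [];

// Same reviver as above, but it also records the value it was handed.
const myReviver = (key, val) => {
  if (key === 'reaction') seen.push(val);
  return key === 'reaction' ? val.replace(/\\/g, '\\\\') : val;
};

const safeObj = JSON.parse(myJson, myReviver);
console.log(seen[0]);                      // 👍 (already decoded, no backslashes)
console.log(safeObj.reaction === seen[0]); // true: replace() found nothing to escape
```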

Multiple escaping

Following a chat with the OP, it transpired that adding several levels of escaped backslashes to the property containing the UTF characters did eventually get the desired value stored in the database. Several processing steps were each removing one level of escaping, until the real UTF character was finally exposed.

This is brittle and far from advisable, but it did help to identify what was/wasn't to blame.

NO backslashes

This appears to be the best solution: strip all backslashes from the data before it is converted into UTF characters or processed in any way. This essentially stores deactivated "uxxxxuxxxx" codes in the database.
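A minimal sketch of that step, assuming the raw JSON text is available before anything parses it. Removing every backslash turns `\ud83d\udc4d` into the inert text `ud83dudc4d`, which any character set can store. This also strips the backslash from any other escape in the payload (for example, a `\"` inside a value would end the string early and make the JSON invalid), so it is only safe for payloads that contain no other escapes.

```javascript
const incoming = String.raw`{"reaction":"\ud83d\udc4d","user":{"id":"xyz"}}`;

// Remove every backslash so the \uXXXX escapes become plain "uXXXX" text.
const stripped = incoming.replace(/\\/g, '');
const record = JSON.parse(stripped);

console.log(record.reaction); // ud83dudc4d
```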

At the point of rendering, those codes can be turned back into UTF characters. First, reinsert the backslashes using a regular expression:

const myUtfString = database_field.replace(/(u[0-9a-fA-F]{4})/g, "\\$1");

Ironically, that does not yet give the emoji: the replacement inserts literal backslash characters, so you end up with the plain text \ud83d\udc4d, which is exactly the string that was wanted in the first place. To get the character that was previously seen, let JSON.parse decode the escapes:

const emoji = JSON.parse(`{"utf": "${myUtfString}"}`).utf;

edited Feb 24 '18 at 00:41

answered Feb 23 '18 at 23:10

Raith


To answer your question: I have control over the string that I parse, but only by manipulating the string; I can't change the format it comes in. – Got To Figure Feb 23 '18 at 23:12
So the solution you're proposing is to escape the backslashes with more backslashes? I can try that. – Got To Figure Feb 23 '18 at 23:15
Yes, my basic answer is to escape your backslashes. I'd personally prefer to do that at stringify, but it sounds like you only have that option during parse, so I have provided an example of a reviver. – Raith Feb 23 '18 at 23:17
I was about to say I did `data.replace(/\\/g, "\\\\")` and it worked but thanks for your solution! – Got To Figure Feb 23 '18 at 23:20
The reviver obviously only applies the escaping to a parsed item with key === "reaction". It could find some false positives (I don't think it's possible to take in a wider scope to narrow it down further) but it's safer than applying a replace to the entire JSON string which might escape some characters that you *do* want unescaped during parse. – Raith Feb 23 '18 at 23:24

Sadly, I'm starting to think that the reviver isn't the solution after all. It seems that the character gets converted within the string *before* the reviver is able to manipulate it. So applying the fix to the string may be the only way, in which case it is worth using a more robust regex that only escapes that property. – Raith Feb 23 '18 at 23:30
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/165742/discussion-between-adminy-and-raith). – Got To Figure Feb 23 '18 at 23:42
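Following the last suggestion in the comments, here is a sketch of a narrower string-level fix: find the `"reaction"` value in the raw JSON text and add a backslash only in front of its `\uXXXX` escapes, leaving other escapes such as `\"` or `\\` untouched. `escapeReactionOnly` is a hypothetical name, and the regex assumes `"reaction"` holds a plain string value; it does not check whether the key is top-level or nested.

```javascript
// Matches the opening "reaction":" , the string body (escapes included), and the closing quote.
const REACTION_VALUE = /("reaction"\s*:\s*")((?:[^"\\]|\\.)*)(")/g;

// Inside a JSON string body, match one escape sequence at a time so that an
// escaped backslash (\\) is never mistaken for the start of a \uXXXX escape.
const ESCAPE = /\\(u[0-9a-fA-F]{4})|\\./g;

function escapeReactionOnly(json) {
  return json.replace(REACTION_VALUE, (match, open, body, close) =>
    open + body.replace(ESCAPE, (esc, unicode) => (unicode ? '\\' + esc : esc)) + close);
}

const incoming = String.raw`{"reaction":"\ud83d\udc4d","user":{"id":"xyz"}}`;
console.log(JSON.parse(escapeReactionOnly(incoming)).reaction); // \ud83d\udc4d
```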
