Why does JSON.parse choke on encoded characters in nodejs?

Question

I'm attempting to look up the word "flower" in Google's dictionary semi-api. Source:

https://gist.github.com/DelvarWorld/0a83a42abbc1297a6687

Long story short, I'm calling JSONP with a callback parameter, then regexing it out.

But it hits this snag:

undefined:1
ple","terms":[{"type":"text","text":"I stopped to buy Bridget some \x3cem\x3ef
                                                                    ^
SyntaxError: Unexpected token x
    at Object.parse (native)

Google is serving me escaped HTML characters, which is fine, but JSON.parse cannot handle them? What's weirding me out is that this works just fine:

$ node

> JSON.parse( '{"a":"\x3cem"}' )
  { a: '<em' }

I don't get why my thingle is crashing

Edit: These are all nice informational responses, but none of them help me get rid of the stack trace.

Take a look at string in http://json.org/ – Paul Jul 31 '13 at 03:59

icktoofay · Accepted Answer · 2018-01-26T07:38:02.207

2

\xHH is not part of JSON, but is part of JavaScript. It is equivalent to \u00HH. Since the built-in JSON doesn't seem to support it and I doubt you'd want to go through the trouble of modifying a non-built-in JSON implementation, you might just want to run the code in a sandbox and collect the resulting object.
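A minimal sketch of the sandbox idea, assuming Node's built-in vm module. The name parseLooseJson is hypothetical and not taken from the question's gist. The text is evaluated as a JavaScript expression in an empty context with a timeout, and only a re-serialized string crosses back into the main program. Note that Node's own documentation says vm is not a security mechanism, so treat this as an illustration rather than a hardened sandbox.

```javascript
const vm = require('vm');

// Hypothetical helper: evaluate "nearly JSON" text as a JavaScript expression
// inside a fresh context, so escapes like \x3c are decoded by the JS engine.
function parseLooseJson(text, timeoutMs = 100) {
  // JSON.stringify runs inside the sandbox, so only a plain string comes back.
  const strictJson = vm.runInNewContext('JSON.stringify(' + text + ')', {}, {
    timeout: timeoutMs, // abort runaway scripts
  });
  return JSON.parse(strictJson);
}

// Raw text as it would arrive over the network (backslashes intact).
const raw = '{"text":"some \\x3cem\\x3eflower\\x3c/em\\x3e"}';
console.log(parseLooseJson(raw).text);
```

Re-serializing inside the context means the caller never holds an object created by the evaluated code, only a string that the strict JSON.parse then validates.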

edited Jan 26 '18 at 07:38

answered Jul 31 '13 at 04:05

icktoofay

Another hack for if you need to parse a “nearly JSON” structure is to replace `\x` with `\u00` before parsing. This is slightly safer as it avoids eval'ing. – bobince Jul 31 '13 at 16:55
@bobince: Right; that's sort of why I included the “`\xHH` ≡ `\u00HH`” bit. The problem with that is that you have to be careful of other escapes, e.g., don't change `\\xHH` (which is the literal text `\xHH`) into `\\u00HH` (the literal text `\u00HH`). I, too, agree that `eval`ing is usually undesirable, but if you do it in a sandbox without access to…almost anything, with a timeout, it should be safe. – icktoofay Aug 01 '13 at 02:49
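A rough sketch of the replacement approach from these comments, written so that it respects the `\\xHH` caveat. The helper name fixHexEscapes is hypothetical. It assumes the input is otherwise valid JSON, so backslashes only ever appear inside strings.

```javascript
// Hypothetical helper: rewrite JavaScript-only \xHH escapes as JSON's \u00HH.
function fixHexEscapes(text) {
  // Each match consumes a backslash plus what follows it as one unit, so an
  // escaped backslash ("\\") is swallowed whole and a following "x3c" is
  // never mistaken for the start of a hex escape.
  return text.replace(/\\(?:x([0-9a-fA-F]{2})|[\s\S])/g, (match, hex) =>
    hex !== undefined ? '\\u00' + hex : match
  );
}

// Raw response text: the backslashes are real characters here.
const raw = '{"text":"some \\x3cem\\x3eflower\\x3c/em\\x3e"}';
console.log(JSON.parse(fixHexEscapes(raw)).text);

// The literal text \x3c (escaped backslash) is left untouched.
const escaped = '{"a":"\\\\x3c"}';
console.log(fixHexEscapes(escaped) === escaped);
```

Nothing is evaluated here, so the worst a bad input can do is make JSON.parse throw.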

score 0 · Answer 2 · answered Jul 31 '13 at 04:03

According to http://json.org, each character in a JSON string may be one of:

any Unicode character except " or \ or a control character
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits

So according to that list, the "json" you are getting is malformed at \x3c, because \x is not one of the allowed escapes.

score 0 · Answer 3 · answered Jul 31 '13 at 04:04

The reason your node example works is that these two are equivalent:

JSON.parse( '{"a":"\x3cem"}' )

and

JSON.parse( '{"a":"<em"}' )

Your string is already decoded by the time it reaches JSON.parse: since it's a JavaScript string literal, \x3cem is actually <em.

Now, \xHH is valid in JavaScript but not in JSON. According to http://json.org/, the only characters that may follow a \ are "\/bfnrtu.
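To make the difference concrete, here is a small sketch comparing a string the JavaScript parser has already decoded with one that still contains a real backslash, as a network response would. The helper tryParse is a made-up name for illustration.

```javascript
// The JS parser decodes \x3c to "<" before JSON.parse ever sees the string.
const decodedByJs = '{"a":"\x3cem"}';

// Doubling the backslash keeps it as a real character, which is what the
// text of an HTTP response body looks like.
const rawFromNetwork = '{"a":"\\x3cem"}';

// Hypothetical helper: return the parsed value, or the error name on failure.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return err.name;
  }
}

console.log(decodedByJs === '{"a":"<em"}'); // true
console.log(tryParse(decodedByJs));         // an object whose a is "<em"
console.log(tryParse(rawFromNetwork));      // "SyntaxError"
```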

score 0 · Answer 4 · answered Oct 14 '13 at 10:50

The answer is correct, but needs a couple of modifications. You might want to try this one: https://gist.github.com/Selmanh/6973863

Selman Kahya
