3

I'm attempting to look up the word "flower" in Google's dictionary semi-api. Source:

https://gist.github.com/DelvarWorld/0a83a42abbc1297a6687

Long story short, I'm calling JSONP with a callback parameter, then regexing the callback wrapper out.

But it hits this snag:

undefined:1
ple","terms":[{"type":"text","text":"I stopped to buy Bridget some \x3cem\x3ef
                                                                    ^
SyntaxError: Unexpected token x
    at Object.parse (native)

Google is serving me escaped HTML characters, which is fine, but JSON.parse cannot handle them?? What's weirding me out is this works just fine:

$ node

> JSON.parse( '{"a":"\x3cem"}' )
  { a: '<em' }

I don't get why my thingle is crashing

Edit: These are all nice informational responses, but none of them help me get rid of the stack trace.

Andy Ray

4 Answers

2

\xHH is not part of JSON, but is part of JavaScript. It is equivalent to \u00HH. Since the built-in JSON doesn't seem to support it and I doubt you'd want to go through the trouble of modifying a non-built-in JSON implementation, you might just want to run the code in a sandbox and collect the resulting object.
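A minimal sketch of the sandbox idea, assuming Node's built-in `vm` module. The helper name `parseLooseJson` is made up for illustration, and the 100 ms timeout is an arbitrary choice. Note that the Node documentation says `vm` is not a security mechanism, so treat this as a way to limit accidents rather than as real isolation.

```javascript
const vm = require('vm');

// Hypothetical helper: evaluates a JavaScript object literal (not strict JSON)
// in a fresh context with an empty global object and a time limit.
function parseLooseJson(text) {
  // The parentheses make the engine read `{...}` as an expression
  // instead of a block statement.
  return vm.runInNewContext('(' + text + ')', {}, { timeout: 100 });
}

// The doubled backslash puts the literal characters \x3c into the string,
// which is what the raw response text contains.
const raw = '{"a":"\\x3cem"}';
console.log(parseLooseJson(raw).a); // <em
```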

icktoofay
  • Another hack for if you need to parse a “nearly JSON” structure is to replace `\x` with `\u00` before parsing. This is slightly safer as it avoids eval'ing. – bobince Jul 31 '13 at 16:55
  • @bobince: Right; that's sort of why I included the “`\xHH` ≡ `\u00HH`” bit. The problem with that is that you have to be careful of other escapes, e.g., don't change `\\xHH` (which is the literal text `\xHH`) into `\\u00HH` (the literal text `\u00HH`). I, too, agree that `eval`ing is usually undesirable, but if you do it in a sandbox without access to…almost anything, with a timeout, it should be safe. – icktoofay Aug 01 '13 at 02:49
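The replacement discussed in these comments could be sketched roughly as follows. `fixHexEscapes` is a hypothetical name, and the approach assumes that backslashes only appear inside string values, which holds for JSON-like text. Matching `\\` in the same regex is what keeps an escaped backslash followed by `xHH` from being rewritten.

```javascript
// Hypothetical helper: rewrites \xHH escapes as \u00HH so JSON.parse accepts them.
function fixHexEscapes(text) {
  // Match either an escaped backslash (\\) or a \xHH escape. Scanning left to
  // right, a backslash consumed as part of \\ can never start a \xHH match.
  return text.replace(/\\\\|\\x([0-9a-fA-F]{2})/g, (match, hex) =>
    hex === undefined ? match : '\\u00' + hex
  );
}

// Raw text: {"a":"\x3cem\x3e","b":"\\x3c"}
const raw = '{"a":"\\x3cem\\x3e","b":"\\\\x3c"}';
const parsed = JSON.parse(fixHexEscapes(raw));
console.log(parsed.a); // <em>
console.log(parsed.b); // \x3c (left alone, it was an escaped backslash)
```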
0

According to http://json.org, a character in a JSON string may be:

  • any Unicode character except " or \ or control character
  • \"
  • \\
  • \/
  • \b
  • \f
  • \n
  • \r
  • \t
  • \u four-hex-digits

So according to that list, the "json" you are getting is malformed at \x3c.
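To see where a response breaks that list, something like the following sketch could help. `firstInvalidEscape` is a hypothetical helper, not part of any library, and it only checks backslash escapes, not the rest of the JSON grammar.

```javascript
// Hypothetical helper: returns the index of the first backslash escape that
// JSON does not allow, or -1 if every escape is valid.
function firstInvalidEscape(text) {
  // Valid escapes: \" \\ \/ \b \f \n \r \t and \u plus four hex digits.
  // The final alternative catches any backslash that starts none of those.
  const escape = /\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})|\\/g;
  let m;
  while ((m = escape.exec(text)) !== null) {
    if (m[0] === '\\') return m.index;
  }
  return -1;
}

console.log(firstInvalidEscape('{"a":"\\x3cem"}'));   // 6
console.log(firstInvalidEscape('{"a":"\\u003cem"}')); // -1
```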

Paul
0

The reason it works in the REPL is that these two calls are equivalent:

JSON.parse( '{"a":"\x3cem"}' )

and

JSON.parse( '{"a":"<em"}' )

Your string is already decoded by the time it is passed to JSON.parse: since it's a JavaScript string literal, \x3cem is actually <em.

Now, \xHH is valid in JavaScript but not in JSON. According to http://json.org/, the only characters you can have after a \ are "\/bfnrtu.
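A quick way to check this in Node, as a sketch: compare the string literal from the question with one where the backslash is doubled, so that the characters \x3c really are in the string, as they are in the response text.

```javascript
// In a string literal, \x3c is decoded by the JavaScript parser, so
// JSON.parse only ever sees the character "<".
const decoded = '{"a":"\x3cem"}';
console.log(decoded === '{"a":"<em"}'); // true
console.log(JSON.parse(decoded).a);     // <em

// Doubling the backslash keeps the four characters \x3c in the string.
// JSON.parse rejects that escape.
const raw = '{"a":"\\x3cem"}';
let error = null;
try {
  JSON.parse(raw);
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```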

Musa
0

The answer is correct, but it needs a couple of modifications. You might want to try this one: https://gist.github.com/Selmanh/6973863

Selman Kahya