
I have been fighting with this problem for a few weeks now and still cannot understand what's wrong. I created a dictionary in Python, dic, and then used json.dumps to convert it to valid JSON.

json_js = json.dumps(dic) # works fine, a valid json from the python's viewpoint
# the reverse operation works fine also
dic = json.loads(json_js)
print(json_js)
==============
{"p0": {"pf": {"id": "pf1", "class": ["pf", "w0", "h0"], "data-page-no": "1"}, "pc": {"class": ["pc", "pc1", "w0", "h0"]}, "img": ["<img alt=\"\" clas

Later I embed json_js in a JS script that calls JSON.parse() on it, and this is the error I get:

SyntaxError: JSON.parse: expected ',' or ']' after array element at line 1 column 143 of the JSON data

The 143rd character is the first \". I cannot understand why JS does not accept this as valid JSON. Do you have any suggestions as to what might have gone wrong here?

EDIT. No idea why people close my question. The desired behaviour is that JSON.parse doesn't throw any errors. How I added it to the script is irrelevant to the question. Please have a look at this part of the source code inside the HTML.

const str = `{"p0": {"pf": {"id": "pf1f", "class": ["pf", "w2", "h2"], "data-page-no": "1f"}, "pc": {"class": ["pc", "pc1f", "w2", "h2"]}, "img": ["<img alt=\"\" class=\"bi x0 y0 w1 h1\"` // this string is several megabytes so I only put the first 150 or so characters here.
var dic = JSON.parse(str);

EDIT 2. The full transformation.

# in python using BeautifulSoup for scripts and new_html
new_html = bs()
dom = bs()
scripts = [dom.new_tag('script')]
scripts[0].string = html_script(json_js) # html_script is a "string... %s" %json_js
new_html.body.append(scripts[0])
with open('stuff.html', 'w', encoding='utf-8') as f:
    f.write(str(new_html))
AlexZheda
  • “add it to a js script” – how did you do that? – Ry- Jan 21 '20 at 01:28
  • I have added the source code from Firefox. This is how the browser sees it. And yet, it complains about `\"` – AlexZheda Jan 21 '20 at 01:39
  • Because you put it (seemingly) unescaped in a template string literal, which you shouldn’t have (it also supports escapes, i.e. ``(`\"`).length === 1`` – but don’t be fooled into thinking that `String.raw` is the correct alternative). What produced that HTML? Are you using a template engine? – Ry- Jan 21 '20 at 01:43
  • I have added it into EDIT 2. I did not use any template engine. I did it "by hand". – AlexZheda Jan 21 '20 at 01:50
  • This is what your string looks like: {"p0": {"pf": {"id": "pf1f", "class": ["pf", "w2", "h2"], "data-page-no": "1f"}, "pc": {"class": ["pc", "pc1f", "w2", "h2"]}, "img": [" – Ahmed Elyamani Jan 21 '20 at 02:24
  • @AhmedElyamani where did you see in my code "" and not \"\" ? – AlexZheda Jan 21 '20 at 02:40
  • @Ry-, could you please unblock my question, which would let someone write a suggested answer to my question? – AlexZheda Jan 21 '20 at 02:42
  • @AlexZheda I didn't see that in your code. That's what your string looks like if you just console.log() it. If you escape the backslash, it will work fine: https://jsfiddle.net/ykow203b/ – Ahmed Elyamani Jan 21 '20 at 02:47
  • @AhmedElyamani, refer to the first answer here. https://stackoverflow.com/questions/15637429/how-to-escape-double-quotes-in-json Why does it work here with a single backslash? – AlexZheda Jan 21 '20 at 02:53
  • @AhmedElyamani, I did check it with console.log. Yet, I would like to understand why it does not work with single backslashes. That was my original question. What goes wrong here even though, seemingly, my string applies the correct single backslashes. Also, this is in a part a python question since I am outputting the html via a python json transformation. Inside the python string I clearly see double backslashes but in the js script double backslashes "merge" into single backslashes. – AlexZheda Jan 21 '20 at 02:57
  • Answers from the same question explain this: https://stackoverflow.com/a/38923005/3112894 You need to give more thought to the way you input strings. If the final string that you'd like to be sent to the parser is \" then you have to either type in \\\" or `\\"`. That will then be stored as \", which is then parsed. – Ahmed Elyamani Jan 21 '20 at 03:19
  • The question and the problem as well as a solution are not trivial! Why did people downvote my question? Ry- gave a solution below. – AlexZheda Jan 21 '20 at 15:11

1 Answer


In order to put a string of JSON into JavaScript source code, you used backticks to make it a template string. That’s not enough, though, since many sequences have special meaning inside backticks:

  • ${…} interpolation
  • backslash escape sequences (the current problem, which converted the \" in the JSON to ")
  • backticks
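The backslash case can be illustrated with a rough sketch in Python. It assumes a tiny made-up dictionary, and it only emulates the single escape that matters here (`\"` turning into `"` inside a template literal) with a plain string replacement; it is not a full model of JavaScript escape handling.

```python
import json

# Hypothetical stand-in for the real, much larger dictionary.
json_js = json.dumps({"img": ['<img alt="">']})
# json_js is the text: {"img": ["<img alt=\"\">"]}

# Crude emulation of what the template literal does to this text:
# the escape sequence \" is consumed and becomes a bare ".
seen_by_json_parse = json_js.replace('\\"', '"')

try:
    json.loads(seen_by_json_parse)
    parse_failed = False
except json.JSONDecodeError as exc:
    # The quote after alt= now ends the string early, so the parser
    # expects a comma or closing bracket and fails.
    parse_failed = True
    print("parse error:", exc)
```

The text that reaches the parser no longer contains the backslashes, which matches the failure at the first `\"` reported in the question.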

Embedding literals in JavaScript in HTML <script>s isn’t really trivial, but you can do it by JSON-encoding the JSON, which results in a string that’s almost ready to embed except for:

  • exiting the script element with </script> (matched case-insensitively)
  • changing the parsing mode with <!--
  • creating a poorly-formed JavaScript string literal (for now) with U+2028 or U+2029

which can be avoided by escaping all <s, U+2028, and U+2029.

In all:

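import json


def script_embeddable_json(value):
    # Inner dumps: value -> JSON text. Outer dumps: JSON text -> quoted string literal.
    return (
        json.dumps(json.dumps(value))
        # Escape characters that are unsafe inside an HTML <script> element.
        .replace("<", "\\u003c")
        .replace("\u2028", "\\u2028")
        .replace("\u2029", "\\u2029"))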


json_js = script_embeddable_json(dic)

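Then the template should look like const str = %s, with no backticks or quotes around the %s, because json_js is already a complete JavaScript string literal.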
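As a sanity check, the round trip can be sketched in Python. This assumes a small made-up dictionary and a made-up template string; neither is the asker's real data or `html_script`. Because the embedded value is itself a valid JSON string literal, `json.loads` can stand in first for JavaScript evaluating the literal and then for `JSON.parse`.

```python
import json


def script_embeddable_json(value):
    # Same function as in the answer above.
    return (
        json.dumps(json.dumps(value))
        .replace("<", "\\u003c")
        .replace("\u2028", "\\u2028")
        .replace("\u2029", "\\u2029"))


# Hypothetical stand-in for the real data, including awkward characters.
dic = {"img": ['<img alt="" class="bi x0">'], "note": "</script>\u2028"}

literal = script_embeddable_json(dic)

# Hypothetical template: no backticks or quotes around %s.
script_source = "const str = %s;\nvar dic = JSON.parse(str);" % literal

# Step 1 mimics JS evaluating the string literal; step 2 mimics JSON.parse.
js_string_value = json.loads(literal)
round_tripped = json.loads(js_string_value)

print(round_tripped == dic)
```

The literal contains no raw `<`, so it cannot close the script element early, and decoding it twice gives back the original dictionary.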

Ry-
  • this is what I got in the resulting html ( – AlexZheda Jan 21 '20 at 15:39
  • @AlexZheda: The backticks should be removed. Just `const str = exact value of json_js`. – Ry- Jan 21 '20 at 21:10
  • amazing! Thank you so much! that truly works! I spent weeks trying to hack around that. – AlexZheda Jan 21 '20 at 23:39