
I am setting up a Python API that feeds an Android app written in Java, and the API's output is JSON. I am trying to produce a pure Unicode-escaped string: I literally want the Unicode escapes in the output, without any double backslashes.

This is what I want to produce: \u003chead\u003e

I tried decode(), but the result contains double backslashes:

b'\u003chead\u003e'.decode('utf-8')
Result:
\\u003chead\\u003e

What I want is exactly

\u003chead\u003e

printed or exported in JSON. I am using Python 3.6.

Full code:

import json
abc = {"me": b"\u003chead\u003e".decode('utf-8')}
print(json.dumps(abc))

Result:

{"me": "\\u003chead\\u003e"}

What I want:

{"me": "\u003chead\u003e"}
Khalid Ali
mambo
  • What code are you using to "produce" your result? – Scott Hunter Apr 19 '19 at 13:56
  • What is the _exact_ code you use to print these strings? Copy and paste it to your question, it's your only chance to get helpful answers here. – Roland Illig Apr 19 '19 at 13:57
  • A _byte_ string that contains _unicode_ characters makes no sense. – RemcoGerlich Apr 19 '19 at 14:03
  • `x = u'\u003chead\u003e'; print(x)` prints `<head>`. Is that what you want? – Devesh Kumar Singh Apr 19 '19 at 14:05
  • The problem is that it makes no sense to escape < and > needlessly. If you have a string that contains your data as Unicode, the information that some arbitrary characters in it were originally escaped is lost. If you keep the string as literally a backslash and a hex code, then the backslash itself will be escaped. _What are you trying to achieve?_ – RemcoGerlich Apr 19 '19 at 14:07
  • I want to achieve exactly this as the json.dumps output: {"me": "\u003chead\u003e"}. The reason is that I have a third-party Java class that only accepts these Unicode escapes, so it is basically Python to Java. @RemcoGerlich – mambo Apr 19 '19 at 14:22
  • Sorry, I edited the post to clarify my intention. @ScottHunter – mambo Apr 19 '19 at 14:25
  • @mambo: then you shouldn't call it JSON and forget about using JSON -- you have a Java program that accepts some specific subset of JSON. – RemcoGerlich Apr 19 '19 at 14:25
  • @RemcoGerlich That is the problem: I have a GeoJSON machine that pre-processes large data. The GeoJSON data is easily processed in Python and fed to Android via the API in JSON format. – mambo Apr 19 '19 at 14:46
  • {"me": "<head>"} is perfectly valid JSON; you don't need that \u escaping for JSON. – RemcoGerlich Apr 19 '19 at 14:50
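To make the byte-string comments concrete: in a Python 3 bytes literal, `\u` is not an escape sequence, so the decoded string still contains real backslash characters; the doubled backslashes in the REPL output are just `repr()` escaping each single backslash. In a str literal, the same text is the six-character string `<head>`. A small sketch, assuming Python 3.6 or later:

```python
# The question's literal, written as a raw bytes literal (rb"...") so newer
# Pythons don't warn about the unrecognised \u escape; the bytes are the same.
raw = rb"\u003chead\u003e".decode("utf-8")

# In a str literal, \u003c is an escape for the single character '<'.
text = "\u003chead\u003e"

print(len(raw), repr(raw))    # 16 '\\u003chead\\u003e'
print(len(text), repr(text))  # 6 '<head>'
```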

2 Answers


Judging by this thread, dumping the data to JSON should already produce the unescaped Unicode characters, so I'm not sure what you are doing that leads to a different result than expected.

Dan6erbond
  • The problem is that his escaped codes decode to < and >, which don't need to be escaped so by default aren't. How can the library know which characters were \u encoded at some point? – RemcoGerlich Apr 19 '19 at 14:11
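The comment's point can be checked with the standard `json` module: JSON text with and without the `\u` escapes parses to the same Python object, so once the data is loaded nothing records which characters were escaped. A minimal sketch:

```python
import json

escaped = '{"me": "\\u003chead\\u003e"}'  # JSON text containing \u escapes
plain = '{"me": "<head>"}'                 # the same data without escapes

# Both parse to the identical Python object; the escaping is not kept.
print(json.loads(escaped) == json.loads(plain))  # True

# So re-serialising cannot know which characters were originally escaped.
print(json.dumps(json.loads(escaped)))  # {"me": "<head>"}
```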

As far as I can see, you don't need to do much.

Your start string,

b"\u003chead\u003e"

is already what you want, except that it isn't yet part of a larger JSON string. And no JSON library will produce what you need: the actual Unicode character \u003c is '<', which is simply written as '<' in JSON, and if you try to turn the literal characters '\', 'u', '0', '0', '3', 'c' into JSON, the backslash will of course need to be escaped. So you can't use a JSON library for this.
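Both cases described above can be seen directly with the standard `json` module. The `é` example is added here only for illustration: it shows that the default `ensure_ascii=True` escapes non-ASCII characters, whereas `<` is plain ASCII and is left alone.

```python
import json

# The character U+003C is simply '<'; json.dumps writes it unescaped.
print(json.dumps({"me": "\u003chead\u003e"}))   # {"me": "<head>"}

# Literal backslash-u text has its backslash escaped, giving \\u003c...
print(json.dumps({"me": r"\u003chead\u003e"}))  # {"me": "\\u003chead\\u003e"}

# ensure_ascii=True (the default) \u-escapes only non-ASCII characters.
print(json.dumps({"me": "caf\u00e9"}))                      # {"me": "caf\u00e9"}
print(json.dumps({"me": "caf\u00e9"}, ensure_ascii=False))  # {"me": "café"}
```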

The only solution I see is to use some placeholder in the data, JSON dump that, and then replace the string with what you want:

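import json

# rb"..." keeps each backslash: same bytes as the b"\u003chead\u003e" above
s = rb"\u003chead\u003e"
js = json.dumps({"me": "PLACEHOLDER"}).encode('utf8')

# Swap the placeholder for the pre-escaped bytes
yourtext = js.replace(b"PLACEHOLDER", s)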

Now yourtext (a bytes object) contains what you want. Of course, this breaks if PLACEHOLDER also occurs elsewhere in the data, so choose that string carefully.

And all of it is completely unnecessary as these characters don't need to be \u escaped at all.
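If the Java side really does require the escaped form, a somewhat less fragile variant of the placeholder idea is to post-process the whole `json.dumps` output. This is a different technique from the one above, sketched under the assumption that only `<` and `>` need escaping. In JSON text these characters can only occur inside string literals, and `\u003c`/`\u003e` decode back to the same characters, so the result stays valid JSON with the same data. The helper name `dumps_escaping_angle_brackets` is hypothetical:

```python
import json


def dumps_escaping_angle_brackets(obj):
    """Hypothetical helper: json.dumps, then \\u-escape '<' and '>'.

    '<' and '>' can only appear inside JSON string literals, and the
    escapes \\u003c / \\u003e decode back to the same characters, so the
    output is still valid JSON that parses to the same data.
    """
    text = json.dumps(obj)
    return text.replace("<", "\\u003c").replace(">", "\\u003e")


out = dumps_escaping_angle_brackets({"me": "<head>"})
print(out)              # {"me": "\u003chead\u003e"}
print(json.loads(out))  # {'me': '<head>'}
```

For what it's worth, Gson, a common Java JSON library, writes `<` and `>` as `\u003c` and `\u003e` by default (HTML-safe escaping, unless `GsonBuilder.disableHtmlEscaping()` is used), so a Java-side expectation of this form may come from Gson-style output; the question doesn't say which library is involved.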

RemcoGerlich