I am creating a JSON body for a REST payload like so:

>>> import json
>>> j = json.loads('["foo", {"bar": ["to_be_replaced", 1.1, 1.0, 2]}]')
>>> text = "aaaa" + "\\" + "bbbbb" + "\\" + "cccc"
>>> j[1]["bar"][0] = text
>>> j
['foo', {'bar': ['aaaa\\bbbbb\\cccc', 1.1, 1.0, 2]}]

Annoyingly, the format expected on the other side is like so

"aaaa\bbbb\cccc". 

A terrible idea, I know.

I have tried everything and am starting to believe it's simply impossible to store text in this format in a JSON object. Is there a way? Or do I need to get the developers of the web service to choose a more sensible delimiter?

I know it's REALLY a single backslash, and if I print it I get a single backslash:

>>> print(text)
aaaa\bbbbb\cccc

But that doesn't help me get it into a JSON object.

COOLBEANS
  • 3
    Well, it _is_ a single backslash, that's just how python represents it. – cs95 Apr 10 '18 at 21:55
  • 2
    `"aaaa\bbbb\cccc"` **is not valid JSON**. If they're asking you to serialize to that string, they quite by definition aren't asking for JSON (and complaints by customers/integration partners/&c. like yourself are part of the market forces that should quite rightly push them to use a *real* JSON parser rather than hand-rolling something noncompliant). – Charles Duffy Apr 10 '18 at 21:58
  • I'm not sure this is a duplicate because no other answer on SO has helped me resolve the issue. I've explained that print shows a single backslash. That's fine. But now how do I get that single backslash into a JSON object? – COOLBEANS Apr 10 '18 at 22:01
  • 1
    The `\b` part is valid JSON, which means a backspace rather than a backslash and a `b`, but the `\c` part is not valid JSON at all, and any compliant parser should call that an error. So obviously we're dealing with a hacky, home-built parser. The first question is whether we can work around that parser with something that's nevertheless valid JSON. What happens if you send, say, `\u005Cc`? Does that get parsed as `\c` on the other end? If so, that's probably the best solution. – abarnert Apr 10 '18 at 22:10
  • 1
    If that doesn't work, you can't just use a JSON generator, but you can use a JSON generator followed by a `hackitupforcrappyparser` function that unescapes correct JSON sequences into incorrect nonsense in the format the broken parser expects. You probably do need to do a bit more research to verify whether they're just ignoring all backslashes, or interpreting some of them but ignoring others, before you can write code to implement the not-JSON format they expect, but once you know that, it should be pretty simple. – abarnert Apr 10 '18 at 22:13
  • @abarnert, ...btw, I'm quite sure we *have* had this actual question before (asked by someone trying to stuff a Windows UNC path in a JSON file without having the number of literal backslashes modified), but Googling it up is being quite a pain; almost wondering if the prior instance was deleted. – Charles Duffy Apr 10 '18 at 22:19
  • 1
    @CharlesDuffy Would approve your answer if given. Not being json is a strong argument for me to go back to them with. – COOLBEANS Apr 10 '18 at 22:21
  • 1
    @COOLBEANS That's an even better solution than hacking up your JSON, of course. Hope you can pull it off. (And even if you can't, see if you can at least get them to document their JSON-like-but-not-JSON format…) – abarnert Apr 10 '18 at 22:23
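
The two workarounds suggested in the comments above could be sketched roughly as follows. The function names `with_unicode_backslashes` and `to_nonstandard` are hypothetical, and the second one assumes (without confirmation) that the remote service treats a backslash as an ordinary character; that assumption would need to be verified against the real service first.

```python
import json
import re

# Matches any two-character escape introducer in json.dumps output.
_ESCAPE = re.compile(r'\\(.)')

def with_unicode_backslashes(json_text):
    """Still valid JSON: spell each escaped backslash as \\u005C instead of \\\\."""
    return _ESCAPE.sub(
        lambda m: '\\u005C' if m.group(1) == '\\' else m.group(0), json_text)

def to_nonstandard(json_text):
    """NOT JSON: collapse each escaped backslash to a bare one.

    ASSUMPTION: the receiving parser does not interpret backslashes at all.
    """
    return _ESCAPE.sub(
        lambda m: '\\' if m.group(1) == '\\' else m.group(0), json_text)

payload = ["foo", {"bar": ["aaaa\\bbbbb\\cccc", 1.1, 1.0, 2]}]

valid = with_unicode_backslashes(json.dumps(payload))
assert json.loads(valid) == payload  # a compliant parser sees the same data

broken = to_nonstandard(json.dumps(payload))
print(broken)  # ["foo", {"bar": ["aaaa\bbbbb\cccc", 1.1, 1.0, 2]}]
```

The first form is worth trying before the second, since it stays within the specification and any compliant parser will still accept it.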

1 Answer

Yes, it is impossible -- by design.

A JSON generator is, by nature, supposed to emit only valid JSON. From RFC 8259, emphasis mine:

7. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be uppercase or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

Alternatively, there are two-character sequence escape representations of some popular characters. So, for example, a string containing only a single reverse solidus character may be represented more compactly as "\\".


Note the phrase "MUST be escaped" -- "MUST" is a formally-defined term-of-art; something which does not comply with a MUST requirement from the JSON specification is not allowed to call itself JSON.

In summary: A string containing only a literal backslash in your data may be encoded in JSON as "\u005c", or "\\". It may not be encoded as "\" (including that character as an unescaped literal).
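
As a quick check with Python's standard `json` module (a minimal sketch; the helper `is_valid_json` is just an illustrative name), the doubled backslash in the serialized text is the encoding of a single backslash, and the unescaped form is rejected:

```python
import json

text = "aaaa" + "\\" + "bbbbb" + "\\" + "cccc"  # contains two single backslashes

# Serializing escapes each backslash, as the RFC requires.
encoded = json.dumps(text)
print(encoded)  # "aaaa\\bbbbb\\cccc"

# A compliant parser turns each escape back into one backslash.
assert json.loads(encoded) == text
assert json.loads('"\\u005c"') == "\\"

def is_valid_json(s):
    """Return True if the standard parser accepts s."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

# The form the web service asks for: \b parses as a backspace, \c is not an escape.
print(is_valid_json('"aaaa\\bbbb\\cccc"'))  # False
```

So the `\\` seen in the serialized payload is not a bug in the generator; it is the only way, other than `\u005c`, to put one backslash on the wire.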

Charles Duffy