140

I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed that the entire data string for each tweet is unintentionally enclosed in double quotes. Furthermore, all double quotes that belong to the actual JSON formatting are escaped with a backslash.

The output looks like this:

"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,

How do I avoid that? It should be:

{"created_at":"Fri Aug 08 11:04:40 +0000 2014" .....

My file-out code looks like this:

with io.open('data'+self.timestamp+'.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))

The unintended escaping causes problems when reading in the JSON file in a later processing step.

Martijn Pieters
toobee

6 Answers

207

You are double encoding your JSON strings. data is already a JSON string, and doesn't need to be encoded again:

>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print encoded_data
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print double_encode
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"

Just write these directly to your file:

with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
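
A Python 3 version of the same idea, as a minimal sketch. The helper name `to_json_line` is hypothetical and not part of the original tool. It encodes Python objects exactly once and passes strings through unchanged, on the assumption that any `str` it receives is already JSON text (as `data` is in the question).

```python
import io
import json


def to_json_line(data):
    """Return one line of JSON text for `data`, encoding at most once.

    Assumption: a str argument is already-encoded JSON (as in the question),
    so it is passed through unchanged instead of being encoded again.
    """
    if isinstance(data, str):
        return data + "\n"
    return json.dumps(data, ensure_ascii=False) + "\n"


# An in-memory buffer stands in for the output file.
buffer = io.StringIO()
buffer.write(to_json_line('{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}'))
buffer.write(to_json_line({"id": 497699913925292032}))

# Every line decodes straight to a dict, with no extra layer of quoting.
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

The type check is a convenience, not a safeguard: a plain string that is meant to be stored as a JSON string value would also be passed through, so this only fits pipelines where strings are known to be pre-encoded.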
simeg
Martijn Pieters
  • f.write(data + '\n') -- correlates to -- data = encoded_data -- from your example. – Rich Elswick May 19 '20 at 21:10
  • @RichElswick the OP uses the variable `data`, which contains already-encoded JSON data, so yes, I used the variable name `encoded_data` to illustrate what was going on. – Martijn Pieters May 20 '20 at 09:20
  • For those, like me, who were getting `\\"` when double-escaping, this is because a bare variable like `double_encode` in the interpreter will escape the backslashes for you. If you instead use `print(double_encode)`, as Martijn used, the double-escaped string will be printed with single backslashes as shown. – Nick K9 Jul 10 '22 at 22:07
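
The difference described in the last comment can be reproduced with a short Python 3 sketch. The variable names are illustrative only; `str()` corresponds to what `print()` shows and `repr()` to what the interactive prompt shows for a bare variable.

```python
import json

encoded = json.dumps({"created_at": "Fri Aug 08 11:04:40 +0000 2014"})
double_encoded = json.dumps(encoded)

# str() is what print() shows: one backslash before each inner quote.
shown_by_print = str(double_encoded)
# repr() is what the interactive prompt shows for a bare variable:
# each backslash is itself escaped, so it appears doubled.
shown_by_prompt = repr(double_encoded)

print(shown_by_print)
print(shown_by_prompt)
```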
14

Another situation where this unwanted escaping can happen is when you call json.dump() on the pre-processed output of json.dumps(). For example:

import json, sys
json.dump({"foo": json.dumps([{"bar": 1}, {"baz": 2}])},sys.stdout)

will result in

{"foo": "[{\"bar\": 1}, {\"baz\": 2}]"}

To avoid this, you need to pass dictionaries rather than the output of json.dumps(), e.g.

json.dump({"foo": [{"bar": 1}, {"baz": 2}]},sys.stdout)

which outputs the desired

{"foo": [{"bar": 1}, {"baz": 2}]}

(Why would you pre-process the inner list with json.dumps(), you ask? Well, I had another function that was creating that inner list out of other stuff, and I thought it would make sense to return a json object from that function... Wrong.)
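
One way to arrange this, sketched under the assumption that the inner list comes from a separate function: have that function return plain Python data and encode only once, where the data leaves the program. `build_items` is a hypothetical stand-in for such a function.

```python
import json


def build_items():
    """Hypothetical helper: returns plain Python data, not a JSON string."""
    return [{"bar": 1}, {"baz": 2}]


# Encode once, at the point where the data leaves the program.
output = json.dumps({"foo": build_items()})
print(output)
```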

Mike Maxwell
1

If you are using ujson, set escape_forward_slashes=False to prevent / characters from being escaped.

With escape_forward_slashes=False:

import ujson
ujson.dumps({"a": "aa//a/dfdf"}, escape_forward_slashes=False)

'{"a":"aa//a/dfdf"}'

Default (escape_forward_slashes=True):

ujson.dumps({"a": "aa//a/dfdf"}, escape_forward_slashes=True)

'{"a":"aa\\/\\/a\\/dfdf"}'
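
For comparison, a small sketch using only the standard library json module: its encoder leaves forward slashes unescaped, and its decoder accepts the escaped form, so both spellings carry the same value. The variable names are illustrative.

```python
import json

# The standard library encoder leaves forward slashes unescaped.
plain = json.dumps({"a": "aa//a/dfdf"})

# An escaped slash is still valid JSON and decodes to the same value.
escaped = '{"a":"aa\\/\\/a\\/dfdf"}'
same_value = json.loads(escaped) == json.loads(plain)
```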

joanis
AdiSa
0

To extend this for others with a similar issue: I used the following to write JSON-formatted data that came from an API call to a file. This is only an indicative example; adapt it to your requirements.

import json

# below is an example, this came for me from an API call
json_string = '{"address":{"city":"NY", "country":"USA"}}'

# write the JSON string to a file as-is (don't pass it through json.dump:
# it is already encoded, as explained in other answers)
with open('direct_json.json', 'w') as direct_json:
    direct_json.write(json_string)
    direct_json.write("\n")

# load as dict
json_dict = json.loads(json_string)

# pretty print
print(json.dumps(json_dict, indent=1))

# write pretty JSON to file
with open('formatted.json', 'w') as formatted_file:
    json.dump(json_dict, formatted_file, indent=4)
TechFree
0

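A simple way around this, which worked for me, is to call json.loads() on the already-encoded string before dumping it, like the following:

import json
# an already-encoded JSON string, e.g. the output of an earlier json.dumps()
json_string = json.dumps({"foo": [{"bar": 1}, {"baz": 2}]})
data = json.loads(json_string)  # decode back to Python objects first
with open('output.json', 'w') as f:
    json.dump(data, f, indent=4)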
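
The same decode-first idea can be applied when reading a file that was already written with double-encoded records. The following is a minimal sketch; `decode_record` is a hypothetical helper, and it assumes that a top-level JSON string is always a double-encoded record, which may not hold for other data.

```python
import json


def decode_record(line):
    """Decode one line of JSON, undoing one extra layer of encoding if present.

    Assumption: a top-level JSON string whose content is itself JSON is a
    double-encoded record, which holds for the tweet data in the question.
    """
    value = json.loads(line)
    if isinstance(value, str):
        value = json.loads(value)
    return value


double_encoded = json.dumps(json.dumps({"foo": [{"bar": 1}, {"baz": 2}]}))
record = decode_record(double_encoded)
```

Correctly encoded lines pass through unchanged, so the helper can be used on files that mix both kinds of records.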

0

I had the same issue, but on the database side: when I added data from Python code to the database using json.dump, it was stored in a similar form: "{"created_at":"Fri Aug 08 11:04:40 +0000 2014","id":497699913925292032,

Instead of using json.dump(existing_dict), try:

from sqlalchemy import JSON, cast

cast(existing_dict, JSON)
