0

I received a json from work and I need to put it on a pandas dataframe to analyze it. But Json came all weird missing quotes in some keys, well, I managed to solve this with but when I try to save the clean file (with the quotes) I have the error of in place of spaces I get "\"

The first json is like:

{"text":"CRD","qtip":"CRD","checked":false,"id":"5746p+5746!!DB::5746p","tsmax":1597788000000,"cost":{"202008":{"1":169.08978075830004,2:167.74736783330005,3:174.24878584530003,4:174.8258770385,5:173.49233388220006,6:174.67223572639998,7:175.13397411730006,8:170.16884950140002,9:170.46993024780002,10:177.9452736487,11:177.5172502943,12:176.8791710277,13:174.97972124289998,14:177.50405061630002,15:173.38104341860006,16:173.06225567280003,17:177.96418680850005,18:166.50419259929998,19:26.4,20:26.4,21:26.4,22:26.4,23:26.4,24:26.4,25:26.4,26:26.4,27:26.4,28:26.4,29:26.4,30:26.4,31:26.4,"avg":114.74620938931994,"estimate":5362.688623746617,"total":3468.7862816795982},"202006":{2:200.35115201620002,3:200.10138898519995,4:199.82156664749996,5:200.15923924589998,6:196.75398413409997,7:195.03808163369996,8:201.02971647659996,10:199.67497236949998,1:198.11742151040002,9:197.93282962139997,11:188.64197859260003,12:180.88797667890003,13:157.42718581050002,14:158.81001735070006,15:162.00267321069998,16:162.50923629150003,17:161.5225927354,18:160.28129821180008,19:159.21280188340003,20:159.87749734020005,21:161.2513404355,22:163.3872927245,23:162.49602354750004,24:161.9270906232,25:162.9849800909,26:165.5652523876,27:159.06988490620003,28:162.89904494350003,29:165.6445198728,30:165.63371783310004,"avg":175.70042534269976,"total":5271.012760280993},"202007"...

I solved the problem with this code below:

import re

regex = r"(\w+)\s?:\s?(\"?[^\",]+\"?,?)"

subst = "\"\\1\":\\2"

result = re.sub(regex, subst, test_str,0) 

print (result)

[{"text":"CRD","qtip":"CRD","checked":false,"id":"5746p+5746!!"DB"::5746p","tsmax":1597788000000,"cost":{"202008":{"1":169.08978075830004,"2":167.74736783330005,"3":174.24878584530003,"4":174.8258770385,"5":173.49233388220006,"6":174.67223572639998,"7":175.13397411730006,"8":170.16884950140002,"9":170.46993024780002,"10":177.9452736487,"11":177.5172502943,"12":176.8791710277,"13":174.97972124289998,"14":177.50405061630002,"15":173.38104341860006,"16":173.06225567280003,"17":177.96418680850005,"18":166.50419259929998,"19":26.4,"20":26.4,"21":26.4,"22":26.4,"23":26.4,"24":26.4,"25":26.4,"26":26.4,"27":26.4,"28":26.4,"29":26.4,"30":26.4,"31":26.4,"avg":114.74620938931994,"estimate" ...

Now I want to save this 'str' as a json file, so i used this code:

with open('data.json', 'w', encoding='utf-8') as f:
    test = json.dumps(result,separators=(',', ':'))

"\n[{\"text\":\"CRD\",\"qtip\":\"CRD\",\"checked\":false,\"id\":\"5746p+5746!!\"DB\"::5746p\",\"tsmax\":1597788000000,\"cost\":{\"202008\":{\"1\":169.08978075830004,\"2\":167.74736783330005,\"3\":174.24878584530003,\"4\":174.8258770385,\"5\":173.49233388220006,\"6\":174.67223572639998,\"7\":175.13397411730006,\"8\":170.16884950140002,\"9\":170.46993024780002,\"10\":177.9452736487,\"11\":177.5172502943,\"12\":176.8791710277,\"13\":174.97972124289998,\"14\":177.50405061630002,\"15\":173.38104341860006,\"16\":173.06225567280003,\"17\":177.96418680850005,\"18\":166.50419259929998,\"19\":26.4,\"20\":26.4,\"21\":26.4,\"22\":26.4,\"23\":26.4,\"24\":26.4,\"25\":26.4,\"26\":26.4,\"27\":26.4,\"28\":26.4,\"29\":26.4,\"30\":26.4,\"31\":26.4,\"avg\":114.74620938931994,\"estimate\":5362.688623746617,\"total\":3468.7862816795982},\"202006\":{\"2\":200.35115201620002,\"3\":200.10138898519995,\"4\":199...

How do I fix it? and how do I save the file correctly?

AlceFilho
  • 11
  • 3
  • 1
    This is correct. There aren't any backslashes in `test` - you will see it if instead of inspecting the variable in IDLE (or whatever interactive environment you use) you print it: `print(test)`, or even directly write to your file: `f.write(test)`. IDLE just escapes quotes with backslashes for presentation, they are not added to the string. – Błotosmętek Aug 21 '20 at 21:19
  • You would have to load the json first with `json.loads(json_string)` (if you want it formatted) and then pass this value to `json.dumps()` like so: `json.dumps(loaded, separators=(',', ': '), indent=4)`. Then you can write to the file with `f.write(test)`. Also, your regex didn't fix everything in the json. It still is invalid. If you paste it [here](https://jsonformatter.curiousconcept.com/#) then you can see where there are problems. [Here's a demo](https://repl.it/@marsnebulasoup/MinorTrustworthyAdvance#main.py) with the json file saved correctly. – marsnebulasoup Aug 21 '20 at 21:39
  • If you don't want it formatted, then just write `result` directly to the file with `f.write(result)`, and skip the `json.dumps()` part. It's then written to the file like you want. – marsnebulasoup Aug 21 '20 at 21:42
  • @MarsNebulaSoup Thanks dude, you helped me a lot was just using `f.write (result)`, just like you said there is another problem in json that I will have to fix which is the `JSONDecodeError: Expecting ',' delimiter: line 2 column 64 (char 64) ` I will study the regex one more time in order to solve the comma problems – AlceFilho Aug 22 '20 at 12:53
  • As this is my first question here I think I was nervous, I even made a mistake in the question, basically I want to take the file.json file (without formatting) to format it with some function and then use `import json with open ('file_formatted.json') as f: d = json.load (f) [0] ` – AlceFilho Aug 22 '20 at 12:59

0 Answers0