4

I have fetched JSON data from a URL and written it to a file named urljson.json. I want to format the JSON data by removing the '\' characters and the "result" key, as required for my use case. In my JSON file the data is arranged like this:

{\"result\":[{\"BldgID\":\"1006AVE \",\"BldgName\":\"100-6th Avenue SW (Oddfellows)          \",\"BldgCity\":\"Calgary             \",\"BldgState\":\"AB \",\"BldgZip\":\"T2G 2C4  \",\"BldgAddress1\":\"100-6th Avenue Southwest                \",\"BldgAddress2\":\"ZZZ None\",\"BldgPhone\":\"4035439600     \",\"BldgLandlord\":\"1006AV\",\"BldgLandlordName\":\"100-6 TH Avenue SW Inc.                                     \",\"BldgManager\":\"AVANDE\",\"BldgManagerName\":\"Alyssa Van de Vorst           \",\"BldgManagerType\":\"Internal\",\"BldgGLA\":\"34242\",\"BldgEntityID\":\"1006AVE \",\"BldgInactive\":\"N\",\"BldgPropType\":\"ZZZ None\",\"BldgPropTypeDesc\":\"ZZZ None\",\"BldgPropSubType\":\"ZZZ None\",\"BldgPropSubTypeDesc\":\"ZZZ None\",\"BldgRetailFlag\":\"N\",\"BldgEntityType\":\"REIT                     \",\"BldgCityName\":\"Calgary             \",\"BldgDistrictName\":\"Downtown            \",\"BldgRegionName\":\"Western Canada                                    \",\"BldgAccountantID\":\"KKAUN     \",\"BldgAccountantName\":\"Kendra Kaun                   \",\"BldgAccountantMgrID\":\"LVALIANT  \",\"BldgAccountantMgrName\":\"Lorretta Valiant                        \",\"BldgFASBStartDate\":\"2012-10-24\",\"BldgFASBStartDateStr\":\"2012-10-24\"}]}

I want it in this format:

[  
   {  
      "BldgID":"1006AVE",
      "BldgName":"100-6th Avenue SW (Oddfellows)          ",
      "BldgCity":"Calgary             ",
      "BldgState":"AB ",
      "BldgZip":"T2G 2C4  ",
      "BldgAddress1":"100-6th Avenue Southwest                ",
      "BldgAddress2":"ZZZ None",
      "BldgPhone":"4035439600     ",
      "BldgLandlord":"1006AV",
      "BldgLandlordName":"100-6 TH Avenue SW Inc.                                    ",
      "BldgManager":"AVANDE",
      "BldgManagerName":"Alyssa Van de Vorst           ",
      "BldgManagerType":"Internal",
      "BldgGLA":"34242",
      "BldgEntityID":"1006AVE ",
      "BldgInactive":"N",
      "BldgPropType":"ZZZ None",
      "BldgPropTypeDesc":"ZZZ None",
      "BldgPropSubType":"ZZZ None",
      "BldgPropSubTypeDesc":"ZZZ None",
      "BldgRetailFlag":"N",
      "BldgEntityType":"REIT                     ",
      "BldgCityName":"Calgary             ",
      "BldgDistrictName":"Downtown            ",
      "BldgRegionName":"Western Canada                                    ",
      "BldgAccountantID":"KKAUN     ",
      "BldgAccountantName":"Kendra Kaun                   ",
      "BldgAccountantMgrID":"LVALIANT  ",
      "BldgAccountantMgrName":"Lorretta Valiant                        ",
      "BldgFASBStartDate":"2012-10-24",
      "BldgFASBStartDateStr":"2012-10-24"
   }
]

I have tried replace("\","") but nothing changed. Here is my code:

import json


import urllib2
urllink=urllib2.urlopen("url").read()

# print urllink



with open('urljson.json','w')as outfile:
    json.dump(urllink,outfile)


jsonfile='urljson.json'
jsondata=open(jsonfile)

data=json.load(jsondata)
# data.replace('\'," ")
print (data)

But it says that the file object has no replace attribute. I couldn't find a way to remove 'result' and the outermost "{}". Kindly guide me; I think the file object is somehow not being parsed as a string. I am a beginner in Python. Thank you.

ppasler
Kalyan
  • A suggestion: you may use `json.loads()` and `json.dumps()` instead of `json.load()` and `json.dump()`; they directly take the file path as a parameter, so you don't need to `open` the file and `read` data from it – ZdaR Jan 25 '17 at 07:04
  • @ZdaR - no, they take a json string, not the path to a string. – tdelaney Jan 25 '17 at 07:08
  • I'm confused about that first string you showed us. It's not JSON, but it looks like the Python representation of a JSON string. So, where did it come from? – tdelaney Jan 25 '17 at 07:10
  • You save to a file... did you want to keep a copy or was that just an intermediate step for the decode? – tdelaney Jan 25 '17 at 07:16

3 Answers

3

JSON is a serialized encoding for data. urllink=urllib2.urlopen("url").read() reads that serialized string. With json.dump(urllink,outfile) you serialized that already-serialized JSON string a second time. You double-encoded it, and that's why you see those extra "\" escape characters. json needs to escape the inner quotes so they aren't confused with the quotes it uses to delimit strings.
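The double encoding can be reproduced without any network access. This is a minimal sketch, assuming a made-up one-field payload (the name `raw` is hypothetical and stands in for the text returned by the URL); it should run on both Python 2 and Python 3.

```python
import json

# Hypothetical stand-in for the text returned by urlopen(...).read()
raw = '{"result": [{"BldgID": "1006AVE "}]}'

# Encoding a string that already holds JSON wraps it in quotes
# and escapes every inner quote with a backslash.
double_encoded = json.dumps(raw)
print(double_encoded)

# Decoding once only removes that extra layer and returns the original text.
assert json.loads(double_encoded) == raw

# Decoding the original text gives the actual Python data.
data = json.loads(raw)
print(data["result"][0]["BldgID"])
```

The backslashes are therefore not part of the data; they only exist in the doubly encoded text.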

If you wanted the file to hold the original JSON, you wouldn't need to encode it again. Just do:

with open('urljson.json','w')as outfile:
    outfile.write(urllink)

But it looks like you want to grab the "result" list and save only that. So, decode the JSON into Python, grab the bits you want, and encode it again.

import json
import urllib2

# read a JSON string from the url
urllink = urllib2.urlopen("url").read()

# decode it and grab the result list
result = json.loads(urllink)['result']

# write the list back out as JSON
with open('urljson.json', 'w') as outfile:
    json.dump(result, outfile)
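If the indented layout shown in the question matters, the `indent` and `separators` arguments of `json.dumps` (also accepted by `json.dump`) may help. A minimal sketch, assuming a hypothetical two-field sample in place of the real result list:

```python
import json

# Hypothetical sample standing in for json.loads(urllink)['result']
result = [{"BldgID": "1006AVE ", "BldgCity": "Calgary             "}]

# indent=3 puts each key on its own line, indented by three spaces per level;
# separators=(",", ":") drops the space that is normally written after ':'
pretty = json.dumps(result, indent=3, separators=(",", ":"))
print(pretty)
```

The indentation only affects how the file looks; decoding it gives back the same list.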
tdelaney
1

Tidy up the JSON object before writing it to the file; it has a lot of whitespace noise. Try it like this:

# strip whitespace from the keys and values of the first record only
first = json.loads(urllink)['result'][0]
jsonobj = {a.strip(): b.strip() for a, b in first.items()}

with open('urljson.json', 'w') as outfile:
    json.dump(jsonobj, outfile)

For all objects:

# strip whitespace from every record in the result list
jsonlist = []

for dirtyobj in json.loads(urllink)['result']:
    jsonlist.append({a.strip(): b.strip() for a, b in dirtyobj.items()})

with open('urljson.json', 'w') as outfile:
    json.dump(jsonlist, outfile)

Don't wanna tidy up? Then simply do this:

jsonobj = json.loads(urllink)

Also, you can't write '\'; it's a syntax error. The second ' is escaped, so it is not treated as the closing quote:

data.replace('\'," ")

Why can't Python's raw string literals end with a single backslash?
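A literal backslash has to be written as '\\' in Python source. The sketch below assumes a made-up fragment of the double-encoded text (the name `text` is hypothetical) and shows that `str.replace` returns a new string rather than changing the original.

```python
# A made-up fragment of the double-encoded text.
# In source code '\\' is ONE backslash character.
text = '{\\"BldgID\\":\\"1006AVE \\"}'

# replace() does not modify text in place; keep the returned value
cleaned = text.replace('\\', '')
print(cleaned)
```

Removing every backslash this way would also damage any escape that legitimately belongs to the data, so decoding with json.loads is usually the safer route.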

Community
Mohammad Yusuf
  • Thank you for that tidy code, but it only shows the JSON elements at the first index. How can I do it for the full JSON set? The length of the array is 23510, and for a bigger index it shows index out of bounds. If I remove [][] and items and keep values(), it shows too many values to unpack. – Kalyan Jan 25 '17 at 07:41
  • Can you give the url from where you are fetching? It would be easier to understand then. – Mohammad Yusuf Jan 25 '17 at 07:50
  • @Kalyan OK, try one last time. If it doesn't work, I'll delete this. – Mohammad Yusuf Jan 25 '17 at 08:32
1

\ is the escape character in JSON.

You can load a JSON string into a Python dict.
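For a file that was already written with the extra escaping, decoding twice is one way to get the dict back. This is a sketch under the assumption that the file content looks like the hypothetical `file_text` below, i.e. one JSON string that itself contains JSON:

```python
import json

# Hypothetical content of urljson.json as written by json.dump(urllink, outfile)
file_text = '"{\\"result\\":[{\\"BldgID\\":\\"1006AVE \\"}]}"'

inner = json.loads(file_text)   # first pass: a string holding the original JSON
data = json.loads(inner)        # second pass: a dict
records = data["result"]        # the list without the outer "result" wrapper

print(type(data).__name__)
print(records[0]["BldgID"])
```

Here `records` is the plain list, which can then be written out with json.dump.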

宏杰李