I am scraping some JSONP dictionaries from AWS (from JavaScript files). After stripping the raw data down to the JSON-like part, in some cases I get valid JSON and can load it in Python (json_data = json.loads(json_like_data)). However, some of Amazon's JSONP payloads do not include quotes around their keys (see the following).
...
{type:"storageCurrentGen",sizes:
[{size:"i2.xlarge",vCPU:"4",ECU:"14",memoryGiB:"30.5",storageGB:"1 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"0.938"}}]},
{size:"i2.2xlarge",vCPU:"8",ECU:"27",memoryGiB:"61",storageGB:"2 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"1.876"}}]},
{size:"i2.4xlarge",vCPU:"16",ECU:"53",memoryGiB:"122",storageGB:"4 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"3.751"}}]},
...
For JSONP, this still works because it is valid JavaScript syntax. However, Python's json.loads(json_str) fails because it is not valid JSON.
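For reference, a minimal reproduction of the failure. The one-entry string is a made-up, shortened version of the data above, and `try_load` is just a helper name for this sketch:

```python
import json

# Hypothetical one-entry snippet in the same style as the AWS data.
unquoted = '{size:"i2.xlarge",vCPU:"4"}'


def try_load(text):
    """Return the parsed object, or a short error string if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc.msg}"


print(try_load(unquoted))                            # unquoted keys: error
print(try_load('{"size":"i2.xlarge","vCPU":"4"}'))   # quoted keys: dict
```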
There is another Python module, yaml (PyYAML), which can handle unquoted keys, BUT there must be a space after each colon (:).
I figure that I have two options.
- Somehow quote the characters between an open brace or comma ({ or ,) and a colon (:). Then use json.loads(...).
- Add a space after every colon (:). Then parse with yaml.load(...).
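Option 1 could be sketched roughly as below. This is only a sketch under a few assumptions: keys are plain JavaScript identifiers, every string value uses double quotes, and the JSONP wrapper has already been stripped. `quote_keys` and `sample` are hypothetical names; the sample is one entry from the data above with the closing brackets added by hand. The regex matches string literals first and leaves them alone, so a colon or comma inside a value should not be mistaken for a key.

```python
import json
import re

# Either a double-quoted string (left untouched) or a bare key before a colon.
_TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"                     # double-quoted string literal
  | (?P<key>[A-Za-z_$][\w$]*)(?=\s*:)     # bare identifier followed by a colon
''', re.VERBOSE)


def quote_keys(text):
    """Wrap unquoted object keys in double quotes (assumes identifier keys)."""
    def fix(match):
        key = match.group("key")
        # Strings match the first alternative, so "key" is None for them.
        return f'"{key}"' if key else match.group(0)
    return _TOKEN.sub(fix, text)


# Hypothetical sample: one entry from the data, closed by hand.
sample = ('{type:"storageCurrentGen",sizes:[{size:"i2.xlarge",vCPU:"4",'
          'ECU:"14",memoryGiB:"30.5",storageGB:"1 x 800 SSD",'
          'valueColumns:[{name:"linux",prices:{USD:"0.938"}}]}]}')

data = json.loads(quote_keys(sample))
print(data["sizes"][0]["valueColumns"][0]["prices"]["USD"])
```

Single-quoted strings, comments, or trailing commas in the payload would still break this, so it is not a general JavaScript-object parser.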
My guess is that option 2 is better than option 1. However, I am looking for suggestions for a better solution.
Has anyone encountered ill-formatted JSON like this before and parsed it with Python?