
I am trying to parse the JSON at http://a0.awsstatic.com/pricing/1/ec2/sles-od.min.js

Here is a short snippet from the above link:

{vers:0.01,config:{rate:"perhr",valueColumns:["vCPU","ECU","memoryGiB","storageGB","sles"],currencies:["USD"],regions:[{region:"us-east",instanceTypes:[{type:"generalCurrentGen",sizes:[{size:"t2.micro",vCPU:"1",ECU:"variable",
...
...
...
...

Please visit the aforementioned link to see the complete JSON.

As seen above, none of the keys in this JSON have double quotes around them.

This makes it a malformed JSON string, and my JSON parser fails on it. I also tried pasting it into http://www.jsoneditoronline.org/ and it fails there as well.

Now, this is the same link that Amazon uses to display the prices of its EC2 instances, so I think I am missing something here. My Googling led me to believe that this is not JSON but JSONP, and I don't understand what that is.

Could you help me understand how to parse this JSON? BTW, I am doing this in Perl using the JSON module.

Some background:

Amazon Web Services does not have an API to get pricing info programmatically. Hence I am parsing these links, which is what Amazon itself does when displaying pricing information on its website. Besides, I am not from a programming background, and Perl is all I know.

slayedbylucifer
  • "This file is intended for use only on aws.amazon.com." Are you sure there is no other API that does what you need? –  Jul 28 '14 at 15:10
  • nope, Amazon does not provide an API to get Price information. Hence, many people have done similar thing of parsing these links. However, most of them are in python/ruby. I am trying in perl. – slayedbylucifer Jul 28 '14 at 15:12
  • @rightfold, check this: http://stackoverflow.com/questions/7334035/get-ec2-pricing-programmatically/ AND http://stackoverflow.com/questions/3636578/are-there-any-apis-for-amazon-web-services-pricing – slayedbylucifer Jul 28 '14 at 15:14
  • It's not JSON, `sles-od.min.js` contains JavaScript. – ikegami Jul 28 '14 at 15:14
  • @ikegami, hmmm... probably that's why. What are my options for parsing it in Perl? – slayedbylucifer Jul 28 '14 at 15:15
  • Didn't you just link to a solution? – ikegami Jul 28 '14 at 15:19
  • @ikegami, the 2 other SO threads I mentioned have JSON links which are deprecated. Hence the price information mentioned there does not match with today's pricing. The link I mentioned in the post has the correct pricing, but it is not in JSON format ... :( – slayedbylucifer Jul 28 '14 at 15:21
  • So what problem have you had parsing it? – ikegami Jul 28 '14 at 15:26
  • @ikegami, well, it does not parse at all. It says "malformed JSON string". So I took a chunk of it, double-quoted all the keys, and it worked. So now I know the problem is that the keys are not double-quoted. I also quickly validated this @ http://www.jsoneditoronline.org/. But I can't do this double quoting manually or with a regex: it is a massive file and I am dealing with at least 50 such URLs. Still, I tried some regex combinations to double-quote the keys, but have not had much success yet. – slayedbylucifer Jul 28 '14 at 15:29
  • Yeah, it's not JSON, so you can't use a JSON parser. You need a JS parser. But thankfully, the program uses a small predictable subset of JS, so you don't have to write a full JS parser either. // It makes no sense to try to add quotes. You'd need to parse it to do that. And if you can do that, then there's no need to parse it. – ikegami Jul 28 '14 at 15:35

2 Answers


As you said, JSONP ("JSON with padding") can't be parsed by a JSON parser because it is not JSON. It is JSON-like data (here, a JavaScript object literal with unquoted keys) wrapped in a prefix, the padding.

The padding is typically the name of a callback function that wraps the data.

In this case the callback is simply named 'callback', so a somewhat hacky but workable approach is to use a regular expression to capture the data wrapped by 'callback(...)', like this:

s/callback\((.*)\);$/$1/s;
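To see what that substitution does in isolation, here is a minimal, self-contained sketch that runs it on a tiny hand-written string shaped like the downloaded file. The sample string is hypothetical and heavily abbreviated, so no network access is needed:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the downloaded file: same "callback(...);" shape,
# but only a few keys from the question's snippet.
my $jsonp = 'callback({vers:0.01,config:{rate:"perhr",currencies:["USD"]}});';

# Copy the text, then replace "callback(" ... ");" with just the inner part.
# /s lets .* cross newlines; the greedy .* runs up to the last ");".
( my $json = $jsonp ) =~ s/callback\((.*)\);$/$1/s;

print "$json\n";
```

Note that the result still has unquoted keys, so a strict JSON decoder will reject it; that is what the next step deals with.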

Then, to keep using the JSON library, you can enable allow_barekey, which makes the decoder accept object keys that are not wrapped in quotes.
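Here is a minimal sketch of the difference. It assumes the core JSON::PP module (bundled with Perl since 5.14), which documents allow_barekey among its own non-standard decoding options. The input string is a hypothetical excerpt modeled on the question's snippet:

```perl
use strict;
use warnings;
use JSON::PP;    # core module; allow_barekey is one of JSON::PP's own options

# Hypothetical payload left over after stripping "callback(...);".
my $json = '{vers:0.01,config:{rate:"perhr",currencies:["USD"]}}';

# A strict decoder dies on the first unquoted key.
my $strict_ok = eval { JSON::PP->new->decode($json); 1 };
print "strict decode failed: $@" unless $strict_ok;

# With allow_barekey enabled, keys made of word characters may be unquoted.
my $data = JSON::PP->new->allow_barekey->decode($json);
print "rate: $data->{config}{rate}\n";
print "currency: $data->{config}{currencies}[0]\n";
```

This is why adding quotes with a regex is unnecessary: the relaxed decoder handles the bare keys directly.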

Below is my working code. I use LWP::Simple to fetch the content of the given URL and Data::Dump to print the resulting data structure.

use strict;
use warnings;

use LWP::Simple;
use JSON;

my $jsonp = get("http://a0.awsstatic.com/pricing/1/ec2/sles-od.min.js")
    or die "Couldn't get url";

( my $json = $jsonp ) =~ s/callback\((.*)\);$/$1/s; # grab the JSON from $jsonp and store it in $json
my $hash = JSON->new->allow_barekey->decode ( $json );

use Data::Dump;
dd $hash;

Output (truncated):

{
  config => {
              currencies => ["USD"],
              rate => "perhr",
              regions => [
                {
                  instanceTypes => [
                    {
                      sizes => [
                                 {
                                   ECU => "variable",
                                   memoryGiB => 1,
                                   size => "t2.micro",
                                   storageGB => "ebsonly",
                                   valueColumns => [{ name => "os", prices => { USD => 0.023 } }],
                                   vCPU => 1,
                                 },
                                 {
                                   ECU => "variable",
                                   memoryGiB => 2,
                                   size => "t2.small",
                                   storageGB => "ebsonly",
                                   valueColumns => [{ name => "os", prices => { USD => 0.056 } }],
                                   vCPU => 1,
                                 },
                                 {
                                   ECU => "variable",
                                   memoryGiB => 4,
                                   size => "t2.medium",
                                   storageGB => "ebsonly",
                                   valueColumns => [{ name => "os", prices => { USD => 0.152 } }],
                                   vCPU => 2,
                                 },
                                 {
                                   ECU => 3,
                                   memoryGiB => 3.75,
                                   size => "m3.medium",
                                   storageGB => "1 x 4 SSD",
                                   valueColumns => [{ name => "os", prices => { USD => "0.170" } }],
                                   vCPU => 1,
                                 },
....
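Once decoded, the prices sit several levels deep (config, then regions, instanceTypes, sizes, valueColumns, prices), matching the dump above. Below is a minimal sketch of walking that structure. It uses a hypothetical, hand-abbreviated payload in place of the real download, and JSON::PP instead of JSON so it runs offline with core modules only:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical, abbreviated payload in the same shape as the dump above
# (the real file has many more regions, types, sizes and fields).
my $jsonp = 'callback({vers:0.01,config:{rate:"perhr",currencies:["USD"],'
  . 'regions:[{region:"us-east",instanceTypes:[{type:"generalCurrentGen",'
  . 'sizes:[{size:"t2.micro",valueColumns:[{name:"os",prices:{USD:"0.023"}}]},'
  . '{size:"t2.small",valueColumns:[{name:"os",prices:{USD:"0.056"}}]}]}]}]}});';

( my $json = $jsonp ) =~ s/callback\((.*)\);$/$1/s;
my $data = JSON::PP->new->allow_barekey->decode($json);

# Walk config -> regions -> instanceTypes -> sizes -> valueColumns,
# collecting one tab-separated row per price.
my @rows;
for my $region ( @{ $data->{config}{regions} } ) {
    for my $itype ( @{ $region->{instanceTypes} } ) {
        for my $size ( @{ $itype->{sizes} } ) {
            for my $col ( @{ $size->{valueColumns} } ) {
                push @rows, join "\t", $region->{region}, $itype->{type},
                  $size->{size}, $col->{name}, $col->{prices}{USD};
            }
        }
    }
}
print "$_\n" for @rows;
```

The sample keeps prices as strings, which appears consistent with the dump showing "0.170" in quotes; printing them unchanged preserves values such as a trailing zero.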
slayedbylucifer
zdk

As said in the comments above, it is not JSON, so it can't be parsed by a JSON parser. But for a quick and (very) dirty job, you can try the JSON::DWIW module.

The following code:

use 5.014;
use warnings;
use WWW::Mechanize;
use Data::Dump;

use JSON::DWIW;

my $mech = WWW::Mechanize->new();
my $jsonstr = $mech->get('http://a0.awsstatic.com/pricing/1/ec2/sles-od.min.js')->content;
($jsonstr) = $jsonstr =~ /callback\((.*)\)/s;

my $json_obj = JSON::DWIW->new;
my $data = $json_obj->from_json( $jsonstr );
dd $data;

prints a structure that may be what you want, e.g.:

{
  config => {
              currencies => ["USD"],
              rate => "perhr",
              regions => [
                {
                  instanceTypes => [
                    {
                      sizes => [
                                 {
                                   ECU => "variable",
                                   memoryGiB => 1,
                                   size => "t2.micro",
                                   storageGB => "ebsonly",
                                   valueColumns => [{ name => "os", prices => { USD => 0.023 } }],
                                   vCPU => 1,
                                 },
                                 {
ikegami
clt60
  • Thank you very much. I was not aware of `DWIW`. Your response answers my question. However, I will accept "ZDK's" answer as it saves me installing a new module. +1 for your time. – slayedbylucifer Jul 28 '14 at 18:23
  • @slayedbylucifer his answer is better; I just learned about `allow_barekey`. Thanks. – clt60 Jul 28 '14 at 18:26