
I have a Perl hash that is obtained from parsing JSON. The JSON could be anything a user-defined API could generate. The goal is to obtain a date/time string and determine whether that date/time is out of bounds according to a user-defined threshold. The only issue I have is that Perl seems a bit cumbersome when dealing with hash key/subkey iteration. How can I look through all the keys and determine whether a key or subkey exists anywhere in the hash? I have read many threads on Stack Overflow, but found nothing that exactly meets my needs. I only started Perl last week, so I may be missing something; let me know if that's the case.

Below is the "relevant" code/subs. For all code see: https://gitlab.com/Jedimaster0/check_http_freshness

use warnings;
use strict;
use LWP::UserAgent;
use Getopt::Std;
use JSON::Parse qw(parse_json assert_valid_json);
use DateTime;
use DateTime::Format::Strptime;

# Verify the content-type of the response is JSON
eval {
        assert_valid_json ($response->content);
};
if ( $@ ){
        print "[ERROR] Response isn't valid JSON. Please verify source data. \n$@";
        exit EXIT_UNKNOWN;
} else {
        # Convert the JSON data into a perl hashrefs
        $jsonDecoded = parse_json($response->content);
        if ($verbose){print "[SUCCESS] JSON FOUND -> ", $response->content , "\n";}

        if (defined $jsonDecoded->{$opts{K}}){
                if ($verbose){print "[SUCCESS] JSON KEY FOUND -> ", $opts{K}, ": ", $jsonDecoded->{$opts{K}}, "\n";}
                NAGIOS_STATUS(DATETIME_DIFFERENCE(DATETIME_LOOKUP($opts{F}, $jsonDecoded->{$opts{K}})));
        } else {
                print "[ERROR] Retrieved JSON does not contain any data for the specified key: $opts{K}\n";
                exit EXIT_UNKNOWN;
        }
}




sub DATETIME_LOOKUP {
        my $dateFormat = $_[0];
        my $dateFromJSON = $_[1];

        my $strp = DateTime::Format::Strptime->new(
                pattern   => $dateFormat,
                time_zone => $opts{z},
                on_error  => sub { print "[ERROR] INVALID TIME FORMAT: $dateFormat OR TIME ZONE: $opts{z} \n$_[1] \n" ; HELP_MESSAGE(); exit EXIT_UNKNOWN; },
        );

        my $dt = $strp->parse_datetime($dateFromJSON);
        if (defined $dt){
                if ($verbose){print "[SUCCESS] Time formatted using -> $dateFormat\n", "[SUCCESS] JSON date converted -> $dt $opts{z}\n";}
                return $dt;
        } else {
                print "[ERROR] DATE VARIABLE IS NOT DEFINED. Pattern or timezone incorrect."; exit EXIT_UNKNOWN
        }
}




# Subtract JSON date/time from now and return delta
sub DATETIME_DIFFERENCE {
        my $dateInitial = $_[0];
        my $deltaDate;
        # Convert to UTC for standardization of computations and it's just easier to read when everything matches.
        $dateInitial->set_time_zone('UTC');

        $deltaDate = $dateNowUTC->delta_ms($dateInitial);
        if ($verbose){print "[SUCCESS] (NOW) $dateNowUTC UTC - (JSON DATE) $dateInitial ", $dateInitial->time_zone->short_name_for_datetime($dateInitial), " = ", $deltaDate->in_units($opts{u}), " $opts{u} \n";}

        return $deltaDate->in_units($opts{u});
}

Sample Data

{
  "localDate":"Wednesday 23rd November 2016 11:03:37 PM",
  "utcDate":"Wednesday 23rd November 2016 11:03:37 PM",
  "format":"l jS F Y h:i:s A",
  "returnType":"json",
  "timestamp":1479942217,
  "timezone":"UTC",
  "daylightSavingTime":false,
  "url":"http:\/\/www.convert-unix-time.com?t=1479942217",
  "subkey":{
    "altTimestamp":1479942217,
    "altSubkey":{
      "thirdTimestamp":1479942217
    }
  }
}

[SOLVED]

I have used the answer that @HåkonHægland provided. The code changes are below. Using the Hash::Flatten module, I can pass any input string that matches one of the flattened JSON keys. I still have some work to do, but you can see the issue is resolved. Thanks @HåkonHægland.

use warnings;
use strict;
use Data::Dumper;
use LWP::UserAgent;
use Getopt::Std;
use JSON::Parse qw(parse_json assert_valid_json);
use Hash::Flatten qw(:all);
use DateTime;
use DateTime::Format::Strptime;

# Verify the content-type of the response is JSON
eval {
        assert_valid_json ($response->content);
};
if ( $@ ){
        print "[ERROR] Response isn't valid JSON. Please verify source data. \n$@";
        exit EXIT_UNKNOWN;
} else {
        # Convert the JSON data into a perl hashrefs
        my $jsonDecoded = parse_json($response->content);
        my $flatHash = flatten($jsonDecoded);

        if ($verbose){print "[SUCCESS] JSON FOUND -> ", Dumper($flatHash), "\n";}

        if (defined $flatHash->{$opts{K}}){
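                # "->" is required here; "$flatHash>{...}" would be parsed as a numeric comparison with an
                # anonymous hash and emit an "Odd number of elements in anonymous hash" warning.
                if ($verbose){print "[SUCCESS] JSON KEY FOUND -> ", $opts{K}, ": ", $flatHash->{$opts{K}}, "\n";}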
                NAGIOS_STATUS(DATETIME_DIFFERENCE(DATETIME_LOOKUP($opts{F}, $flatHash->{$opts{K}})));
        } else {
                print "[ERROR] Retrieved JSON does not contain any data for the specified key: $opts{K}\n";
                exit EXIT_UNKNOWN;
        }
}

Example:

./check_http_freshness.pl -U http://bastion.mimir-tech.org/json.html -K result.creation_date -v
[SUCCESS] JSON FOUND -> $VAR1 = {
          'timestamp' => '20161122T200649',
          'result.data_version' => 'data_20161122T200649_data_news_topics',
          'result.source_version' => 'kg_release_20160509_r33',
          'result.seed_version' => 'seed_20161016',
          'success' => 1,
          'result.creation_date' => '20161122T200649',
          'result.data_id' => 'data_news_topics',
          'result.data_tgz_name' => 'data_news_topics_20161122T200649.tgz',
          'result.source_data_version' => 'seed_vtv: data_20161016T102932_seed_vtv',
          'result.data_digest' => '6b5bf1c2202d6f3983d62c275f689d51'
        };

Odd number of elements in anonymous hash at ./check_http_freshness.pl line 78, <DATA> line 1.
[SUCCESS] JSON KEY FOUND -> result.creation_date:
[SUCCESS] Time formatted using -> %Y%m%dT%H%M%S
[SUCCESS] JSON date converted -> 2016-11-22T20:06:49 UTC
[SUCCESS] (NOW) 2016-11-26T19:02:15 UTC - (JSON DATE) 2016-11-22T20:06:49 UTC = 94 hours
[CRITICAL] Delta hours (94) is >= (24) hours. Data is stale.
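
The 94-hour figure in the output above can be checked by hand. The following is only a minimal sketch that uses the core module Time::Piece instead of the DateTime modules from the script; the two timestamps are copied from the log, and the threshold of 24 is taken from the final log line rather than from the script's option handling.

```perl
use strict;
use warnings;
use Time::Piece;

# Timestamps copied from the verbose output; both are treated as UTC.
my $format    = '%Y%m%dT%H%M%S';
my $json_date = Time::Piece->strptime( '20161122T200649', $format );
my $now       = Time::Piece->strptime( '20161126T190215', $format );

# Subtracting two Time::Piece objects yields a Time::Seconds object.
my $delta = $now - $json_date;
my $hours = int( $delta->hours );    # whole hours, fraction dropped

# Assumed threshold, matching the "(24) hours" shown in the log.
my $threshold = 24;
my $status    = $hours >= $threshold ? 'CRITICAL' : 'OK';

print "$status: delta is $hours hours\n";
```

The difference is 4 days minus 1 h 04 min 34 s, which is about 94.9 hours, so truncating to whole hours gives the 94 reported by the script.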
japtain.cack
  • can you show some of the more complicated structures you hope to deal with and which key(s) you are looking for in them? – ysth Nov 23 '16 at 22:46
  • I have added some example code. I could hard code it to look 2 or 3 "layers" deep, but the goal is to work on any output you throw at it. So it would need to simply iterate over any/all keys/subkeys and compare the key you defined with the keys/subkeys and obtain the data once matched. – japtain.cack Nov 23 '16 at 23:08
  • I mostly work with javascript and this kind of thing is a breeze in JS :/ – japtain.cack Nov 23 '16 at 23:09
  • so I see you added sample data; which key are you trying to find in that data? so far I really don't understand the problem. maybe show what you would do in javascript? it isn't going to be any harder to do in perl – ysth Nov 24 '16 at 03:23
  • If you want some feedback on your style, feel free to post this code (or the finished one) over on [codereview.se]. There are a bunch of things you can improve, but they would be way beyond the scope of an answer here. – simbabque Nov 24 '16 at 11:25
  • So that sample data could be in any format, not just the one shown here. The only constant is that it will be JSON data. One of the arguments is which key you are looking for. It's supposed to find that key in the JSON then grab the value for that key. So, I should be able to pass the key name for anything that contains a timestamp. – japtain.cack Nov 24 '16 at 13:33

2 Answers


You could try using Hash::Flatten. For example:

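use feature 'say';
use JSON::Parse 'parse_json';
use Hash::Flatten qw(flatten);

# $json_str holds the raw JSON text, $key the key name to look for
my $json_decoded = parse_json($json_str);
my $flat = flatten( $json_decoded );
# match $key as a complete path component of any flattened key
say "found" if grep /(?:^|\.)\Q$key\E(?:\.|$)/, keys %$flat;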
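
If the full dotted path is already known (for example `result.creation_date`), the nested structure can also be walked directly, without building a flattened copy first. This is only a sketch under that assumption: `lookup_path` is a hypothetical helper, not part of Hash::Flatten, and it handles nested hashes only. It would also misread keys that themselves contain a dot.

```perl
use strict;
use warnings;

# Hypothetical helper: follow a dotted path such as "result.creation_date"
# through nested hashrefs; returns undef if any step is missing.
sub lookup_path {
    my ( $data, $path ) = @_;
    for my $step ( split /\./, $path ) {
        return unless ref $data eq 'HASH' && exists $data->{$step};
        $data = $data->{$step};
    }
    return $data;
}

# Small stand-in for the decoded JSON.
my $decoded = {
    success => 1,
    result  => { creation_date => '20161122T200649' },
};

my $found   = lookup_path( $decoded, 'result.creation_date' );
my $missing = lookup_path( $decoded, 'result.no_such_key' );

print defined $found   ? "found: $found\n"   : "not found\n";
print defined $missing ? "found: $missing\n" : "not found\n";
```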
Håkon Hægland
  • I was considering using that method, but I generally frown upon converting objects into strings. I prefer to use the object and access their properties if possible. If I have to though, I'll do just that. Thanks for the suggestion. – japtain.cack Nov 24 '16 at 00:24
  • @Jedimaster0 A Perl hash has no properties unless you create them yourself. It is of course possible to write your own JSON parser if you need speed or are concerned about memory usage. Then you can insert any properties you like into the hash. I also noticed that `JSON::XS` and `JSON::PP` have filter methods that can call callbacks during parsing. Maybe that is something that could be used? – Håkon Hægland Nov 24 '16 at 01:22
  • @Jedimaster0 Another alternative could be to copy the source code of `Hash::Flatten` and modify it such that it does not create all the unnecessary keys, and only does what you want. – Håkon Hægland Nov 24 '16 at 02:05
  • @Jedimaster0 The JSON format is used for complex data structures and I don't think it's right to expect a call within the language itself to traverse it. But there are tools, such as this one which seems tailor-made for what you ask. If you'd like to work directly with the JSON hashref for some reason, going through it is fairly simple in Perl -- and there is a number of modules for that, too. – zdim Nov 24 '16 at 02:54
  • Thanks for the info. I will look into what you mentioned and if I figure everything out, I'll mark this as answered. Thanks again. – japtain.cack Nov 24 '16 at 02:59
  • @Jedimaster0 I do not know Javascript, but I did a little research and found that Javascript does not have hashes but uses objects to simulate hashes. Is that correct? Perl can also encapsulate a hash inside an object, if desired. Further, it seems like Javascript "hash" objects need to be traversed recursively to visit all keys, just as in Perl. See for example [*"Traverse all the Nodes of a JSON Object Tree with JavaScript"*](http://stackoverflow.com/q/722668/2173773). This kind of recursive lookup is exactly what `Hash::Flatten` also does. – Håkon Hægland Nov 24 '16 at 09:59
  • @HåkonHægland a JS object is the same as a Perl hash. It's an associative array essentially. The part where you can have methods in JS objects is closer to putting coderefs into a Perl hash with more convenient syntax than blessed hashrefs in Perl. – simbabque Nov 24 '16 at 11:38
  • @simbabque From what I can see, Javascript has five primitive data types: number, string, boolean, undefined, and null. Anything that doesn’t belong to any of these five primitive types is considered an object. But Javascript also has object versions of the five primitive types. In this case, all members of an object is also itself an object, which will add overhead compared to a Perl hash, but on the other side enables easy traversal of complex structures. – Håkon Hægland Nov 24 '16 at 11:54
  • @HåkonHægland, Thanks for everything, I was able to reach a resolution. Much appreciated. – japtain.cack Nov 26 '16 at 19:52

You can use Data::Visitor::Callback to traverse the data structure. It lets you define callbacks for different kinds of data types inside your structure. Since we're only looking at a hash it's relatively simple.

The following program has a predefined list of keys to find (those would be user input in your case). I converted your example JSON to a Perl hashref and included it in the code because the conversion is not relevant. The program visits every hashref in this data structure (including the top level) and runs the callback.

Callbacks in Perl are code references. These can be created in two ways: by taking a reference to a named subroutine (\&name) or by writing an anonymous subroutine (sub { ... }). We're using an anonymous subroutine (sometimes called a lambda function in other languages). The callback gets passed two arguments: the visitor object and the current data substructure.

We'll iterate over all the keys we want to find and simply check whether they exist in the current data structure. If we see one, we count its existence in the %seen hash. Using a hash to store things we have seen is a common idiom in Perl.

We're using a postfix if here, which is convenient and easy to read. %seen is a hash, so we access the value behind the $key with $seen{$key}, while $data is a hash reference, so we use the dereferencing operator -> to access the value behind $key with $data->{$key}.

The callback needs us to return the $data again so it continues. The last line is just there, it's not important.

I've used Data::Printer to output the %seen hash because it's convenient. You can also use Data::Dumper if you want. In production, you will not need that.

use strict;
use warnings;
use Data::Printer;
use Data::Visitor::Callback;

my $from_json = {
    "localDate"  => "Wednesday 23rd November 2016 11:03:37 PM",
    "utcDate"    => "Wednesday 23rd November 2016 11:03:37 PM",
    "format"     => "l jS F Y h:i:s A",
    "returnType" => "json",
    "timestamp"  => 1479942217,
    "timezone"   => "UTC",
    "daylightSavingTime" =>
        0,    # this was false, I used 0 because that's a non-true value
    "url"    => "http:\/\/www.convert-unix-time.com?t=1479942217",
    "subkey" => {
        "altTimestamp" => 1479942217,
        "altSubkey"    => {
            "thirdTimestamp" => 1479942217
        }
    }
};

my @keys_to_find = qw(timestamp altTimestamp thirdTimestamp missingTimestamp);

my %seen;
my $visitor = Data::Visitor::Callback->new(
    hash => sub {
        my ( $visitor, $data ) = @_;

        foreach my $key (@keys_to_find) {
            $seen{$key}++ if exists $data->{$key};
        }

        return $data;
    },
);
$visitor->visit($from_json);

p %seen;

The program outputs the following. Note this is not a Perl data structure. Data::Printer is not a serializer, it's a tool to make data human readable in a convenient way.

{
    altTimestamp     1,
    thirdTimestamp   1,
    timestamp        1
}
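
For comparison, the same kind of traversal can be written by hand with a short recursive subroutine and no extra modules. This is a sketch rather than a replacement for Data::Visitor::Callback: `find_key` is a hypothetical helper, it only descends into plain hash and array references, and the sample data is a trimmed copy of the structure above.

```perl
use strict;
use warnings;

# Hypothetical helper: walk nested hashrefs and arrayrefs and collect the
# value of every occurrence of $wanted, at any depth.
sub find_key {
    my ( $data, $wanted ) = @_;
    my @values;
    if ( ref $data eq 'HASH' ) {
        push @values, $data->{$wanted} if exists $data->{$wanted};
        push @values, find_key( $_, $wanted ) for values %$data;
    }
    elsif ( ref $data eq 'ARRAY' ) {
        push @values, find_key( $_, $wanted ) for @$data;
    }
    return @values;
}

my $sample = {
    timestamp => 1479942217,
    subkey    => {
        altTimestamp => 1479942217,
        altSubkey    => { thirdTimestamp => 1479942217 },
    },
};

my @third   = find_key( $sample, 'thirdTimestamp' );
my @missing = find_key( $sample, 'missingTimestamp' );

print "thirdTimestamp: @third\n";
print "missingTimestamp found ", scalar @missing, " times\n";
```

Returning a list rather than a single value keeps the case where the same key name appears at several depths visible to the caller.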

Since you also wanted to constrain the input, here's an example of how to do that. The following program is a modification of the one above. It lets you specify a different constraint for every required key.

I've done that by using a dispatch table. Essentially, that's a hash that contains code references. Kind of like the callbacks we use for the Visitor.

The constraints I've included are doing some things with dates. An easy way to work with dates in Perl is the core module Time::Piece. There are lots of questions around here about various date things where Time::Piece is the answer.

I've only done one constraint per key, but you could easily include several checks in those code refs, or make a list of code refs and put them in an array ref (keys => [ sub(), sub(), sub() ]) and then iterate that later.

In the visitor callback we are now also keeping track of the keys that have %passed the constraints check. We're calling the coderef with $coderef->($arg). If a constraint check returns a true value, it gets noted in the hash.

use strict;
use warnings;
use Data::Printer;
use Data::Visitor::Callback;
use Time::Piece;
use Time::Seconds; # for ONE_DAY

my $from_json = { ... }; # same as above

# prepare one of the constraints
# where I'm from, Christmas eve is considered Christmas
my $christmas = Time::Piece->strptime('24 Dec 2016', '%d %b %Y');

# set up the constraints per required key
my %constraints = (
    timestamp        => sub {
        my ($epoch) = @_;
        # not older than one day

        return $epoch < time && $epoch > time - ONE_DAY;
    },
    altTimestamp     => sub {
        my ($epoch) = @_;
        # epoch value should be an even number

        return !( $epoch % 2 );
    },
    thirdTimestamp   => sub {
        my ($epoch) = @_;
        # before Christmas 2016

        return $epoch < $christmas;
    },
);

my %seen;
my %passed;
my $visitor = Data::Visitor::Callback->new(
    hash => sub {
        my ( $visitor, $data ) = @_;

        foreach my $key (keys %constraints) {
            if ( exists $data->{$key} ) {
                $seen{$key}++;
                $passed{$key}++ if $constraints{$key}->( $data->{$key} );
            }
        }

        return $data;
    },
);
$visitor->visit($from_json);

p %passed;

The output this time is:

{
    thirdTimestamp   1,
    timestamp        1
}
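
The variant with several checks per key, mentioned earlier, could look roughly like the sketch below. The individual checks and the flat `%values` hash are made up for illustration; in the real program the values would come from the visitor callback.

```perl
use strict;
use warnings;

# Hypothetical constraints: every key maps to an array ref of checks,
# and a value passes only if all of its checks return true.
my %constraints = (
    timestamp => [
        sub { $_[0] =~ /^\d+$/ },    # looks like an epoch value
        sub { $_[0] > 0 },           # positive
    ],
    altTimestamp => [
        sub { $_[0] =~ /^\d+$/ },    # looks like an epoch value
        sub { $_[0] % 2 == 0 },      # even number
    ],
);

# Stand-in for the values found while visiting the data structure.
my %values = ( timestamp => 1479942217, altTimestamp => 1479942217 );

my %passed;
for my $key ( sort keys %constraints ) {
    # grep in scalar context counts the checks that returned false.
    my $failed = grep { !$_->( $values{$key} ) } @{ $constraints{$key} };
    $passed{$key} = $failed ? 0 : 1;
}

print "$_: ", ( $passed{$_} ? "passed" : "failed" ), "\n" for sort keys %passed;
```

Because 1479942217 is odd, `altTimestamp` fails its second check while `timestamp` passes both.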

If you want to learn more about dispatch tables, take a look at chapter two of the book Higher-Order Perl by Mark Jason Dominus, which is legally available for free on the author's website.

simbabque
  • Nice answer! `Data::Visitor` seems to be a useful module. But it seems to use `Moose` which might be too heavyweight for this task? – Håkon Hægland Nov 24 '16 at 14:22
  • @H the times where Moose had this horrible startup penalty are long past. That shouldn't be a big problem any more. – simbabque Nov 24 '16 at 14:27