
So I'm writing a static validator for a particular kind of file. These files are expected to contain a certain JSON-like object. Sample below:

theObject = {
key1 = value1,
key2 = value2,
key3 = {
    key4 = value4,
    key5 = value5
}
}

There are a lot of other things in this file, but this object is required and I need to validate its presence and form. My current solution has been to find the second } after theObject's name and extract the body of the object so I can feed it to a JSON parser. Obviously this fails if the author omits key3 entirely, because then only one } belongs to the object and the second one is somewhere else in the file.

I've been trying to tweak a regex of the form

m/theObject = (\{.*\})/

Obviously this doesn't work. Any ideas on how to match the closing brace that corresponds to the expected opening brace?
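For what it's worth, the attempted pattern fails for more than one reason, and a short sketch makes this concrete. It uses a trimmed copy of the sample plus a made-up otherThing line (hypothetical, standing in for "the rest of the file"):

```perl
use strict;
use warnings;

# Trimmed version of the sample object, followed by a hypothetical later
# line that also contains braces.
my $text = <<'END';
theObject = {
key1 = value1,
key3 = {
    key4 = value4
}
}
otherThing = { x = y }
END

# Without /s, "." does not match a newline, so the capture can never reach
# a } on a later line: this match fails outright.
my ($without_s) = $text =~ m/theObject = (\{.*\})/;

# With /s, the greedy .* backtracks only as far as the LAST } in the
# string, so the capture runs past theObject into otherThing.
my ($with_s) = $text =~ m/theObject = (\{.*\})/s;

# A lazy .*? goes the other way and stops at the FIRST }, which here
# closes the nested key3 object, not theObject.
my ($lazy) = $text =~ m/theObject = (\{.*?\})/s;

print defined $without_s ? "matched\n" : "no match without /s\n";
print "greedy capture reaches otherThing\n" if $with_s =~ /otherThing/;
print "lazy capture:\n$lazy\n";
```

Neither greedy nor lazy matching knows about nesting, and any fixed count of } breaks as soon as the nesting depth changes.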

Ryan

3 Answers


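Use a recursive regular expression to match balanced braces, then hand the captured text to a JSON parser. Note that the parsing step assumes the object body is real JSON (quoted keys, colons), as in the sample data below, rather than the unquoted key = value form shown in the question.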

use strict;
use warnings;

use JSON;
use Data::Dump;

my $data = do { local $/; <DATA> };

# Find theObject within your data. (?1) recurses into capture group 1,
# so every nested { ... } must balance before the outer } can match.
if ( $data =~ m/^theObject\s*=\s*(\{ (?: (?> [^{}]+ ) | (?1) )* \})/msx ) {
    my $hashref = from_json($1);

    print "Perl Data Structure:\n";
    dd $hashref;

} else {
    warn "Unable to find theObject";
}

__DATA__
theObject = {
   "key2" : "value2",
   "key1" : "value1",
   "key3" : {
      "key5" : "value5",
      "key4" : "value4"
   }
}

Outputs:

Perl Data Structure:
{
  key1 => "value1",
  key2 => "value2",
  key3 => { key4 => "value4", key5 => "value5" },
}
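The recursive pattern does not care what sits between the braces, so it can also extract the object in the question's unquoted key = value form; only the JSON parsing step needs real JSON. A minimal sketch, assuming the object starts at the beginning of a line; extract_object and the otherThing line are hypothetical:

```perl
use strict;
use warnings;

# Return the balanced { ... } text that follows "theObject =", or undef.
# Same recursive pattern as above: (?1) re-enters capture group 1, so each
# nested { ... } has to close before the outer one can end.
sub extract_object {
    my ($text) = @_;
    return $text =~ m/^theObject\s*=\s*(\{ (?: (?> [^{}]+ ) | (?1) )* \})/msx
        ? $1
        : undef;
}

# The question's sample, followed by a hypothetical later line with braces.
my $file = <<'END';
theObject = {
key1 = value1,
key2 = value2,
key3 = {
    key4 = value4,
    key5 = value5
}
}
otherThing = { x = y }
END

my $body = extract_object($file);
print defined $body ? "$body\n" : "theObject not found\n";
```

Because the match is driven by brace balance rather than by counting }, it still finds the right end of the object when key3 is left out.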
Miller

This looks like a simple case of matching nested braces, which in Perl is done with recursive regexes. But here you have to handle JSON strings too, as they may contain a } character:

(?(DEFINE)
    (?<json>
        \{ (?:
            [^{}"']+
            |"(?:[^\\"]+|\\.)*"
            |'(?:[^\\']+|\\.)*'
            |(?&json)
        )* \}
    )
)

theObject\s*=\s*(?&json)

Use the x modifier to allow whitespace in the pattern.

Demo: http://regex101.com/r/jX2rS6/1
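To see why string handling matters, here is a small comparison sketch: the plain balanced-brace pattern from the previous answer versus a string-aware variant in which double- and single-quoted strings are separate alternatives. The input with a } inside a value is made up for illustration:

```perl
use strict;
use warnings;

# Plain balanced-brace pattern: treats every } as structural.
my $plain = qr/theObject\s*=\s*(\{ (?: (?> [^{}]+ ) | (?1) )* \})/x;

# String-aware variant: a quoted string is consumed as one unit, so a }
# inside it cannot close the object. The \\. branch allows escapes like \".
my $aware = qr/
    (?(DEFINE)
        (?<json>
            \{ (?:
                  (?> [^{}"']+ )
                | " (?: [^\\"]+ | \\. )* "
                | ' (?: [^\\']+ | \\. )* '
                | (?&json)
            )* \}
        )
    )
    theObject\s*=\s*(?<obj>(?&json))
/x;

# Hypothetical input with a } inside a string value.
my $text = 'theObject = { "key1" : "a}b", "key2" : "c" }';

my ($naive) = $text =~ $plain;
my $full = $text =~ $aware ? $+{obj} : undef;

print "plain: $naive\n";   # plain: { "key1" : "a}
print "aware: $full\n";    # aware: { "key1" : "a}b", "key2" : "c" }
```

The (?<obj>...) wrapper is there because captures made inside a recursive call are not kept after it returns, so the named json group cannot be read directly.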

Lucas Trzesniewski

Using a proper parser is always an option too. So far, what I can glean from your example is something like:

:discard ~ whitespace

Top       ::= KeyValue
KeyValue  ::= Key '=' Value
Key       ::= Name
Value     ::= Name 
            | Object
Object    ::= '{' KeyValues '}'
KeyValues ::= KeyValue+ separator => comma

Name       ~ [\w]+
comma      ~ ','
whitespace ~ [\s]+

(This is valid input for Marpa::R2, if I haven't made any mistakes.) But there's a lot I don't know: whether you support other kinds of values (quoted strings, maybe), what the whitespace rules are, exactly how "JSON-like" things are, and whether there is already a good-enough parser on CPAN :)
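If pulling in Marpa::R2 is not an option, the same grammar can be checked with a small hand-written recursive-descent parser. This is a swapped-in technique, not Marpa's API, and it assumes \w+ names, comma-separated pairs with no trailing comma, and insignificant whitespace; parse_config and its helper subs are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical recursive-descent validator for the grammar above, in core
# Perl. Assumes: names are \w+, pairs are comma-separated (no trailing
# comma), whitespace between tokens is ignored.

# Parse "Key = Value"; return a hashref or die with the failing offset.
sub parse_config {
    my ($text) = @_;
    pos($text) = 0;
    my ($key, $value) = _key_value(\$text);
    _ws(\$text);
    _fail(\$text, 'end of input') unless pos($text) == length($text);
    return { $key => $value };
}

# Skip whitespace; \s+ rather than \s* avoids zero-length /g matches.
sub _ws { ${ $_[0] } =~ /\G\s+/gc }

sub _fail {
    my ($ref, $what) = @_;
    die sprintf "expected %s at offset %d\n", $what, pos($$ref) // 0;
}

# Name ~ [\w]+
sub _name {
    my ($ref) = @_;
    _ws($ref);
    return $1 if $$ref =~ /\G(\w+)/gc;
    _fail($ref, 'a name');
}

# KeyValue ::= Key '=' Value ; Value ::= Name | Object
sub _key_value {
    my ($ref) = @_;
    my $key = _name($ref);
    _ws($ref);
    $$ref =~ /\G=/gc or _fail($ref, "'='");
    _ws($ref);
    my $value = $$ref =~ /\G\{/gc ? _object_body($ref) : _name($ref);
    return ($key, $value);
}

# Object ::= '{' KeyValues '}' (the '{' has already been consumed)
sub _object_body {
    my ($ref) = @_;
    my %pairs;
    do {
        my ($k, $v) = _key_value($ref);
        $pairs{$k} = $v;
        _ws($ref);
    } while ($$ref =~ /\G,/gc);
    _ws($ref);
    $$ref =~ /\G\}/gc or _fail($ref, "'}'");
    return \%pairs;
}

my $cfg = parse_config(<<'END');
theObject = {
key1 = value1,
key2 = value2,
key3 = {
    key4 = value4,
    key5 = value5
}
}
END
print join(',', sort keys %{ $cfg->{theObject} }), "\n";   # key1,key2,key3
```

Checking that theObject carries the required keys is then a plain hash lookup on the result, so a missing key3 shows up as a validation error rather than a parse failure.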

hobbs