-1

I am failing to return a hash of the parsed XML document using XML::Twig, so that I can use it in OTHER subs to perform several validation checks. The goal is abstraction: re-usable blocks of code.

XML Block:

<?xml version="1.0" encoding="utf-8"?>
<Accounts locale="en_US">
  <Account>
    <Id>abcd</Id>
    <OwnerLastName>asd</OwnerLastName>
    <OwnerFirstName>zxc</OwnerFirstName>
    <Locked>false</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2011" month="8" month-name="fevrier" day-of-month="19" hour-of-day="15" minute="23" day-name="dimanche"/>
    <LastLoginDate year="2015" month="04" month-name="avril" day-of-month="22" hour-of-day="11" minute="13" day-name="macredi"/>
    <LoginsCount>10405</LoginsCount>
    <Locale>nl</Locale>
    <Country>NL</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1980" month="1" month-name="janvier" day-of-month="1" hour-of-day="0" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail>asdf@asdf.com</InternalMail>
    <ExternalMail>fdsa@zxczxc.com</ExternalMail>
    <GroupMemberships>
      <Group>werkgroep X.Y.Z.</Group>
    </GroupMemberships>
    <SynchroCount>6</SynchroCount>
    <LastSynchroDate year="2003" month="12" month-name="decembre" day-of-month="5" hour-of-day="12" minute="48" day-name="mardi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
  <Account>
    <Id>mnbv</Id>
    <OwnerLastName>cvbb</OwnerLastName>
    <OwnerFirstName>bvcc</OwnerFirstName>
    <Locked>true</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2012" month="10" month-name="octobre" day-of-month="10" hour-of-day="10" minute="18" day-name="jeudi"/>
    <LastLoginDate/>
    <LoginsCount>0</LoginsCount>
    <Locale>fr</Locale>
    <Country>BE</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail/>
    <ExternalMail>qweqwe@qwe.com</ExternalMail>
    <GroupMemberships/>
    <SynchroCount>0</SynchroCount>
    <LastSynchroDate year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
</Accounts>

Perl Block:

my $file = shift || (print "NOTE: \tYou didn't provide the name of the file to be checked.\n" and exit);
my $twig = XML::Twig -> new ( twig_roots => { 'Account' => \& parsing } ); #'twig_roots' mode builds only the required sub-trees from the document while ignoring everything outside that twig.
$twig -> parsefile ($file);

sub parsing {
    my ( $twig, $accounts ) = @_;
    my %hash = @_;
    my $ref = \%hash; #because was getting an error of Odd number of hash elements
    return $ref;
    $twig -> purge;

It gives a hash reference, which I'm unable to dereference properly (even after thousands of attempts).

Again, I just need a single clean function (sub) that does the parsing and returns a hash of all elements ('Accounts' in this case), to be used in another function (valid_sub) that performs the validation checks.

I'm literally stuck at this point - and will HIGHLY appreciate your HELP.

MSalman
  • http://www.perlmonks.org/?node_id=1150626 – choroba Dec 17 '15 at 16:37
  • I would suggest - don't, it's really nasty. Use the `XML::Twig::Elt` objects – Sobrique Dec 18 '15 at 10:08
  • @Sobrique Thanks for your suggestion - and I would really appreciate, if you can also provide a working example (XML::Twig::Elt under my question) as well (a basic simple one - just to grab the concept in a proper way). – MSalman Dec 18 '15 at 15:55
  • These are the things that `XML::Twig` generates and passes to your handler e.g. `$accounts`. – Sobrique Dec 18 '15 at 16:04
  • @Sobrique How can i use them in other subs for doing various operations over their values ? – MSalman Dec 18 '15 at 16:06
  • Pass the reference to the subroutine, use `XML::Twig` methods on it. e.g. `validate_field ( $accounts );` – Sobrique Dec 18 '15 at 16:09
  • @Sobrique Thanks a lot for always coming to the rescue. Though I opted to go the other way, your answers are always super helpful and really allowed me to expand my Perl knowledge. – MSalman Jan 04 '16 at 17:36

2 Answers

1

Such a hash is not created by Twig, you have to create it yourself.

Beware: Commands after return will never be reached.

#!/usr/bin/perl
use warnings;
use strict;

use XML::Twig;
use Data::Dumper;

my $twig = 'XML::Twig'->new(twig_roots => { Account => \&account });
$twig->parsefile(shift);

sub account {
    my ($twig, $account) = @_;
    my %hash;
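    for my $ch ($account->children) {
        my $text = $ch->text;
        # Test length, not truthiness: "0" (e.g. LoginsCount) is valid text.
        if (length $text) {
            $hash{ $ch->name } = $text;
        } else {
            # No text: store the attributes (if any) as a nested hash.
            for my $attr (keys %{ $ch->atts }) {
                $hash{ $ch->name }{$attr} = $ch->atts->{$attr};
            }
        }
    }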
    print Dumper \%hash;
    $twig -> purge;
    validate(\%hash);
}

Handling of nested elements (e.g. GroupMemberships) left as an exercise to the reader.
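One possible way to handle the nested case is a small recursive helper. This is only a sketch under some assumptions: `elt_to_data` and the `@accounts` collector are hypothetical names, the inline XML is a cut-down version of the sample, and child element names inside an `Account` are assumed to be unique (true for the sample). Nested elements are always stored as arrays, so a single `Group` and several `Group`s have the same shape.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::Twig;

# A handler cannot hand data back to the caller, so collect results here.
my @accounts;

# Hypothetical helper: element children win, then text, then attributes.
sub elt_to_data {
    my ($elt) = @_;
    my @kids = grep { $_->is_elt } $elt->children;
    if (@kids) {
        my %nested;
        for my $kid (@kids) {
            push @{ $nested{ $kid->name } }, elt_to_data($kid);
        }
        return \%nested;
    }
    my $text = $elt->text;
    return $text if length $text;          # keeps "0" as a value
    return { %{ $elt->atts || {} } };      # copy of the attributes, maybe empty
}

sub account {
    my ($twig, $account) = @_;
    my %hash;
    for my $ch (grep { $_->is_elt } $account->children) {
        $hash{ $ch->name } = elt_to_data($ch);
    }
    push @accounts, \%hash;
    $twig->purge;
}

my $xml = <<'XML';
<Accounts locale="en_US">
  <Account>
    <Id>abcd</Id>
    <LoginsCount>0</LoginsCount>
    <CreationDate year="2011" month="8"/>
    <GroupMemberships>
      <Group>werkgroep X.Y.Z.</Group>
    </GroupMemberships>
    <Company/>
  </Account>
</Accounts>
XML

XML::Twig->new(twig_roots => { Account => \&account })->parse($xml);

print $accounts[0]{GroupMemberships}{Group}[0], "\n";
```

After parsing, `@accounts` holds one hash reference per `Account`, which can be passed to any other sub.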

And for validation:

sub validate {
    my $account = shift;
    if ('abcd' eq $account->{Id}) {
        ...
    }
}
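
For illustration, a filled-in version might look like the sketch below. The rules themselves (locked accounts and a missing creation year are errors) are made up for the example, and `%sample` is hand-written data in the same shape that `account` builds: text elements are plain strings, attribute-only elements are nested hashes.

```perl
use warnings;
use strict;

# Hypothetical rules; returns a list of problems, empty when the account passes.
sub validate {
    my ($account) = @_;    # hash reference, as passed by validate(\%hash)
    my @errors;
    my $id = $account->{Id} // '(no Id)';

    push @errors, "$id: account is locked"
        if ($account->{Locked} // '') eq 'true';

    # Attribute-only elements are nested hashes: chain the subscripts.
    my $created = $account->{CreationDate};
    push @errors, "$id: missing creation year"
        unless ref($created) eq 'HASH' && defined $created->{year};

    return @errors;
}

# The same shape of data that account() builds, written out by hand.
my %sample = (
    Id           => 'mnbv',
    Locked       => 'true',
    CreationDate => { year => 2012, month => 10 },
);

print "$_\n" for validate(\%sample);
```

Returning a list of messages rather than dying keeps the sub reusable: the caller decides whether one bad account should stop the whole run.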
choroba
  • Thanks a lot for your answer - the question now is how to use this hash (along with its values) in another sub for performing the validation checks (for eg if Id= ? else ?) ? – MSalman Dec 17 '15 at 17:00
  • Do you mean `if ($hash->{Id} eq "abcd") {...}`? – choroba Dec 17 '15 at 17:06
  • Are you asking how to pass a hash reference to a subroutine? – choroba Dec 17 '15 at 17:17
  • Yes - a 'Hash reference - the one in your answer' into ANOTHER sub routine (sub check_valid) for performing the checks like if ($hash->{Id} eq "abcd") {...}? – MSalman Dec 17 '15 at 17:21
  • Thanks - my script is complete. The only thing I'm still struggling with is handling the nested elements, even after tons of attempts. As of now it looks like this: `elsif ( $innrtext = $account->children->{$ch} ) { foreach my $b ( { $account->children->{$ch}->text } ){ $hash{ $b->name } = $innrtext; ` which does nothing... and I'm still struggling – MSalman Dec 18 '15 at 18:02
  • `if ($ch->children && ($ch->children)[0]->name ne '#PCDATA') { foreach my $b ( $ch->children ) { push @{ $hash{ $ch->name }{ $b->name } }, $b->text; }` works for me, but I needed to reshuffle the conditions. In your attempt, there are some weird constructs, read more about references in Perl, e.g. in [perlreftut](http://p3rl.org/perlreftut). – choroba Dec 18 '15 at 21:32
  • There are some glitches, but it worked fine. Thanks a lot! – MSalman Jan 04 '16 at 17:31
1

The problem with downconverting XML into hashes is that XML is fundamentally a more complicated data structure. Each element has attributes, children and content, and it's ordered, whereas a hash is an unordered set of key/value pairs.

So I would suggest that you not do what you're doing, and instead of passing a hash, use an XML::Twig::Elt and pass that into your validation.

Fortunately, this is exactly what XML::Twig passes to its handlers:

## this is fine:
sub parsing {
    my ( $twig, $accounts ) = @_;

But this is nonsense. Think about what's in @_ at this point: two references to XML::Twig objects (the twig and the element), which you've just assigned to $twig and $accounts.

    my %hash = @_;

And this doesn't make sense as a result:

    my $ref = \%hash; #because was getting an error of Odd number of hash elements

And where are you returning it to? (this is being called when XML::Twig is parsing)

    return $ref;
    #this doesn't happen, you've already returned
    $twig -> purge;

But bear in mind that you're returning it to the twig process that's doing the parsing, and that process doesn't pass the return value on to your code. So that's not going to do anything anyway.

I would suggest instead you 'save' the $accounts reference and use that for your validation - just pass it into your subroutines to validate.
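
As a rough sketch of what that could look like: the handler passes the element straight to an ordinary sub, which reads it through XML::Twig::Elt methods. The name `validate_account`, the `@problems` collector, the rules and the inline XML are all made up for the example.

```perl
use warnings;
use strict;

use XML::Twig;

my @problems;    # filled in while parsing, read afterwards

# Hypothetical validator: takes the XML::Twig::Elt for one <Account>.
sub validate_account {
    my ($account) = @_;
    my $id = $account->first_child_text('Id');

    push @problems, "$id: account is locked"
        if $account->first_child_text('Locked') eq 'true';

    # Attributes are read from the child element itself.
    my $created = $account->first_child('CreationDate');
    push @problems, "$id: missing creation year"
        unless $created && defined $created->att('year');

    return;
}

my $xml = <<'XML';
<Accounts locale="en_US">
  <Account><Id>abcd</Id><Locked>false</Locked><CreationDate year="2011"/></Account>
  <Account><Id>mnbv</Id><Locked>true</Locked><CreationDate year="2012"/></Account>
</Accounts>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        Account => sub {
            my ($t, $account) = @_;
            validate_account($account);    # use the element while it exists
            $t->purge;                     # then free the memory
        },
    },
);
$twig->parse($xml);

print "$_\n" for @problems;
```

The validation has to happen before `purge`, since the element is gone afterwards. Anything you want to keep goes into a variable outside the handler.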

Or better yet, configure a set of twig_handlers that does this for you:

my %validate = ( 'Account/Locked' => sub { die if $_ -> trimmed_text eq "true" },
                 'Account/CreationDate' => \&parsing, 
                 'Account/ExternalMail' => sub { die unless $_ -> text =~ m/\w+\@\w+\.\w+/ } 
               );


my $twig = XML::Twig -> new ( twig_roots => \%validate );

You can either die if you want to discard the whole lot, or use things like cut to remove an invalid entry from a document as you parse (and maybe paste it into a separate doc).
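
The cut-and-paste variant could be sketched like this. The `Rejected` holding element, the "locked means invalid" rule and the inline XML are assumptions for the example. Note that it does not purge, so the whole document stays in memory.

```perl
use warnings;
use strict;

use XML::Twig;

# Hypothetical holding element for the accounts that fail validation.
my $rejected = XML::Twig::Elt->new('Rejected');

my $xml = <<'XML';
<Accounts locale="en_US">
  <Account><Id>abcd</Id><Locked>false</Locked></Account>
  <Account><Id>mnbv</Id><Locked>true</Locked></Account>
</Accounts>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        Account => sub {
            my ($t, $account) = @_;
            if ($account->first_child_text('Locked') eq 'true') {
                $account->cut;                               # detach from the main tree
                $account->paste(last_child => $rejected);    # keep it elsewhere
            }
        },
    },
);
$twig->parse($xml);

my @kept = $twig->root->children('Account');
my @bad  = $rejected->children('Account');
printf "kept %d, rejected %d\n", scalar @kept, scalar @bad;
```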

But if you really must turn your XML into a perl data structure - first read this for why it's a terrible idea:

Why is XML::Simple "Discouraged"?

And then, if you really want to carry on down that road, look at the simplify option of XML::Twig:

sub parsing {
    my ( $twig, $accounts ) = @_;
    my $horrible_hacky_hashref = $accounts->simplify(forcearray => 1, keyattr => [], forcecontent => 1  );
    print Dumper \$horrible_hacky_hashref;
    $twig -> purge;
    #do something with it. 
}

Edit:

To expand:

XML::Twig::Elt is the element class of XML::Twig - it's the 'building block' of an XML::Twig data structure - so in your example above, $accounts is one.

sub parsing {
    my ( $twig, $accounts ) = @_;
    print Dumper $accounts;
}

You will get a lot of data if you do this, because you're dumping the whole data structure - which is effectively a daisy chain of XML::Twig::Elt objects.

$VAR1 = \bless( {
                   'parent' => bless( {
                                        'first_child' => ${$VAR1},
                                        'flushed' => 1,
                                        'att' => {
                                                   'locale' => 'en_US'
                                                 },
                                        'gi' => 6,

....

                    'att' => {},
               'last_child' => ${$VAR1}->{'first_child'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'},
               'gi' => 7
             }, 'XML::Twig::Elt' );

But it already encapsulates the information you need, as well as the structure you require - that's why XML::Twig is using it. It also goes a long way towards illustrating why you're going to lose data if you force it into a hash/array.

Sobrique
  • First of all, thank you for outlining and explaining the logical errors in my Perl code - it really helped. The reason I want a hash (of the parsed doc) is that it's a condition put forward by my SysTeam, though I completely agree that it's not a clean thing to do (and I have also read the post you pointed to in your answer). However, I'm very interested in "use an XML::Twig::Elt and pass that into your validation" but unable to grasp the concept - if you could provide an example based on my question, that would surely help. – MSalman Dec 18 '15 at 15:52