
I am trying to extract the values from this xml file but can't seem to do it...

<?xml version="1.0" encoding="UTF-8"?>
<personReport xmlns="...">          
    <header>
        <creation>2016-10-15</creation>
    </header>
    <details>
    ...
    </details>
    <person id="person1">
        <personId personIdScheme="name of">joe</personId>
    </person>
    <person id="person2">
        <personId personIdScheme="name of">sam</personId>
    </person>
</personReport>

I am successfully able to extract data from other tags (such as header) using:

my $xml = XMLin($xml_file);
my $header = $xml->{header}
                  ->{creation};

I am trying to do the same thing but get the data (joe) out of <person>...

my $person_type = $xml->{personReport}
                      ->{person1}[1];

Any idea why this isn't working?

j1nrg
    As the author of XML::Simple my advice is don't use XML::Simple. In fact I even put together [Perl XML::LibXML by Example](http://grantm.github.io/perl-libxml-by-example/) to help people solve their XML problems using libxml. – Grant McLean Oct 18 '16 at 20:17

2 Answers

$xml->{personReport}{person1}[1]

should be

$xml->{person}{person1}{personId}{content}
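To see why that path works, it can help to look at the shape of the data. The following is only a sketch: the hash is a hand-written approximation of what XMLin returns for the sample document with default options (the root element is discarded and the person elements are folded into a hash keyed by their id attribute). The xmlns attribute and the elided details element are left out.

```perl
use strict;
use warnings;

# Hand-written approximation of what XMLin($xml_file) returns for the sample
# document with default options. The xmlns attribute and <details> are omitted.
my $xml = {
    header => { creation => '2016-10-15' },
    person => {
        # <person> elements are folded into a hash keyed by their id attribute.
        person1 => { personId => { personIdScheme => 'name of', content => 'joe' } },
        person2 => { personId => { personIdScheme => 'name of', content => 'sam' } },
    },
};

# There is no {personReport} level: the root element is discarded by default.
# Text inside an element that also has attributes is stored under {content}.
my $name = $xml->{person}{person1}{personId}{content};
print "$name\n";
```

So `{personReport}` never appears as a key, and `{person1}` is a hash rather than an array, which is why the `[1]` index in the question finds nothing.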

If you don't understand why, perhaps you shouldn't be using a module so complex that its author discourages its use.

STATUS OF THIS MODULE

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended and XML::Twig is an excellent alternative.


Finding the name of each person using XML::Simple:

# Assumes each person element will have at least one personId child.
# Assumes each personId element will have a personIdScheme attribute.

for my $person (values(%{ $xml->{person} })) {
   # XML::Simple yields a hash ref for one personId child and an array ref for several.
   my @data_nodes = ref($person->{personId}) eq 'ARRAY'
      ? @{ $person->{personId} }
      : $person->{personId};

   my ($name_data_node) = grep { $_->{personIdScheme} eq 'name of' } @data_nodes;

   my $name = $name_data_node->{content};
   ...
}

Finding the name of each person using XML::LibXML:

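# The document declares a default namespace, so the XPath needs a prefix bound to it.
# '...' stands for the namespace URI, which is elided in the question.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(r => '...');

for my $person_node ($xpc->findnodes('/r:personReport/r:person')) {
   my $name = $xpc->findvalue('r:personId[@personIdScheme="name of"]', $person_node);
   ...
}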
ikegami
  • Thank you so much for your response. I found that another way to look at this was to use Dumper to output the file content. From there I was able to derive this same response! – j1nrg Oct 18 '16 at 20:20
  • Dumper doesn't actually help since the structure changes depending on which elements are present. What you see for one XML is not the same you're going to see for another XML of the same format. Please read the document I linked in my answer! – ikegami Oct 18 '16 at 20:23
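A rough sketch of the point made in that last comment, assuming XML::Simple's default options: both hashes below are hand-written approximations of XMLin output, not the result of actually parsing a file, and `name_by_id` is a hypothetical helper used only for illustration. With two person elements the list is folded into a hash keyed by id. With only one, no folding happens and id stays an ordinary key, so the same lookup path stops working.

```perl
use strict;
use warnings;

# Approximate XMLin output for a report with two <person> elements:
# the list is folded into a hash keyed by the id attribute.
my $two_people = {
    person => {
        person1 => { personId => { personIdScheme => 'name of', content => 'joe' } },
        person2 => { personId => { personIdScheme => 'name of', content => 'sam' } },
    },
};

# Approximate XMLin output for a report with a single <person> element:
# nothing is folded, so id remains an ordinary key.
my $one_person = {
    person => {
        id       => 'person1',
        personId => { personIdScheme => 'name of', content => 'joe' },
    },
};

# Hypothetical helper: look up a name using the path from the answer above.
sub name_by_id {
    my ($xml, $id) = @_;
    my $person = $xml->{person}{$id};
    return ref($person) eq 'HASH' ? $person->{personId}{content} : undef;
}

my $found   = name_by_id($two_people, 'person1');
my $missing = name_by_id($one_person, 'person1');

print defined $found   ? "two people: $found\n"   : "two people: not found\n";
print defined $missing ? "one person: $missing\n" : "one person: not found\n";
```

Code derived from one Dumper output can therefore break on another file of the same format.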

Pretty much any XML module is superior to XML::Simple, which is anything but simple to use.

XML::LibXML and XML::Twig are excellent and popular, and both allow you to address the XML document using XPath expressions. Here's a solution using XML::Twig:

use strict;
use warnings 'all';
use feature 'say';

use XML::Twig;

my $twig = XML::Twig->new;
$twig->parsefile( 'personReport.xml' );

say $twig->findvalue('/personReport/header/creation');

for my $person ($twig->findnodes('/personReport/person') ) {

    my $id = $person->att('id');
    my $name = $person->findvalue('personId[@personIdScheme="name of"]');

    say "$id $name";
}

output

2016-10-15
person1 joe
person2 sam
Borodin