0

This is my XML data:

<categories>
    <category id="Id001" name="Abcd">
        <project> ID_1234</project>
        <project> ID_5678</project>
    </category>
    <category id="Id002" name="efgh">
        <project> ID_6756</project>
        <project> ID_4356</project>
    </category>
</categories>

I need to get the text contents of each <project> element based on the name attribute of the containing <category> element.

I am using Perl with the XML::LibXML module.

For example, given the category name Abcd, I should get the list ID_1234, ID_5678.

Here is my code:

my $parser = XML::LibXML->new;

$doc = $parser->parse_file( "/cctest/categories.xml" );

my @nodes = $doc->findnodes( '/categories/category' );

foreach my $cat ( @nodes ) {
    my @catn = $cat->findvalue('@name');
} 

This gives me the category names in array @catn. But how can I get the text values of each project?

Borodin
Ras
  • Which parser are you using? – simbabque Oct 07 '16 at 09:13
  • Hi, I am using XML::LibXML to parse the XML file. – Ras Oct 07 '16 at 09:18
  • And what output are you seeking specifically? And what code do you have already? – Sobrique Oct 07 '16 at 09:19
  • Please include the code you've already written and explain where you are struggling. You can [edit] your question with the [edit] link. See [ask] for more information on how to use the site. – simbabque Oct 07 '16 at 09:19
  • my $parser = XML::LibXML->new; $doc = $parser->parse_file("/cctest/categories.xml"); my @nodes = $doc->findnodes('/categories/category'); foreach my $cat(@nodes){ – Ras Oct 07 '16 at 09:23
  • [edit] that into your question. This is not a forum - comments are for clarifications/queries not additional content relevant to your question. – Sobrique Oct 07 '16 at 09:24

3 Answers

3

You haven't shown what you've tried so far or what your desired output is, so I've made a guess at what you're looking for.

With XML::Twig you could do something like this:

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig -> parse ( \*DATA );

foreach my $project ( $twig -> findnodes ( '//project' ) ) { 
    print join ",",  (map { $project -> parent -> att($_) } qw ( id name )), $project -> text,"\n"; 
}

__DATA__
<categories>
<category id="Id001" name="Abcd">
   <project> ID_1234</project>
   <project> ID_5678</project>
</category>
<category id="Id002" name="efgh">
   <project> ID_6756</project>
   <project> ID_4356</project>
</category>
</categories>

Which produces:

Id001,Abcd, ID_1234,
Id001,Abcd, ID_5678,
Id002,efgh, ID_6756,
Id002,efgh, ID_4356,

It does this by using findnodes to locate every 'project' element.

It then extracts the 'id' and 'name' attributes from the parent (the category) and prints them, along with the text of that particular element.

XPath is a powerful tool for selecting data from XML, and with a more focussed question, we can give more specific answers.

So if you were seeking all the projects 'beneath' category "Abcd", you could use:

foreach my $project ( $twig -> findnodes ( './category[@name="Abcd"]/project' ) ) { 
    print $project -> text,"\n";
}
Sobrique
0

This uses XML::LibXML, which is the library you're already using.

Your $cat variable contains an XML element object which you can process with the same findnodes() and findvalue() methods that you used on the top-level $doc object.

#!/usr/bin/perl

use strict;
use warnings;
# We use modern Perl here (specifically say())
use 5.010;

use XML::LibXML;

my $doc = XML::LibXML->new->parse_file('categories.xml');

foreach my $cat ($doc->findnodes('//category')) {
  say $cat->findvalue('@name');
  foreach my $proj ($cat->findnodes('project')) {
    say $proj->findvalue('.');
  }
}
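
To narrow this down to a single category name, as the question asks, something like the following should work. This is only a sketch: projects_for() is a hypothetical helper name (not part of XML::LibXML), the XML is inlined so the script runs on its own, and normalize-space() is used on the assumption that you want the leading space in each <project> value trimmed.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use XML::LibXML;

# Inline copy of the XML from the question, so the sketch is self-contained.
my $xml = <<'XML';
<categories>
    <category id="Id001" name="Abcd">
        <project> ID_1234</project>
        <project> ID_5678</project>
    </category>
    <category id="Id002" name="efgh">
        <project> ID_6756</project>
        <project> ID_4356</project>
    </category>
</categories>
XML

# projects_for() is a hypothetical helper, not a library function.
# It compares the name attribute in Perl rather than interpolating
# $name into an XPath string, which avoids quoting problems.
sub projects_for {
    my ($doc, $name) = @_;
    return map  { $_->findvalue('normalize-space(.)') }   # trimmed text
           map  { $_->findnodes('project') }              # child projects
           grep { ($_->getAttribute('name') // '') eq $name }
           $doc->findnodes('/categories/category');
}

my $doc = XML::LibXML->load_xml(string => $xml);

say join ', ', projects_for($doc, 'Abcd');   # prints "ID_1234, ID_5678"
```

If the category name is a fixed string, a single XPath expression such as '/categories/category[@name="Abcd"]/project' passed to findnodes() selects the same elements.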
Dave Cross
  • @Dave Cross: Thanks a lot. The code you provided also helped me meet my requirement. – Ras Oct 13 '16 at 05:40
-1

You can try XML::Simple:

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $XML_file  = 'your XML file';
my $XML_data;
#Get data from your XML file
open(my $IN, '<:encoding(UTF-8)', $XML_file) or die "cannot open file $XML_file: $!";
{
   local $/;
   $XML_data = <$IN>;
}
close($IN);
#Store XML data as hash reference
my $xmlSimple = XML::Simple->new(KeepRoot   => 1);
my $hash_ref = $xmlSimple->XMLin($XML_data);
print Dumper $hash_ref;

The hash reference will be as below:

$VAR1 = {
          'categories' => {
                          'category' => {
                                        'efgh' => {
                                                  'id' => 'Id002',
                                                  'project' => [
                                                               ' ID_6756',
                                                               ' ID_4356'
                                                             ]
                                                },
                                        'Abcd' => {
                                                  'id' => 'Id001',
                                                  'project' => [
                                                               ' ID_1234',
                                                               ' ID_5678'
                                                             ]
                                                }
                                      }
                        }
        };

Now, to get the data you want:

foreach(@{$hash_ref->{'categories'}->{'category'}->{'Abcd'}->{'project'}}){
  print "$_\n";
}

The result is:

ID_1234
ID_5678
Ngoan Tran
  • You _could_ but [xml simple is "discouraged"](http://stackoverflow.com/questions/33267765/why-is-xmlsimple-discouraged) – Sobrique Oct 07 '16 at 09:37
  • Downvoted for a) recommending XML::Simple and b) using indirect object notation (`new XML::Simple`). – Dave Cross Oct 07 '16 at 09:52