How can I parse XML using Perl?

Question

I have a file that contains:

<Doc>
<Text>
....
</Text>
</Doc>
<Doc>
<Text>
</Text>
</Doc>

How do I extract only the <Text> elements, process each one, and then move on to the next <Text> element efficiently?

I do not know how many there are in a file.

Take a look at http://stackoverflow.com/questions/487213/whats-the-best-xml-parser-for-perl for another Perl xml parser answer. — Robert P, Oct 23 '09 at 23:48

score 8 · Answer 1 · answered Oct 23 '09 at 23:16

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

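# Call print_n_purge for each <Text> that is a child of <Doc>;
# twig_roots builds only these subtrees instead of the whole document.
my $t = XML::Twig->new(
    twig_roots => {
        'Doc/Text' => \&print_n_purge,
    },
);

$t->parse(\*DATA);

sub print_n_purge {
    my ( $t, $elt ) = @_;
    print $elt->text;
    $t->purge;    # free the memory used by elements parsed so far
}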

__DATA__
<xml>
<Doc>
<Text>
....
</Text>
</Doc>
<Doc>
<Text>
</Text>
</Doc>
</xml>

Ivan Nevostruev · Accepted Answer · 2009-10-23T23:24:48.567

score 7

XML::Simple can do this easily:

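use strict;
use warnings;
use XML::Simple;

## make sure that there is some kind of <root> tag
my $xml_string = "<root><Doc><Text>...</Text></Doc></root>";

my $xml = XML::Simple->new();
## ForceArray keeps 'Doc' an array ref even when there is only one <Doc>
my $data = $xml->XMLin($xml_string, ForceArray => ['Doc']);

## the loop visits every <Doc>, however many the file contains
for my $doc (@{ $data->{'Doc'} }) {
    print $doc->{'Text'}, "\n"; ## prints value of Text nodes
}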

edited Oct 23 '09 at 23:24

answered Oct 23 '09 at 23:17


What if I didn't know how many I had in a file? How would I use it? Thanks. – unj2 Oct 23 '09 at 23:20
And I get a mismatched tag error... do you know what that means? – unj2 Oct 23 '09 at 23:27
use Data::Dumper; print Dumper($data); – Daren Schwenke Oct 23 '09 at 23:28
@kunjaan: your XML is not valid. You can save it to a file and open it in IE, for example, to see if it's valid. – Ivan Nevostruev Oct 23 '09 at 23:32
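
The two follow-up questions in these comments (an unknown number of elements, and checking whether the XML is valid) can be illustrated with one rough sketch. The thread does not say what caused the mismatched tag error, so this is not a diagnosis. It assumes the file looks like the fragment in the question, with several top-level <Doc> elements and no XML declaration, and wraps it in a synthetic <root> element. The name <root> and the sample texts "first" and "second" are made up for illustration. XML::Twig's safe_parse is used instead of parse so that a parse error is reported through $@ rather than killing the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical input: same shape as the question's file, no single root element.
my $fragment = <<'XML';
<Doc>
<Text>first</Text>
</Doc>
<Doc>
<Text>second</Text>
</Doc>
XML

# Wrap the fragment in a synthetic <root> (illustrative name) so the parser
# sees one document instead of several top-level elements.
my $wrapped = "<root>\n$fragment</root>\n";

my @texts;
my $twig = XML::Twig->new(
    twig_roots => {
        # The handler runs once per <Doc>/<Text>, so the number of
        # elements does not need to be known in advance.
        'Doc/Text' => sub {
            my ( $t, $elt ) = @_;
            push @texts, $elt->text;   # collected only for this demo
            $t->purge;                 # release what has been parsed so far
        },
    },
);

# safe_parse wraps parse in an eval: false on failure, error text in $@.
if ( $twig->safe_parse($wrapped) ) {
    print scalar(@texts), " Text element(s) found\n";
    print "$_\n" for @texts;
}
else {
    print "XML is not well-formed: $@";
}
```

For a large file, the handler would process each element on the spot rather than pushing it onto @texts; the array is there only to make the count visible.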
