I have an XML file like this, containing thousands of entries:
<mediawiki>
<page>
<title>page1</title>
<revision>
<id>2621</id>
<parentid>6</parentid>
<timestamp>2005-10-09T01:00:18Z</timestamp>
<contributor>
<username>Chaos</username>
<id>2</id>
</contributor>
<model>wikitext</model>
<format>text/x-wiki</format>
<text xml:space="preserve">text1</text>
</revision>
</page>
<page>
<title>page2</title>
<ns>8</ns>
<id>7</id>
<revision>
<id>2619</id>
<parentid>2618</parentid>
<timestamp>2005-10-09T00:56:39Z</timestamp>
<contributor>
<username>Chaos</username>
<id>2</id>
</contributor>
<model>wikitext</model>
<format>text/x-wiki</format>
<text xml:space="preserve">text2</text>
</revision>
</page>
<page>
<title>page3</title>
<ns>8</ns>
<id>6</id>
<revision>
<id>2621</id>
<parentid>6</parentid>
<timestamp>2005-10-09T01:00:18Z</timestamp>
<contributor>
<username>Chaos</username>
<id>2</id>
</contributor>
<model>wikitext</model>
<format>text/x-wiki</format>
<text xml:space="preserve">text3</text>
</revision>
</page>
</mediawiki>
With my script, each page should be written to its own text file, named after the content of the page's <title> tag
and containing the text of its <text xml:space="preserve"></text> element.
My code:
use strict;
use warnings;
use XML::LibXML;

my $filename = "pages.xml";
my $parser = XML::LibXML->new();
my $xmldoc = $parser->parse_file( $filename );
my $file;
foreach my $page ( $xmldoc->findnodes( '/mediawiki/page' ) ) {
    foreach my $title ( $page->findnodes( '/mediawiki/page/title' ) ) {
        foreach my $rev ( $page->findnodes( '/mediawiki/page/revision' ) ) {
            foreach my $text ( $rev->findnodes( 'text/text()' ) ) {
                $file = $title->to_literal();
                my $newfile = "$file.txt";
                open( my $out, '>:utf8', $newfile )
                    or die "Unable to open '$newfile' for write: $!";
                my $texte = $text->data;
                print $out "$text\n";
                close $out;
            }
        }
    }
}
The problem is that every generated file contains the same text: that of the last <text xml:space="preserve"></text> element.
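One likely cause of this symptom: an XPath expression that starts with `/` is evaluated from the document root, even when `findnodes` is called on `$page`. The inner loops therefore visit every title and every revision in the whole file for each page, and each file is overwritten until only the last text remains. A minimal sketch using paths relative to the current page follows. It assumes one revision per page and no XML namespace on `<mediawiki>`, as in the sample; `page_texts` and `write_pages` are hypothetical helper names, not part of XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns one [title, text] pair per <page>.
sub page_texts {
    my ($doc) = @_;
    my @pairs;
    foreach my $page ( $doc->findnodes('/mediawiki/page') ) {
        # Paths starting with './' are evaluated relative to $page only.
        my $title = $page->findvalue('./title');
        my $text  = $page->findvalue('./revision/text');
        push @pairs, [ $title, $text ];
    }
    return @pairs;
}

# Hypothetical helper: writes "<title>.txt" for each page in the current directory.
sub write_pages {
    my ($doc) = @_;
    foreach my $pair ( page_texts($doc) ) {
        my ( $title, $text ) = @$pair;
        my $newfile = "$title.txt";
        open( my $out, '>:encoding(UTF-8)', $newfile )
            or die "Unable to open '$newfile' for write: $!";
        print {$out} "$text\n";
        close $out;
    }
    return;
}

# Runs only when a file name is given on the command line.
if (@ARGV) {
    my $doc = XML::LibXML->load_xml( location => $ARGV[0] );
    write_pages($doc);
}
```

Saved under an assumed name such as `split_pages.pl`, it would be run as `perl split_pages.pl pages.xml`. Titles containing characters that are invalid in file names (for example `/`) would still need sanitizing, which this sketch does not attempt.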