
I have a package.xml file with the following structure:

<package name="com/avinash/foo1">
    <sourcefile name="bar1.java">
        <line no="1" mi="3"/>
        <line no="3" mi="2"/>
    </sourcefile>
    <sourcefile name="bar2.java">
        <line no="1" mi="5"/>
        <line no="6" mi="8"/>
        <line no="7" mi="3"/>
    </sourcefile>
</package>
<package name="com/avinash/foo2">
.
.
.
.
</package>

Using Perl, I need to delete every line node whose no attribute is "1". I have read that splice can be used to remove such nodes from the parsed data structure, and I have written the following code so far:

my $xmlFilePath = 'package.xml';
use XML::Simple;
my $xs = XML::Simple->new (ForceArray => 1);
my $ref = $xs->XMLin($xmlFilePath);

foreach(@{$ref->{'package'}}) {
    my %packageTag = %{$_};        

    foreach(@{$packageTag{'sourcefile'}}){
        my %sourcefileTag = %{$_};

        my $lineCtr = 0;

        foreach(@{$sourcefileTag{'line'}}){
            my %lineTag = %{$_};

            if($lineTag{'no'}==1){
                #splice : something like "splice @{$ref{$packageTag{$sourcefileTag->{'line'}}}}, $lineCtr, 1;"
            }

            $lineCtr = $lineCtr + 1;

        }
    }
}

I am new to Perl and very confused about how the @, % and $ sigils convert between one another. I do not know how to write the array part (the first argument) of the splice call. Can anyone tell me which splice call would delete the line node?

Thanks in advance.

AvinashK
  • You're going to try to output XML using XML::Simple? Best of luck! [I'd use something else](http://stackoverflow.com/q/33267765/589924). – ikegami Oct 22 '15 at 18:44
  • @ikegami I was wondering how long it would take before someone linked to that. – Matt Jacob Oct 22 '15 at 18:57
  • TBH that's why I asked it - it's hard to explain my hatred of `XML::Simple` in a comment :). – Sobrique Oct 23 '15 at 08:58

3 Answers


I'll second the recommendation not to use XML::Simple, but if you're going ahead with it, here is some advice, since there are other issues worth discussing anyway.

You shouldn't splice inside a for/foreach loop: you would be modifying the array you are looping over, which causes all kinds of problems.

To filter a list, use grep on the whole array instead of removing elements from inside the loop.
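
A minimal sketch of that idea, independent of any XML module. The `$lines` data is made up to mimic the line entries of one sourcefile; `grep` builds a new list from the entries to keep, so no array is modified while it is being iterated.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one parsed <sourcefile>: an array ref of hash refs.
my $lines = [
    { no => 1, mi => 5 },
    { no => 6, mi => 8 },
    { no => 7, mi => 3 },
];

# grep returns the elements for which the block is true; the original
# array is only read here, never changed during the iteration.
my @kept = grep { $_->{no} != 1 } @{$lines};

print join( ',', map { $_->{no} } @kept ), "\n";    # prints "6,7"
```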

Also, your example file does not work for me as posted. I had to add an XML declaration and a single containing root element to the file before XML::Simple would parse it without complaining.

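And finally, the name attribute is special (yet another reason not to use XML::Simple): by default it is one of the attributes XML::Simple uses as a key when folding arrays of elements into hashes. You need to supply the KeyAttr setting, as in KeyAttr => [], to stop it folding your data up.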
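
To make the folding visible, here is a small sketch that parses the same string twice. The one-package document and its `<root>` wrapper are made up for illustration; only the `ForceArray` and `KeyAttr` options come from the discussion above.

```perl
use strict;
use warnings;
use XML::Simple;

# Made-up one-package document; <root> is a hypothetical wrapper element.
my $xml = '<root><package name="com/avinash/foo1">'
        . '<sourcefile name="bar1.java"><line no="1" mi="3"/></sourcefile>'
        . '</package></root>';

# Default KeyAttr: packages are folded into a hash keyed by their name.
my $folded = XML::Simple->new( ForceArray => 1 )->XMLin($xml);
print ref( $folded->{package} ), "\n";    # HASH

# KeyAttr => []: packages stay an array, in document order.
my $plain = XML::Simple->new( ForceArray => 1, KeyAttr => [] )->XMLin($xml);
print ref( $plain->{package} ), "\n";     # ARRAY
print $plain->{package}[0]{name}, "\n";   # com/avinash/foo1
```

With the default settings, a loop over `@{ $ref->{'package'} }` would therefore be dereferencing a hash as an array.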

Try the below.

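use strict;
use warnings;
use XML::Simple;

# KeyAttr => [] disables folding on the name attribute;
# ForceArray => 1 turns every child element into an array reference.
my $xs = XML::Simple->new( ForceArray => 1, KeyAttr => [] );
my $packages = $xs->XMLin('package.xml');

for my $package ( @{ $packages->{'package'} } ) {
    for my $sourcefile ( @{ $package->{'sourcefile'} } ) {
        # Skip source files that have no line elements at all.
        my $lines = $sourcefile->{'line'} or next;

        # Keep every line whose "no" attribute is not 1.
        my @filtered = grep { $_->{'no'} != 1 } @{$lines};
        $sourcefile->{'line'} = \@filtered;
    }
}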
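
If you still want splice, here is a sketch of how its first argument could be written. The `$sourcefile` hash is hypothetical data shaped like what XML::Simple returns with the options above. Walking the indices from last to first is my own choice here, not something the question requires: it means a removal never shifts an element that has not been visited yet.

```perl
use strict;
use warnings;

# Hypothetical data shaped like XML::Simple's output with
# ForceArray => 1 and KeyAttr => [] (attribute values arrive as strings).
my $sourcefile = {
    name => 'bar1.java',
    line => [ { no => '1', mi => '3' }, { no => '3', mi => '2' } ],
};

my $lines = $sourcefile->{line};    # a scalar holding an array reference

# The index list is built once, before the loop starts, so splicing
# the array inside the loop body does not disturb the iteration.
for my $i ( reverse 0 .. $#{$lines} ) {
    splice( @{$lines}, $i, 1 ) if $lines->[$i]{no} == 1;
}

print scalar( @{ $sourcefile->{line} } ), "\n";    # prints "1"
```

The `$` sigil marks a single value (here a reference), and `@{ ... }` dereferences it to the array that splice expects as its first argument.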
Nick P

As an alternative to XML::Simple, here's a solution using XML::Twig which has the advantage of not loading the entire document into memory (useful if your input file is large) while remaining rather simple.

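use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
  # Only line elements are built as elements and handed to the handler.
  twig_roots => {
    'package/sourcefile/line' => \&handle_line,
  },
  # Everything else is copied to the output as it is read.
  twig_print_outside_roots => 1,
);

sub handle_line {
  my ($twig, $line) = @_;
  # Print the element unless its "no" attribute is 1.
  $line->print unless $line->att('no') == 1;
}

$twig->parsefile('package.xml');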

Yep, it's that easy. twig_print_outside_roots says that anything that isn't a line element inside a sourcefile inside a package should be printed to the output without any processing, while those line elements should be passed to the handle_line sub for processing. handle_line simply checks if the element's no attribute is 1, and prints the element only if it isn't.

This reads from package.xml and prints to standard output, which you can redirect to a new file. Or you can modify it to print to a file directly by opening the file yourself, and passing the filehandle to both twig_print_outside_roots and the print method.
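
A sketch of the filehandle variant, under a few assumptions of mine: the input is a made-up string with a hypothetical `<packages>` root, it is parsed with `parse` instead of `parsefile`, and the output goes to an in-memory filehandle so the example is self-contained. For a real file you would open something like `filtered.xml` (a placeholder name) for writing instead.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up input, wrapped in a hypothetical <packages> root element.
my $xml = <<'XML';
<packages>
  <package name="com/avinash/foo1">
    <sourcefile name="bar1.java">
      <line no="1" mi="3"/>
      <line no="3" mi="2"/>
    </sourcefile>
  </package>
</packages>
XML

# In-memory filehandle for the demo; use
# open( my $out, '>', 'filtered.xml' ) to write to disk instead.
my $buffer = '';
open( my $out, '>', \$buffer ) or die "Cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_roots => {
        'package/sourcefile/line' => sub {
            my ( $t, $line ) = @_;
            # Same filter as before, but printing to our own handle.
            $line->print($out) unless $line->att('no') == 1;
        },
    },
    # Passing the handle sends everything outside the roots there too.
    twig_print_outside_roots => $out,
);

$twig->parse($xml);
close($out) or die "Cannot close handle: $!";

print $buffer;
```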

hobbs
  • That will load it all into memory. You will need to call `purge` in your handler if you don't want to. (Which is indeed extremely useful for large XML). – Sobrique Oct 23 '15 at 08:59
  • @Sobrique that's true with `twig_handlers`, not true with `twig_roots`. – hobbs Oct 23 '15 at 14:41
  • Ah, ok. I didn't know that. That's useful. – Sobrique Oct 23 '15 at 14:43
  • @Sobrique `twig_roots` doesn't parse into memory anything outside of a root (it just keeps track of enough context to tell whether it matched or not). The price you pay is that you can't look outside of the root with element methods at all. – hobbs Oct 23 '15 at 14:58

Deleting nodes using XML::Twig:

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        # Delete every line element whose "no" attribute is "1".
        'line[@no="1"]' => sub { $_->delete },
    },
);
$twig->parsefile('your_file');
$twig->print;

You can use parsefile_inplace with XML::Twig to do this too:

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { 'line[@no="1"]' => sub { $_->delete } },
);
$twig->parsefile_inplace('your_file');

Or you can simply manipulate your parsed XML:

my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->parsefile ('your_file'); 
foreach my $line ( $twig->get_xpath('//line') ) {
    if ( $line->att("no") eq "1" ) {
        $line->delete;
    }
}
$twig->print;
Sobrique