I want to clear all content inside the <loot> </loot> elements of the XML files in a directory tree. I am using Strawberry Perl for Windows, 64-bit.

For example this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3"/>
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>

The changed file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
</loot>

I have this code:

#!/usr/bin/perl
use warnings;
use strict;

use File::Find::Rule;
use XML::Twig;

sub delete_loot {
   my ( $twig, $loot ) = @_;
   foreach my $loot_entry ( $loot -> children ) {
      $loot_entry -> delete;
   }
   $twig -> flush;
}

my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                              twig_handlers => { 'loot' => \&delete_loot } ); 

foreach my $file ( File::Find::Rule  -> file()
                                     -> name ( '*.xml' )
                                     -> in ( 'C:\Users\PIO\Documents\serv\monsters' ) ) {

    print "Processing $file\n";
    $twig -> parsefile_inplace($file); 
}

But it correctly edits only the first file it meets and leaves all the other files empty (0 KB).

Piodo
  • Can you add another file where it's not working to the question please? You can [edit] the question to do that. – simbabque Jan 02 '17 at 12:05
  • All the files are correct, but the script works only on the first one it meets and leaves the rest cleared (no matter which XML file comes first, only that one is edited correctly). – Piodo Jan 02 '17 at 22:51
  • The obvious test there would be - move the `my $twig` declaration inside the loop. – Sobrique Jan 03 '17 at 09:02
  • Also: Your XML isn't valid. That's possibly not helping. – Sobrique Jan 03 '17 at 09:06

2 Answers

The XML::Twig doc says that "Multiple twigs are not well supported".

If you look at the state of the twig object (using Data::Dumper, for example), you see a marked difference between the first and subsequent runs. It looks as if the twig considers that it has already been totally flushed (which is true, as there was a complete flush during the first run). It probably has nothing more to print for the subsequent files, so those files end up empty.

Recreating the twig object at each loop worked for me:

#!/usr/bin/perl
use warnings;
use strict;

use File::Find::Rule;
use XML::Twig;

sub delete_loot {
    my ( $twig, $loot ) = @_;
    foreach my $loot_entry ( $loot -> children ) {
        $loot_entry -> delete;
    }
}

foreach my $file ( File::Find::Rule  -> file()
                                     -> name ( '*.xml' )
                                     -> in ( '/home/dabi/tmp' ) ) {

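    print "Processing $file\n";
    # Build a fresh twig for every file instead of reusing one object.
    my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                                  twig_handlers => { loot => \&delete_loot, } ); 
    $twig -> parsefile($file);        # parse the whole document first
    $twig -> print_to_file($file);    # then write the modified tree back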
}

Also, I had to change the XML file structure to have it processed:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon">
<health value="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3">
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>
</monster>
clearlight
David Verdin
  • The script works on every file, correctly clearing the loot; I think we have a winner here. Unfortunately, 10% of the XML files don't contain the `<loot>` `</loot>` elements. When the script processes a monster file that has no `<loot>` node, it clears the file (0 KB). Could a condition be added so that the file is not modified if there are no `loot` elements, or at least is not blanked in that case? (Adding an empty `<loot></loot>` would be fine too.) – Piodo Jan 05 '17 at 13:22
  • 1
    Indeed. It is because you use flush() while parsing. The doc explains it: "Flushes a twig up to (and including) the current element, then deletes all unnecessary elements from the tree that's kept in memory." As your files without the loot element won't match anything in your twig-handlers, when flushing you won't have been anywhere in the XML tree. I edited my solution in order to print the whole tree once the parsing is done. Please let me know if you agree with this solution. – David Verdin Jan 05 '17 at 15:42
  • Thank you, it's great. I will award my bounty to you as soon as I can (after 6 hours). – Piodo Jan 05 '17 at 17:28
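
For the files without a loot element discussed in the comments above, one possible guard is to count how often the handler fires and to write the file back only when at least one loot element was seen. This is a minimal sketch and not part of the original answers: the helper name clear_loot_in_file and taking file names from @ARGV are assumptions made for illustration. Parsing with parsefile and writing with print_to_file already keeps the content of such files, so the guard only adds that they are left byte-for-byte untouched.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (name is an assumption): empties every <loot> element
# in one file. Returns 1 if the file was rewritten, 0 if it had no <loot>.
sub clear_loot_in_file {
    my ($file) = @_;
    my $found = 0;

    # A fresh twig per file, as recommended in the answer above.
    my $twig = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => {
            loot => sub {
                my ( $t, $loot ) = @_;
                $_->delete for $loot->children;    # drop all child elements
                $found++;                          # remember that we changed something
            },
        },
    );

    $twig->parsefile($file);         # parse the whole document into memory
    return 0 unless $found;          # no <loot>: leave the file alone
    $twig->print_to_file($file);     # write the modified tree back
    return 1;
}

# Usage: pass the XML files to process on the command line.
for my $file (@ARGV) {
    my $changed = clear_loot_in_file($file);
    print "$file: ", ( $changed ? "loot cleared" : "no loot element, skipped" ), "\n";
}
```

In the original scripts the list of files would come from File::Find::Rule instead of @ARGV.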

Note: With flush changed to print, the code in the question works for me (with valid XML).

However, I still recommend either of the versions below. Tested with two groups of valid XML files.

When XML::Twig->new(...) is called first and the files are then looped over and processed, I get the same behavior: the first file is processed correctly, the others are completely blanked.

Edit: When flush is replaced by print, the code shown in fact works (with correct XML files). However, I still suggest either of the versions below instead, as XML::Twig just does not support multiple files well.

The reason may have something to do with new being a class method. However, I don't see why this needs to affect handling of multiple files. The callback is installed outside of the loop, but I've tested with it being re-installed for each file and it doesn't help.

Finally, flushing isn't needed, and it clearly hurts here by clearing the state (which was created by the class method new). This doesn't affect the code below, but flush is replaced by print there as well.

So just do everything in the loop. A simple version:

use strict;
use warnings;
use File::Find::Rule;
use XML::Twig;

my @files = File::Find::Rule->file->name('*.xml')->in('...');

foreach my $file (@files)
{
    print "Processing $file\n";
    my $t = XML::Twig->new( 
        pretty_print => 'indented', 
        twig_handlers => { loot => \&clear_elt },
    );
    $t->parsefile_inplace($file)->print;
}

sub clear_elt {
    my ($t, $elt) = @_; 
    my $elt_name = $elt->name;                # get the name
    my $parent = $elt->parent;                # fetch the parent
    $elt->delete;                             # remove altogether
    $parent->insert_new_elt($elt_name, '');   # add it back empty
}

The callback code is simplified, to remove the element altogether and then add it back, empty. Note that the sub does not need the element name hardcoded. This can thus be used as it stands to remove any element.
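
If the element's position among its siblings or its attributes matter, a variant that empties the element in place may be preferable. As far as I know, insert_new_elt pastes the new element as the first child of the parent unless a position is given, and the re-created element carries no attributes. The sketch below uses cut_children instead; the sub name empty_elt and the sample document are assumptions made for illustration, not part of the answer.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical alternative to clear_elt (name is an assumption):
# detaches all children and leaves the element itself where it is.
sub empty_elt {
    my ( $t, $elt ) = @_;
    $elt->cut_children;    # the element keeps its position and attributes
}

# Small in-memory check with made-up sample data.
my $demo = XML::Twig->new( twig_handlers => { loot => \&empty_elt } );
$demo->parse(
    '<monster><health value="1"/><loot kind="rare"><item id="1"/></loot><tail/></monster>'
);
print $demo->sprint, "\n";
```

Like clear_elt, this sub has no element name hardcoded, so it can be attached to any handler key.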

We can avoid calling new in the loop by using another class method, nparse.

my $t = XML::Twig->new( pretty_print => 'indented' );

foreach my $file (@files) 
{
    print "Processing $file\n";
    my $tobj = XML::Twig->nparse( 
        twig_handlers => { loot => \&clear_elt }, 
        $file
     );
     $tobj->parsefile_inplace($file)->print;
}

# the sub clear_elt() same as above

We do have to call the new constructor first, even though it isn't used directly in the loop.


Note that calling new before the loop without twig_handlers and then setting handlers inside

$t->setTwigHandlers(loot => sub { ... });

does not help. We still only get the first file processed correctly.

zdim
  • Thanks for the response. Unfortunately, those scripts clear all the files (every file, even the first one). – Piodo Jan 05 '17 at 12:39
  • 1
    @Piodo The XML file you show is invalid and the shown code doesn't work for it, so you probably use files different than shown. I corrected it and tested with that, and I made up two more groups of XML files and tested with those as well. The code as shown works, both versions. It also works with your sub for clearing `loot` nodes. I added a different way just so, since it is far simpler computationally. – zdim Jan 05 '17 at 19:47
  • 1
    @Piodo I also replaced `flush` with `print`. That could be causing you problems (it doesn't for either version here, but it does clear the object). – zdim Jan 05 '17 at 22:36
  • 1
    @Piodo Confirmed -- when I change `flush` to `print` your code works (for me, and with valid XML files). I updated the answer with this. However, I still recommend doing everything in the loop, since XML::Twig just does not support multiple files well. – zdim Jan 05 '17 at 22:46