3

I have the following Perl code:

sub merge_xml {

  foreach my $repository ('repo1', 'repo2') {
    my @xml_files;

    sub match_xml {
      my $filename = $File::Find::dir . '/' . $_;
      if ($filename =~ m%/(main|test)\.xml$%) {
        push(@xml_files, $filename);
      }
    }

    find(\&match_xml, $repository);

    print Dumper(\@xml_files);
  }
}

And I am getting the warning:

Variable "@xml_files" will not stay shared at ./script/src/repair.pl line 87.

How does one fix this?

PS find as in File::Find

Ed Heal
  • 59,252
  • 17
  • 87
  • 127

3 Answers3

5

"Nested" named subs in fact aren't -- they're compiled as separate subroutines, and so having them written as "nested" can only be misleading.

Further, this creates a problem since the "inner" subroutine supposedly closes over the variable @xml_files that it uses, which gets redefined on each new call, being lexical. But the sub, built at compile-time and not being a lexical closure, only keeps the refernce to the value at first call and so it works right only upon the first call to the outer sub (merge_xml here).

We do get the warning though. With use diagnostics; (or see it in perldiag)

Variable "$x" will not stay shared at -e line 1 (#1)
(W closure) An inner (nested) named subroutine is referencing a lexical variable defined in an outer named subroutine.

When the inner subroutine is called, it will see the value of the outer subroutine's variable as it was before and during the first call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.

This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are created, they are automatically rebound to the current values of such variables.

So pull out that "inner" sub (match_xml) and use it normally from the "outer" one (merge_xml). In general you'd pass the reference to the array (@xml_files) to it; or, since in this case one cannot pass to File::Find's find, can have the array in such scope so to be seen as needed.

Or, since the purpose of match_xml is to be find's "wanted" function, can use an anonymous sub for that purpose so there is no need for a separate named sub

find( sub { ... }, @dirs );

Or store that coderef in a variable, as shown in Ed Heal's answer

my $wanted_coderef = sub { ... };
find( $wanted_coderef, @dirs );
zdim
  • 64,580
  • 5
  • 52
  • 81
  • I have a problem doing that as `find` is [File::Find](https://perldoc.perl.org/File::Find) – Ed Heal Dec 07 '20 at 00:40
  • 1
    @EdHeal OK -- but the `find` call will still see the sub, even if outside. Also, you can call `find` with a coderef -- `find( sub { ... }, @dirs );` so have the sub "inline" so to say. – zdim Dec 07 '20 at 00:45
  • 1
    @EdHeal Ah, right, you mean you can't pass the array -- of course not. So have it in the scope so that `find` sees it. Edited the answer, will add another way for hiding that array – zdim Dec 07 '20 at 00:47
  • 1
    @EdHeal Got to update, and in the meanwhile you beat me to the coderef -- the nicest way in my opinion, if one needs it separately. (Note that that is then a proper closure ... probably useless in this use though.) – zdim Dec 07 '20 at 01:39
  • See also [lexical subroutines](https://perldoc.perl.org/perlsub#Lexical-Subroutines) – Håkon Hægland Dec 07 '20 at 10:35
2

With help from zdim I came up with:

sub merge_xml {

  foreach my $repository ('repo1', 'repo2') {
    my @xml_files;

    my match_xml = sub {
      my $filename = $File::Find::dir . '/' . $_;
      if ($filename =~ m%/(main|test)\.xml$%) {
        push(@xml_files, $filename);
      }
    };

    find($match_xml, $repository);

    print Dumper(\@xml_files);
  }
}
Ed Heal
  • 59,252
  • 17
  • 87
  • 127
1

Might I suggest another alternative. By using a factory function, you can eliminate the need to hand write a find subroutine each time.

A factory is a function that generates another function (or subroutine in this case). You feed it some parameters and it creates a custom subroutine with those parameters bolted in. My example uses a closure but you could also build it with a string eval if the closure is costly for some reason.


sub match_factory {
  my ($filespec, $array) = @_;

  # Validate arguments
  die qq{File spec must be a regex, not "$filespec"\n}
    unless ref $filespec eq "Regexp";
  die qq{Second argument must be an array reference, not "$array"\n} 
    unless ref $array eq "ARRAY";

  # Generate an anonymous sub to perform the match check
  # that creates a closure on $filespec and $array
  my $matcher = sub {
    # Automatically compare against $_
    m/$filespec/ and
    push(@$array, $File::Find::name);
  };

  return $matcher;
}


sub merge_xml {
  my @repos = @_ # or qw/foo bar .../;

  foreach my $repository (@repos) {
    my @matched_files;
    find(
      match_factory( qr/\/(main|test)\.xml$/, \@matched_files ),
      $repository
    );
    print Dumper(\@matched_files);
  }

}

HTH

lordadmira
  • 1,807
  • 5
  • 14