
I wanted to remove duplicate values from an array with this approach. The duplicate removal has to be executed inside a loop. Here is a minimal example that demonstrates the problem I encountered:

use strict;

for (0..1){
    my %seen;
    sub testUnique{
        return !$seen{shift}++;
    }

    my $res = testUnique(1);

    if($res){
        print "unique\n";
    }else{
        print "non-unique\n";
    }
}

I define the %seen hash inside a loop, so I would expect it to be defined only during a single iteration of the loop. The result of the above code is however:

unique
non-unique

With some debug prints, I found out that the value of %seen is preserved from one iteration to another.

I tried a trivial

for (0..1){
    my %seen;
    $seen{1}++;
    print "$seen{1}\n";
}

And this one worked as expected. It printed:

1
1

So I guess the problem is with the inner function testUnique. Can somebody explain to me what is going on here?

jutky

3 Answers


Your testUnique sub closes over the first instance of %seen. Even though it is inside the for loop, the subroutine does not get compiled repeatedly.

Your code is compiled once, including the part that says to initialize a lexically scoped variable %seen right at the top of the for loop.

The following will produce the output you want, but I am not sure I see why you are going down this path:

#!/usr/bin/env perl

use warnings;
use strict;

for (0..1){
    my %seen;
    my $tester = sub {
        # shift() needs the parentheses: a bare "shift" inside {} is read as the key 'shift'
        return !$seen{shift()}++;
    };

    print $tester->(1) ? "unique\n" : "not unique\n";
}
Sinan Ünür
  • So the variables are redefined repeatedly and subroutines are not compiled repeatedly. This is not a coherent behavior, but surely it explains the phenomena. Thanks. – jutky Feb 16 '15 at 21:23
  • It is very coherent behavior when you realize `perl` goes through [a compile stage, and a run time stage](http://modernperlbooks.com/mt/2009/08/how-a-perl-5-program-works.html). `%seen` is a lexical variable. Also, variables are not *re-defined* repeatedly. Everything is compiled once. – Sinan Ünür Feb 16 '15 at 21:26
  • Thanks a lot for posting the correct solution and not only explaining the problem. – jutky Feb 16 '15 at 21:33
  • @jutky: The distinction is not between definitions, but *declarations*. `%seen` is local to the enclosing block, but `testUnique` (which should properly be called `test_unique`) is global, in common with most languages, and it is misleading to define it within a subsidiary block. Sinan has shown how to redefine a subroutine within a block if that is what you want to do. – Borodin Feb 16 '15 at 21:44
  • @Borodin but I can't call `testUnique` outside of the loop, so it is not quite global. (I came from Java, hence the naming convention ;) ) – jutky Feb 16 '15 at 21:49
  • @jutky: Yes, you can. Try it. And Perl is no more Java than it is Lua. – Borodin Feb 16 '15 at 21:50
  • @jutky I can confirm what Borodin is saying. Calling it from anywhere in the file works fine. If you're not getting the output you expect, it may be because of some typo near the call. – AKHolland Feb 16 '15 at 22:15
  • @AKHolland Hm, in my example, if I call `testUnique` after the loop, I always get false. But when I change its body to `my $x = shift; return !$seen{$x}++;` it works correctly. – jutky Feb 16 '15 at 22:35
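
The behaviour described in the last comment is separate from the closure issue. Inside hash-subscript braces, Perl treats a lone identifier as a quoted string, so `$seen{shift}` is read as `$seen{'shift'}` and every call touches the same key. A minimal sketch of the difference follows; the sub names `test_bareword` and `test_call` are made up for illustration and are not part of the original code:

```perl
use strict;
use warnings;

my %seen;

# A lone identifier inside {} is quoted, so this is $seen{'shift'}:
# every call increments the same key, whatever argument is passed.
sub test_bareword { return !$seen{shift}++ }

# The parentheses force a real call to shift, so the key is the argument.
sub test_call { return !$seen{shift()}++ }

my @bareword = map { test_bareword($_) ? 'unique' : 'non-unique' } 1, 2;
my @call     = map { test_call($_)     ? 'unique' : 'non-unique' } 1, 2;

print "bareword: @bareword\n";   # bareword: unique non-unique
print "call:     @call\n";       # call:     unique unique
```

Copying the argument into a variable first, as in `my $x = shift;`, avoids the problem for the same reason: `shift` is then parsed outside the braces.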

The subroutine can only be defined once, and isn't re-created for each iteration of the loop. As a result, it only holds a reference to the initial %seen hash. Adding some output helps to clarify this:

use strict;
use warnings;

for(0 .. 1) {
    my %seen = ();
    print "Just created " . \%seen . "\n";

    sub testUnique {
        print "Testing " . \%seen . "\n";
        # shift() with parentheses, so the argument (not the string 'shift') is the key
        return ! $seen{shift()} ++;
    }

    if(testUnique(1)) {
        print "unique\n";
    }
    else {
        print "non-unique\n";
    }
}

Output:

Just created HASH(0x994fc18)
Testing HASH(0x994fc18)
unique
Just created HASH(0x993f048)
Testing HASH(0x994fc18)
non-unique

Here it can be seen that the initial hash is the only one tested.
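
For comparison, here is a rough sketch of the same check using an anonymous sub, which is created afresh on every pass. Rather than printing addresses, it records whether the hash seen inside the sub is the one created in the current pass. The names `$test_unique`, `$created` and `@same` are illustrative only:

```perl
use strict;
use warnings;

my @same;   # 1 if the sub saw the hash created in the current pass, else 0

for (0 .. 1) {
    my %seen;
    my $created = \%seen;

    # sub { } is evaluated at run time on each pass,
    # so it captures the %seen belonging to this pass.
    my $test_unique = sub {
        push @same, (\%seen == $created ? 1 : 0);
        return !$seen{ shift() }++;
    };

    print $test_unique->(1) ? "unique\n" : "non-unique\n";
}

print "sub saw the current hash: @same\n";   # sub saw the current hash: 1 1
```

Both passes print "unique", because each anonymous sub closes over its own pass's hash.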

AKHolland
  • This answer is best. It demonstrates the closure created by the `for` loop nicely. – David K-J Feb 16 '15 at 21:31
  • ++ For further homework this can be compared with `my $test_unique = sub { print "Testing " . \%seen . "\n"; return ! $seen{shift()} ++;}`; and `if ( $test_unique->($num) ) {say "you neat"}` *etc.* to further explain the difference with anonymous subroutines. – G. Cito Apr 21 '15 at 14:01
  • On `CODE` refs see: [Why would I use Perl anonymous subroutines instead of a named one?](http://stackoverflow.com/q/834590/2019415) – G. Cito Apr 21 '15 at 14:01

Welcome to the world of closures.

use feature 'say';  # enables say, used below

sub make_closure {
    my $counter = 0;
    return sub { return ++$counter };
}

my $counter1 = make_closure();
my $counter2 = make_closure();

say $counter1->();  # 1
say $counter1->();  # 2
say $counter1->();  # 3
say $counter2->();  # 1
say $counter2->();  # 2
say $counter1->();  # 4

sub { } captures lexical variables that are in scope, giving the sub access to them even when the scope in which they exist is gone.

You use this ability every day without knowing it.

 my $foo = ...;
 sub print_foo { print "$foo\n"; }

If subs didn't capture, the above wouldn't work in a module since the file's lexical scope is normally exited before any of the functions in the module are called.

Not only does print_foo need to capture $foo for the above to work, it must do so when it's compiled.

sub testUnique {
    return !$seen{shift}++;
}

is basically the same thing as

BEGIN {
    *testUnique = sub {
        return !$seen{shift}++;
    };
}

This means that sub { } is executed at compile time, so it captures the %seen that existed at compile time, before the loop has even started.

The first pass of the loop will use that same %seen, but a new %seen will be created for each subsequent pass to allow things like

my @outer;
for (...) {
   my @inner = ...;
   push @outer, \@inner;
}
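
As a runnable sketch of that pattern (the loop bounds and values here are made up for illustration), each pass stores a reference to its own @inner, and the stored arrays stay distinct:

```perl
use strict;
use warnings;

my @outer;
for my $i (1 .. 3) {
    my @inner = ($i, $i * 10);   # a fresh @inner on every pass
    push @outer, \@inner;        # keep a reference to this pass's array
}

# Each stored reference points at a different array.
print "@$_\n" for @outer;
# 1 10
# 2 20
# 3 30
```

If the same @inner were reused on every pass, all three references would point at one array holding only the last values.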

If you executed the sub { } at run-time, there'd be no problem.

for (0..1){
    my %seen;
    local *testUnique = sub {
        # shift() with parentheses, so the argument (not the string 'shift') is the key
        return !$seen{shift()}++;
    };

    my $res = testUnique(1);

    if($res){
        print "unique\n";
    }else{
        print "non-unique\n";
    }
}
ikegami