
I'm trying to expand my usage of implicit $_ (the global "topic" variable) in my code. Perlmonks has this (outdated?) article on functions which accept $_ in the absence of explicit variables.

The problem I'm having is that I don't know which functions set $_. I know that at least map, grep, and for/foreach will alter the value of $_, but I assume there must be more. I am also unclear on any scope issues relating to $_, as in:

for (@array_of_array_refs)
{
  for (@$_)
  {
    print;
  }
  print;  # what does this print?
}

Is there a list of functions, or a set of guidelines to follow, so I will know intuitively how to avoid clobbering $_?

Greg Kennedy
    If you are unsure then your future maintenance programmer might be unsure. So do it explicitly not implicitly. – Sobrique Dec 03 '15 at 09:32
    You can do what I did when I added my part to that Perlmonk's post: read through perlfunc and see for yourself. It's something you should do anyway. We also cover this in [Learning Perl](http://www.learning-perl.com). – brian d foy Dec 03 '15 at 16:39
    I do not think it is such a great idea to wholesale try to expand the use of `$_` throughout your code. – Sinan Ünür Dec 04 '15 at 15:55
    See, I've come to regard $_ as one of the language features that makes Perl uniquely Perl. If I avoid all the Perlisms, I may as well just use Python or something. – Greg Kennedy Dec 05 '15 at 04:11
    `$_` in the right situation can make a statement clearer and less redundant. For example, one of my favorite uses is running a function on each element of an array `function($_) for @values;`. The use of `$_` in a double loop, however, immediately reminds seasoned Perl programmers of debugging nightmares. – Christopher Bottoms Dec 05 '15 at 06:02
  • To answer the question in your code " what does this print?": It will print a string representing the current array reference in `@array_of_array_refs` (something like `ARRAY(0x183ea68)`). – Christopher Bottoms Dec 05 '15 at 06:16

3 Answers


Steffen Ullrich's answer is misleading, so I'll respond here. I might have missed a few things, but it's late. And Learning Perl already explains it all. ;)

The local operator does not work in lexical scope. Its effect is not limited to the block it's in, despite what he says. People typically have trouble understanding this because they don't try it. Terms like "outside" and "inside" are misleading and dangerous for local.

Consider this use, where there's a function that prints the global value of $_:

$_ = 'Outside';
show_it();
inside();
$_ = 'Outside';
show_it();

sub show_it { print "\$_ is $_\n"; }

sub inside {
    local $_;

    $_ = 'Inside';
    show_it();
    }

When you run this, you see that the value of $_ set inside the block of inside() is visible outside that block, in show_it():

$_ is Outside
$_ is Inside
$_ is Outside

The local operator works on package variables. It temporarily uses a new value until the end of the block. Because it's a package variable, though, that value is changed everywhere in the program until local's scope ends. The operator has a lexical scope, but its effect is everywhere: you are giving a global variable a temporary value, and that global variable is still global. Localized variables have global effect but lexical lifetime.

As I wrote before, it's wrong to talk about "inside" and "outside" with local. It's "before" and "after". I'll show a bit more of that coming up, where even time disintegrates.
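To make "before" and "after" concrete, here's a small sketch with nested local calls. The sub names record, level_one, and level_two are made up for illustration. Each call to record sees whatever temporary value of $_ is current at that moment, and each value disappears when the sub that localized it returns:

```perl
use strict;
use warnings;

# Hypothetical helper: remember what the global $_ holds right now.
my @seen;
sub record { push @seen, $_ }

sub level_two {
    local $_ = 'level two';   # temporary value until level_two returns
    record();
}

sub level_one {
    local $_ = 'level one';   # temporary value until level_one returns
    record();
    level_two();
    record();                 # level_two has returned, so its value is gone
}

$_ = 'top';
record();
level_one();
record();

print join( ', ', @seen ), "\n";
```

This prints top, level one, level two, level one, top: the values stack up and unwind in time order, regardless of where record is defined in the file.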

The my operator is completely different. It does not work with package variables at all. These "lexical variables" don't exist outside their scope (although black magic modules such as PadWalker can look at them). There's no way for any other part of the program to see them. They are visible only in their scope and in sub-scopes created in that scope.

Perl v5.10 allows us to create a lexical version of $_ (it was fixed and made experimental in v5.18, and removed in v5.24, so don't use it. See also The good, the bad, and the ugly of lexical $_ in Perl 5.10+). I can make my previous example use that, although it won't compile on v5.24 or later:

use v5.10;

$_ = 'Outside';
show_it();
inside();
$_ = 'Outside';
show_it();

sub show_it { print "\$_ is $_\n"; }


sub inside {
    my $_;
    $_ = 'Inside';
    show_it();
    }

Now the output is different. The lexical $_ has the same effect as any other lexical variable. It does not affect anything outside its scope, again, because these variables only exist in their lexical scope:

$_ is Outside
$_ is Outside
$_ is Outside

But, to answer the original question: the Perlmonks post Builtin functions defaulting to $_ is still good, but I don't think it's relevant here. Those functions use $_; they don't set it.
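A quick sketch of that difference: these builtins read $_ when you don't give them an argument, but they leave its value alone. (A few, such as chomp, do modify their argument, which is their whole job.)

```perl
use strict;
use warnings;

$_ = "Hello, world\n";

# Each of these reads $_ because no argument is given,
# but none of them assigns to $_.
my $len   = length;     # same as length($_)
my $upper = uc;         # same as uc($_)
my $found = /world/;    # same as $_ =~ /world/

print "len=$len found=$found\n";
print "still: $_";      # $_ is unchanged
```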

The big thing to know about Perl is that there is no short answer. Perl does the thing that makes sense, not the thing that makes it consistent. It is, after all, a post-modern language.

The way to not worry about changing $_ is to not change $_. Avoid using it. We have lots of similar advice in Effective Perl Programming.

foreach

The looping constructs foreach and its synonym for use a localized version of $_ to refer to the current topic. Everything inside the loop, including anything the loop calls, sees the current topic:

use v5.10;

$_ = 'Outside';
show_it();
sub show_it { say "\$_ is $_"; }

my @array = 'a' .. 'c';
foreach ( @array ) {
    show_it();
    $_++
    }

say "array: @array";

Notice @array after the foreach loop. Even though foreach localizes $_, Perl aliases the value rather than copying it. Changing the control variable changes the original value, even if that value is in an outer lexical scope:

$_ is Outside
$_ is a
$_ is b
$_ is c
array: b c d

Don't use $_ as the control variable. I only use the default in really short programs, mostly because I want the control variable to have a meaningful name in big programs.
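A minimal sketch of the alternative, reusing the letters from the example above: name the control variable. It's still an alias, so the array still changes, but $_ is never touched:

```perl
use strict;
use warnings;

$_ = 'Outside';
my @array = 'a' .. 'c';

# A named lexical control variable: $_ is never localized or assigned...
foreach my $letter (@array) {
    $letter++;    # ...but $letter is still an alias, so @array changes
}

print "\$_ is $_\n";        # $_ is Outside
print "array: @array\n";    # array: b c d
```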

map and grep

Like foreach, map and grep use $_ for the control variable. You can't use a different variable for these. You can still affect variables outside the scope through that performance-enhancing aliasing I showed in the previous section.

Again, this means that there's some scope leak. If you change $_ inside the block and $_ was one of the items in the input list, the outer $_ changes:

use v5.10;
$_ = 'Outside';
my @transformed = map { $_ = 'From map' } ( $_ );
say $_;
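The same aliasing reaches array elements. In this sketch (the @words data is made up), a substitution inside grep edits the original array, which is why changing $_ inside grep or map is usually discouraged:

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);

# grep aliases $_ to each element, so this s/// edits @words itself
my @had_a = grep { s/a/A/ } @words;

print "matched: @had_a\n";    # elements where the substitution succeeded
print "words:   @words\n";    # the originals were changed too
```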

For moderately complicated inline blocks, I assign $_ to a lexical variable:

my @output = map { my $s = $_; ... } @input;
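Here's that pattern filled in with a concrete, hypothetical trimming step. Because the substitutions act on the copy in $s, @input comes out unchanged:

```perl
use strict;
use warnings;

my @input = ( '  alpha', 'beta  ', ' gamma ' );

# Copy the topic into a lexical first; edits to $s can't reach @input
my @output = map {
    my $s = $_;
    $s =~ s/\A\s+//;    # trim leading whitespace
    $s =~ s/\s+\z//;    # trim trailing whitespace
    $s;
} @input;

print join( '|', @output ), "\n";    # alpha|beta|gamma
print join( '|', @input ),  "\n";    # still has the original whitespace
```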

And if you are really nervous about $_, don't do the evil trick of a map inside a map:

my @words = map {
    map { split } $_
    } <>;

That's a dumb example, but I've done such things in the past where I needed to turn the topic into a list.
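If the goal is only to split lines into words, one way to avoid the nested map is to name the outer item and hand it to split explicitly. This is a sketch with made-up input lines standing in for <>, so only one map, and one $_, is in play:

```perl
use strict;
use warnings;

my @lines = ( "the quick brown\n", "fox jumps\n" );

# Name the outer item and pass it to split explicitly
my @words = map {
    my $line = $_;
    split ' ', $line;    # ' ' splits on runs of whitespace, newline included
} @lines;

print scalar(@words), " words: @words\n";
```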

while( <> )

Perl has a handy little idiom that assigns the next line from a filehandle to $_. This means that instead of this:

while( defined( $_ = <> ) )

You can get the exact same thing with:

while( <> )

But, whatever value ends up in $_ stays in $_.

$_ = "Outside\n";
show_it();
sub show_it { print "\$_ is $_" }

while( <DATA> ) {
    show_it();
    }

show_it();

__DATA__
first line
second line
third line

The output looks a little weird because the last line has no value, but that's the last value assigned to $_: the undef that the line input operator assigned before the defined test stopped the loop:

$_ is Outside
$_ is first line
$_ is second line
$_ is third line
$_ is

Put a last in there and the output changes:

$_ = "Outside\n";
show_it();
sub show_it { print "\$_ is $_" }

while( <DATA> ) {
    show_it();
    last;
    }

show_it();

__DATA__
first line
second line
third line

Now the last value assigned was the first line:

$_ is Outside
$_ is first line
$_ is first line

If you don't like this, don't use the idiom:

while( defined( my $line = <> ) )
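Here's a fuller sketch of that idiom. It reads from an in-memory filehandle opened on a reference to a string (an assumption made so the example runs without a real file) and shows $_ keeping its earlier value:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird line\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

$_ = "Outside\n";

# The loop assigns each line to $line, never to $_
my @lines;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    push @lines, $line;
}
close $fh;

print "read: @lines\n";
print "\$_ is $_";    # $_ is Outside
```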

Pattern matching

The substitution operator, s///, binds to $_ by default and can change it (that's sorta the point). But, with v5.14, you can use the /r flag, which leaves the original alone and returns the modified version.
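A short sketch of both behaviors on $_, reusing the 'Outside' string from the earlier examples:

```perl
use v5.14;    # /r needs v5.14 or later; this also turns on strict and say
use warnings;

$_ = 'Outside';

# /r returns a modified copy and leaves $_ alone
my $changed = s/Out/In/r;
say "changed: $changed";     # changed: Inside
say "\$_ is $_";             # $_ is Outside

# Without /r, s/// edits $_ in place and returns the number of substitutions
my $count = s/Out/In/;
say "count: $count, \$_ is $_";    # count: 1, $_ is Inside
```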

The match operator, m//, can also change $_. It doesn't change the value, but it can set the match position that pos reports. That's how Perl can do global matches in scalar context:

use v5.10;

$_ = 'Outside';
show_it();
sub show_it { say "\$_ is $_ with pos ", pos(); }

foreach my $time ( 1 .. 5 ) {
    my $scalar = m/./g;
    show_it();
    }

show_it();

Some of the scalar settings in $_ change even though the value is the same:

$_ is Outside with pos
$_ is Outside with pos 1
$_ is Outside with pos 2
$_ is Outside with pos 3
$_ is Outside with pos 4
$_ is Outside with pos 5
$_ is Outside with pos 5

You probably aren't going to have a problem with this. You can reset the position with an unsuccessful match against $_. That is, unless you're using the /c flag. Even though the scalar value didn't change, part of its bookkeeping changed. This was one of the problems with lexical $_.
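A sketch of resetting the position on $_. Note that the counting loops use a named variable: a bare for would re-alias $_ to the loop values, and the matches would run against the wrong string.

```perl
use strict;
use warnings;

$_ = 'Outside';

# Advance pos() three characters with scalar-context global matches.
for my $i ( 1 .. 3 ) {
    my $matched = m/./g;
}
my $after_matches = pos();    # 3

# A failed global match resets the position...
my $no_z = m/z/g;
my $after_fail = pos();       # undef

# ...unless the /c flag asks Perl to keep it
for my $i ( 1 .. 3 ) {
    my $matched = m/./g;
}
my $no_z_c = m/z/gc;
my $after_fail_c = pos();     # 3

print "after matches: $after_matches\n";
print 'after failed match: ', ( defined $after_fail ? $after_fail : 'undef' ), "\n";
print "after failed /c match: $after_fail_c\n";
```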

There's another curious thing that happens with matching. The per-match variables are dynamically scoped, so a match in an inner block doesn't change the values they had in the outer scope:

use v5.10;

my $string = 'The quick brown fox';

OUTER: {
    $string =~ /\A(\w+)/;
    say  "\$1 is $1";

    INNER: {
        $string =~ /(\w{5})/;
        say  "\$1 is $1";
        }

    say  "\$1 is $1";
    }

The value of $1 in the OUTER scope isn't replaced by the $1 in INNER:

$1 is The
$1 is quick
$1 is The

If that hurts your head, don't use the per-match variables. Assign them right away (and only when you've had a successful match):

my $string = 'The quick brown fox';

OUTER: {
    my( @captures ) = $string =~ /\A(\w+)/;

    INNER: {
        my $second_word;
        if( $string =~ /(\w{5})/ ) {
            $second_word = $1
            }
        }
    }
brian d foy

Please refer to Brian's answer for a much more detailed explanation. I'm leaving this answer because some of the issues in the context of the question can be complex to understand, and the different description in this answer and in the comments may help you understand the problem better, in addition to Brian's answer.

It might also be useful to read the Wikipedia page for "scope" to understand the various kinds of scopes, especially lexical and dynamic scope.


map, grep, for/foreach etc. "localize" $_. This means that they bind a new variable to $_, and the original variable is bound to $_ again only when the block is left. See the end of the answer for a more detailed description of this "localizing". For example:

for(qw(1 2)) {
    for(qw(a b)) {
        print map { uc($_) } ($_,'x');
        print $_
    }
    print $_
}

will give you AXaBXb1AXaBXb2 which shows that each use of for/map binds $_ to a different variable and binds it back to the previous variable after leaving the block.

As for the functions that take $_ as the default argument: these don't have any side effects apart from the expected ones (e.g. the substitution s/// modifies $_), and perldoc documents when a function or operator uses $_ as its default argument.

However, you have to watch out if you use $_ yourself and want to make sure that doing so does not affect its previous value. In this case, localizing $_ yourself protects against accidentally changing the previous $_:

sub myfunction {
    local $_;
    # from here until the function returns, changes to $_
    # will not affect the previous $_
    ...
}

This is also possible with a block:

{
    local $_;
    # from here until the block is left, changes to $_
    # will not affect the previous $_
    ...
}

But note that the often used while (<>) will not localize $_:

use v5.10;
$_ = 'foo';
say $_;
while (<>) {
    say $_;
}
say $_;

In this case the say $_ after the loop will not show the value from before the loop ('foo') but the last implicit assignment from the loop (undef).
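Combining the two points above, here's a sketch of a helper (count_lines is a hypothetical name) that uses while (<$fh>) internally but localizes $_ first, so the caller's $_ survives the loop. The in-memory filehandle is only there to make the example self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: counts lines with the while(<$fh>) idiom,
# localizing $_ so the implicit assignments don't leak to the caller.
sub count_lines {
    my ($fh) = @_;
    local $_;          # the loop below assigns to the global $_
    my $count = 0;
    while (<$fh>) {
        $count++;
    }
    return $count;     # $_ is restored when the sub returns
}

my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

$_ = 'foo';
my $lines = count_lines($fh);
close $fh;

print "lines: $lines\n";    # lines: 3
print "\$_ is $_\n";        # $_ is foo
```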


What exactly is localizing? Most programmers are used to lexical scoping, which is done with "my" in Perl. But "localizing" a variable is different, whether it is done with an explicit local or implicitly inside for, map, and so on.

The main idea is that localizing binds a global symbol like $_ to a different variable, and the original variable is only restored after the end of the lexical scope. Thus, contrary to lexical scoping, this new binding affects even functions called from inside this lexical scope, e.g.:

use v5.10;

sub foo { say $_ }

$_ = 1;
foo(); # 1

{
    local $_;  # bind symbol $_ to new variable
    $_ = 2;
    foo();     # 2 - because $_ inside foo() is the same as in this block
}

foo(); # 1     # leaving the block restored original binding of $_
Steffen Ullrich
    while() and s/// as well. Also, it's not recommended to alter $_ inside grep and map, as it is aliased to list members, as in the case of foreach. – mpapec Dec 03 '15 at 05:34
    Most of this post is incorrect. The `local` forms a "dynamic scope" that changes the value of its variable in time, not lexical scope. Until the block that declares the local variable is done, the value of your $_ is changed everywhere in the program. You misunderstand `local` and should delete this post. – brian d foy Dec 03 '15 at 16:43
    @briandfoy: I appreciate your comment but I did talk about scope and not lexical scope. I think I'm aware how local works and maybe my description was too much trivialized. Why not just make your own answer and describe it better so that we could all learn from it? – Steffen Ullrich Dec 03 '15 at 20:14
  • @briandfoy: Thanks for your great and extensive answer to this question. I see as the essential part of your complaint against my answer a different understanding of inside, outside and scope. I've clarified in my answer that I don't mean the lexical scope but I mean what you call "dynamic scope". I.e. "inside" is for me after entering the block (scope) and "outside" after leaving the block (scope), in time and not in code. But you are right that it is hard to understand for anyone using only languages which don't have this concept and that the words are easy to misinterpret. – Steffen Ullrich Dec 04 '15 at 16:54
  • As I showed though, inside and outside don't matter because there is scope leak. You can't say that local won't affect outside its scope in either space or time because it can. – brian d foy Dec 04 '15 at 19:43
  • @briandfoy: Thanks for your input. I've now removed any traces of inner or outer scope from the answer because these phrases are obviously confusing. I'm still leaving the (edited) answer because it explains things in other words than you do, so it might be helpful in addition to your answer for understanding the complexities surrounding local. – Steffen Ullrich Dec 04 '15 at 22:14

Besides learning all of the built-in functions (which you should do at least for the ones you use), here is what I think is the best guideline for working with $_:

  • Only use $_ when it is clearer and more obvious than using explicit variables.

Think of $_ as "it". "It" is a pronoun. You don't say "I went to it and bought it" when you mean "I went to the store and bought ice cream". Pronouns should be used only when it is obvious what they refer to.

Mutating your original code a bit gives us a useful example. Notice the use of a named (explicit) variable in the outer scope and the use of $_ in the smallest scope possible (where it reduces the entire inner loop into a single line):

# Process student records one year at a time
for my $student_records_aref (@student_records_by_year_array_refs)
{
    # Print a report on each student for the current year
    print_report_on($_) for @{$student_records_aref};
}
Christopher Bottoms