
I thought I knew a bit about Perl references and how to work with them. I cut my teeth on Perl 5.005. Right now I have a piece of code, freshly written in Perl 5.32, where I'm stumped by the behavior of some array reference operations.

Here is my minimal example:

#!/usr/bin/perl

my $array_ref = (); # create an anonymous array and keep a reference to it
my $another_ref = $array_ref; # assign the reference (not an array deep copy, or is it?)

print "Pushing foo and bar\n";
push @{ $array_ref }, "foo";
push @{ $array_ref }, "bar";

print "Array ref element count: " . $#{ $array_ref } . "\n";
print "Another ref element count: " . $#{ $another_ref } . "\n";

print "Pushing baz\n";
push @{ $array_ref }, "baz";

print "Array ref element count: " . $#{ $array_ref } . "\n";
print "Another ref element count: " . $#{ $another_ref } . "\n";

The resulting output:

# ./test.pl
Pushing foo and bar
Array ref element count: 1
Another ref element count: -1
Pushing baz
Array ref element count: 2
Another ref element count: -1

My production code creates an empty anonymous array and takes a reference to it, then stores a copy of the reference in some dynamic data structure, and then proceeds to add some elements to the anonymous array using the original "my" (local) reference, before that local variable goes out of scope. Curiously, while the elements do get added through the originally obtained array reference, they do not appear via the "copied reference", as if the line

my $another_ref = $array_ref;

behaved more like a copy-construction, and I ended up with a new, independent array. I.e., by the observed behavior, it appears to perform a deep copy. No syntax error gets reported by Perl.

It then occurred to me to try the arrayref assignment after some elements had been pushed onto the original array. A single line has moved in the source code:

#!/usr/bin/perl

my $array_ref = (); # create an anonymous array and keep a reference to it
#my $another_ref = $array_ref; # moved below:

print "Pushing foo and bar\n";
push @{ $array_ref }, "foo";
push @{ $array_ref }, "bar";
my $another_ref = $array_ref; # moved here

print "Array ref element count: " . $#{ $array_ref } . "\n";
print "Another ref element count: " . $#{ $another_ref } . "\n";

print "Pushing baz\n";
push @{ $array_ref }, "baz";

print "Array ref element count: " . $#{ $array_ref } . "\n";
print "Another ref element count: " . $#{ $another_ref } . "\n";

Resulting output:

# ./test.pl
Pushing foo and bar
Array ref element count: 1
Another ref element count: 1
Pushing baz
Array ref element count: 2
Another ref element count: 2

Now that does look like a proper "copy of the reference only".

So I get a deep copy on a reference to an empty array, but a shallow copy on a populated array? Ouch! Is this a feature by any chance? In what way?

This is making me wonder whether I even mind. I do have the option to assign the reference at the end of my "initial" scope, where the anonymous array gets instantiated. If the array is still empty by the end of that scope, further elements can indeed be added later on, but by then the original reference will have perished, and I'd be accessing the array via the "persistent" reference only anyway, with consistent results. So no harm done, just something to be aware of in my particular situation.

brian d foy
frr
  • I guess you have to use brackets to initialize the reference like `my $array_ref = [];`, not parentheses. – ernix Jul 25 '22 at 08:42
  • Thanks ernix, I can see that's the whole point. Spot on. BTW @zdim has turned it into an answer, detailing a number of other aspects. Thanks a bunch to both of you. I'm wondering how I've come up with parentheses. I did try to look up the correct syntax, and possibly followed the wrong advice :-) – frr Jul 25 '22 at 11:12
  • Re "*`(); # create an anonymous array`*", `( ... )` only changes precedence, like in math. It doesn't create a list or any kind of data structure. `()` is a little special. It does absolutely nothing in list context as per the above, but it needs to return *something* in scalar context, so it returns `undef`. – ikegami Jul 25 '22 at 13:34
  • @ikegami thanks for the comments, your explanations are razor sharp – frr Jul 25 '22 at 16:53
  • In addition, `$#{ $array_ref }` will not tell you the count of the array. It will tell you the last index of the array, which will be one fewer than the count. – tobyink Jul 26 '22 at 13:40
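
The difference tobyink points out can be seen in a minimal sketch like the one below. The array contents are made up for illustration; `scalar(@{ ... })` is one common way to get the element count.

```perl
use strict;
use warnings;

# Illustrative data, not taken from the question's production code.
my $array_ref = [ "foo", "bar", "baz" ];

my $last_index = $#{ $array_ref };          # index of the last element
my $count      = scalar( @{ $array_ref } ); # number of elements

print "Last index: $last_index\n";  # prints "Last index: 2"
print "Count: $count\n";            # prints "Count: 3"
```

For an empty array the last index is -1 and the count is 0, which is why the question's output shows -1.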

1 Answer


That first line, the one that declares $array_ref, does not "create an anonymous array".

my $var = ();  # just an uninitialized scalar

This merely declares a scalar variable and assigns an empty list to it, to no effect at all: the variable stays uninitialized. One can see this by printing ref $var, which gives an empty string (not ARRAY), or by printing the variable itself, which (with warnings enabled) draws a warning about using an uninitialized value rather than showing ARRAY(0x...), the stringification of an array reference. Or use Devel::Peek.
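
A small sketch of that check, comparing the parenthesized form with square brackets. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $var = ();    # empty parens in scalar assignment: $var stays undef

print "var is ", ( defined $var ? "defined" : "undefined" ), "\n";  # prints "var is undefined"
print "ref of var: '", ref($var), "'\n";                            # prints "ref of var: ''"

my $aref = [];   # square brackets construct an anonymous array

print "ref of aref: '", ref($aref), "'\n";                          # prints "ref of aref: 'ARRAY'"
```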

So assigning that variable to another one, right after it is declared, again has no effect, and $another_ref is just another uninitialized, completely unrelated scalar.

But having declared a scalar, and not having initialized it, still allows us to turn it into a reference. So when you dereference it to push values onto it, an anonymous array is indeed constructed (via autovivification), and after

push @$var, 1;  # now it "became" an array reference

that $var is an array reference. Well, it's a scalar whose value is an array reference, so nothing very strange happened, and this is how it's often done: declare a scalar, then later assign or construct a reference for it. However, it may surprise.
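
To watch the autovivification happen, here is a minimal sketch that records what ref reports before and after the push. The names $before and $after are just for the demonstration.

```perl
use strict;
use warnings;

my $var;                    # declared, still undef

my $before = ref($var);     # '' because $var holds no reference yet
push @$var, 1;              # dereferencing in a modifying context autovivifies an array
my $after  = ref($var);     # 'ARRAY'

print "before: '$before', after: '$after', elements: ", scalar(@$var), "\n";
# prints "before: '', after: 'ARRAY', elements: 1"
```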

In the second attempt $another_ref is indeed assigned an array reference, since in the meantime $array_ref did get "elevated" into an array reference. (But that copies only the reference, so both variables refer to the same array; it is not a "deep copy".)

To declare a scalar and make it into an array reference do

my $array_ref = [];

This is usually unneeded, since an uninitialized scalar intended for an array reference will be made into one once it is used that way.
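
For the situation in the question, though, where the reference is copied before anything is pushed, the explicit [] is what makes the copy work. A sketch of the first test program with only that change (and the counts taken with scalar instead of $#):

```perl
use strict;
use warnings;

my $array_ref   = [];            # a real (empty) anonymous array
my $another_ref = $array_ref;    # copies the reference; both point to the same array

push @{ $array_ref }, "foo";
push @{ $array_ref }, "bar";

print "Array ref element count: ",   scalar( @{ $array_ref } ),   "\n";  # prints 2
print "Another ref element count: ", scalar( @{ $another_ref } ), "\n";  # prints 2

# Comparing references numerically compares the addresses they point to.
print "Same array\n" if $array_ref == $another_ref;                      # prints "Same array"
```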

An exception is when that scalar needs to be passed to a subroutine which needs to be able to tell whether it got a reference or not; then we do need to assign a reference to it first.
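
A sketch of such a case, using a made-up subroutine describe_arg (hypothetical, only for illustration) that inspects its argument with ref:

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether it was handed an array reference.
sub describe_arg {
    my ($arg) = @_;
    return ref($arg) eq 'ARRAY' ? "array reference" : "not an array reference";
}

my $uninitialized;        # would only become a reference once dereferenced
my $initialized = [];     # already an array reference, though empty

print describe_arg($uninitialized), "\n";  # prints "not an array reference"
print describe_arg($initialized),   "\n";  # prints "array reference"
```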


If a variable already holds a value, then assigning an empty list does have an effect on it. For an array, @ary = (); clears all values (but keeps memory allocations, unlike undef @ary;); for a scalar it assigns undef.
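
A quick sketch of both cases, with illustrative values:

```perl
use strict;
use warnings;

my @ary = (1, 2, 3);
@ary = ();                # list assignment: the array is emptied
print "elements: ", scalar(@ary), "\n";                               # prints "elements: 0"

my $scalar = 42;
$scalar = ();             # scalar assignment: the previous value is replaced by undef
print "scalar is ", ( defined $scalar ? "defined" : "undefined" ), "\n";  # prints "scalar is undefined"
```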

What is perhaps strange is that we may treat a mere undefined scalar as an array reference (by dereferencing it and pushing values onto it), and that it becomes one right there.

zdim
  • Thanks, my syntax error is clear... about two decades ago, I've peeked under the hood and I know a little bit of the guts. If I understand correctly, while scalars, arrays and hashes are distinct "data types", lists only exist as syntax-level constructs. Used as literals, rvalues. Actually I may have seen it used on the lhs too, when assigning multiple scalars in one go... So what I did was effectively assign an undef, or rather, achieve "no initialization", as if the equal mark and rhs were not there at all... – frr Jul 25 '22 at 11:38
  • Re "*I may have seen it used on the lhs too, when assigning multiple scalars in one go...*", parens used on the LHS of `=` have an effect on which of the two `=` operators is used. See [Scalar vs List Assignment Operator](https://stackoverflow.com/a/54564429/589924) – ikegami Jul 25 '22 at 13:32
  • @frr That is all correct; a "list", as "made" by parentheses (which don't actually make anything) in the program text, is used to group things for some purpose (assigning to an array, for instance), or as a precedence-deciding device. I like your name for it: a syntax construct. Internally it's a bunch of scalars on the stack, to be dispensed with (and gone) for some purpose. – zdim Jul 25 '22 at 17:04
  • @frr To mention a distinct example (which you'll find in ikegami's link) of the LHS use: `my ($val) = needs-list-context (like regex);` So one still uses it on a single scalar, but then the `=` is a list assignment, so the operator on the RHS runs in "list context", which usually makes it return something different from what it returns in scalar context (a regex match with captures being one example, `localtime` another, etc.). – zdim Jul 25 '22 at 17:09
  • "_used to group things_" (in my comment above) -- "grouping" with parens in `@ary = (1,2,3);` being a precedence business, too: with parens the comma operator runs first and then the assignment, while without parens (`@ary = 1,2,3;`) the `=` operator goes first and `@ary` gets `1` while `2` and `3` are discarded (with warnings). – zdim Aug 06 '22 at 06:27
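
The points from these last comments can be tried out with a short sketch. The string and variable names are made up; the `no warnings 'void'` block only silences the "Useless use of a constant" warnings that the unparenthesized assignment would otherwise produce.

```perl
use strict;
use warnings;

my $str = "key=value";

my $matched = $str =~ /(\w+)=(\w+)/;   # scalar assignment: match in scalar context, true/false
my ($first) = $str =~ /(\w+)=(\w+)/;   # list assignment: match in list context, returns captures

print "matched: $matched\n";           # prints "matched: 1"
print "first capture: $first\n";       # prints "first capture: key"

my @with_parens = (1, 2, 3);           # comma operator runs first, then the assignment
my @without_parens;
{
    no warnings 'void';
    @without_parens = 1, 2, 3;         # parsed as (@without_parens = 1), 2, 3;
}

print "with parens: ",    scalar(@with_parens),    " elements\n";  # prints "with parens: 3 elements"
print "without parens: ", scalar(@without_parens), " elements\n";  # prints "without parens: 1 elements"
```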