111

Which is the best implementation(in terms of speed and memory usage) for iterating through a Perl array? Is there any better way? (@Array need not be retained).

Implementation 1

foreach (@Array)
{
      SubRoutine($_);
}

Implementation 2

while($Element=shift(@Array))
{
      SubRoutine($Element);
}

Implementation 3

while(scalar(@Array) !=0)
{
      $Element=shift(@Array);
      SubRoutine($Element);
}

Implementation 4

for my $i (0 .. $#Array)
{
      SubRoutine($Array[$i]);
}

Implementation 5

map { SubRoutine($_) } @Array ;
Jean
  • 21,665
  • 24
  • 69
  • 119
  • 2
    Why would there be a "best"? Especially given that we have no idea how you would measure one against another (is speed more important than memory use? is `map` and acceptable answer?. etc.) – Max Lybbert May 07 '12 at 18:50
  • 2
    Two of the three you posted would make me go "WTH?!" unless there as additional surrounding context to make them sensible alternatives. In any case, this question is at the level of "*What's the best way to add two numbers?*" Most of the time, there is only one way. Then, there are those circumstances, where you need a different way. Voting to close. – Sinan Ünür May 07 '12 at 19:21
  • 4
    @SinanÜnür I empathize with your opinion (that there is only one way to add two numbers), **but** the analogy is not strong enough to use dismissively. Obviously, there is more than one way, and the OP wants to understand what's a good idea and what isn't. – CodeClown42 May 07 '12 at 19:39
  • 2
    Chapter 24 of the third edition of Programming Perl has a section on efficiency that is a good read. It address the different types of efficiency such as time, programmer, maintainer. The section starts off with the statement "Note that optimizing for time may sometimes cost you in space or programmer efficiency (indicated by conflicting hints below). Them's the breaks." –  May 07 '12 at 21:14
  • 1
    One 1 way to add two numbers? Not if you look into lower level calls / implementations.... think carry lookahead, carry save adders etc. – workwise Sep 30 '13 at 16:10
  • One thing unstated so far. 99% of the time you want to code for *clarity* and *understandability* by the next guy who looks at it, not speed or cleverness. If you make it clever, make sure to explain it well in the comments. So `foreach my $thing (@array) { ...; }` is the way to go. – lordadmira Nov 17 '21 at 03:10

6 Answers6

92
  • In terms of speed: #1 and #4, but not by much in most instances.

    You could write a benchmark to confirm, but I suspect you'll find #1 and #4 to be slightly faster because the iteration work is done in C instead of Perl, and no needless copying of the array elements occurs. ($_ is aliased to the element in #1, but #2 and #3 actually copy the scalars from the array.)

    #5 might be similar.

  • In terms memory usage: They're all the same except for #5.

    for (@a) is special-cased to avoid flattening the array. The loop iterates over the indexes of the array.

  • In terms of readability: #1.

  • In terms of flexibility: #1/#4 and #5.

    #2 does not support elements that are false. #2 and #3 are destructive.

ikegami
  • 367,544
  • 15
  • 269
  • 518
  • 10
    Wow, you added truck loads of information in short and simple sentences. – jaypal singh Aug 17 '14 at 19:50
  • 2
    #2 is good when you do queues (e.g. breadth-first searches): `my @todo = $root; while (@todo) { my $node = shift; ...; push @todo, ...; ...; }` – ikegami Feb 02 '15 at 14:40
  • Doesn't implementation 4 create an intermediate array of indices, which might introduce a large amount of memory to be used? If so, sounds like one shouldn't use that approach. https://stackoverflow.com/questions/6440723/perl-array-of-integers-using-way-too-much-memory https://rt.cpan.org/Public/Bug/Display.html?id=115863 – Thorsten Schöning Jan 14 '19 at 11:40
  • @ikegami True to your champion style - great answer :) – skeetastax Aug 22 '20 at 01:37
33

If you only care about the elements of @Array, use:

for my $el (@Array) {
# ...
}

or

If the indices matter, use:

for my $i (0 .. $#Array) {
# ...
}

Or, as of perl 5.12.1, you can use:

while (my ($i, $el) = each @Array) {
# ...
}

If you need both the element and its index in the body of the loop, I would expect using each to be the fastest, but then you'll be giving up compatibility with pre-5.12.1 perls.

Some other pattern than these might be appropriate under certain circumstances.

Sinan Ünür
  • 116,958
  • 15
  • 196
  • 339
  • I would expect the `each` to be the slowest. It does all the work of the others minus an alias, plus a list assignment, two scalar copies and two scalar clearings. – ikegami May 08 '12 at 02:44
  • 1
    And, to the best of my measurement ability, you are right. About 45% faster with `for` iterating over indices of an array, and 20% faster when iterating over the indices of an array reference (I do access `$array->[$i]` in the body), over using `each` in conjunction with `while`. – Sinan Ünür May 08 '12 at 03:48
4

IMO, implementation #1 is typical and being short and idiomatic for Perl trumps the others for that alone. A benchmark of the three choices might offer you insight into speed, at least.

JRFerguson
  • 7,426
  • 2
  • 32
  • 36
4

The best way to decide questions like this to benchmark them:

use strict;
use warnings;
use Benchmark qw(:all);

our @input_array = (0..1000);

my $a = sub {
    my @array = @{[ @input_array ]};
    my $index = 0;
    foreach my $element (@array) {
       die unless $index == $element;
       $index++;
    }
};

my $b = sub {
    my @array = @{[ @input_array ]};
    my $index = 0;
    while (defined(my $element = shift @array)) {
       die unless $index == $element;
       $index++;
    }
};

my $c = sub {
    my @array = @{[ @input_array ]};
    my $index = 0;
    while (scalar(@array) !=0) {
       my $element = shift(@array);
       die unless $index == $element;
       $index++;
    }
};

my $d = sub {
    my @array = @{[ @input_array ]};
    foreach my $index (0.. $#array) {
       my $element = $array[$index];
       die unless $index == $element;
    }
};

my $e = sub {
    my @array = @{[ @input_array ]};
    for (my $index = 0; $index <= $#array; $index++) {
       my $element = $array[$index];
       die unless $index == $element;
    }
};

my $f = sub {
    my @array = @{[ @input_array ]};
    while (my ($index, $element) = each @array) {
       die unless $index == $element;
    }
};

my $count;
timethese($count, {
   '1' => $a,
   '2' => $b,
   '3' => $c,
   '4' => $d,
   '5' => $e,
   '6' => $f,
});

And running this on perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi

I get:

Benchmark: running 1, 2, 3, 4, 5, 6 for at least 3 CPU seconds...
         1:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 12560.13/s (n=39690)
         2:  3 wallclock secs ( 3.18 usr +  0.00 sys =  3.18 CPU) @ 7828.30/s (n=24894)
         3:  3 wallclock secs ( 3.23 usr +  0.00 sys =  3.23 CPU) @ 6763.47/s (n=21846)
         4:  4 wallclock secs ( 3.15 usr +  0.00 sys =  3.15 CPU) @ 9596.83/s (n=30230)
         5:  4 wallclock secs ( 3.20 usr +  0.00 sys =  3.20 CPU) @ 6826.88/s (n=21846)
         6:  3 wallclock secs ( 3.12 usr +  0.00 sys =  3.12 CPU) @ 5653.53/s (n=17639)

So the 'foreach (@Array)' is about twice as fast as the others. All the others are very similar.

@ikegami also points out that there are quite a few differences in these implimentations other than speed.

G. Allen Morris III
  • 1,012
  • 18
  • 30
  • 1
    The comparison `$index < $#array` should actually be `$index <= $#array` because `$#array` is not the length of the array but the last index of it. – josch Dec 05 '19 at 06:01
3

In single line to print the element or array.

print $_ for (@array);

NOTE: remember that $_ is internally referring to the element of @array in loop. Any changes made in $_ will reflect in @array; ex.

my @array = qw( 1 2 3 );
for (@array) {
        $_ = $_ *2 ;
}
print "@array";

output: 2 4 6

Sandeep_black
  • 1,352
  • 17
  • 18
2

1 is substantially different from 2 and 3, since it leaves the array in tact, whereas the other two leave it empty.

I'd say #3 is pretty wacky and probably less efficient, so forget that.

Which leaves you with #1 and #2, and they do not do the same thing, so one cannot be "better" than the other. If the array is large and you don't need to keep it, generally scope will deal with it (but see NOTE), so generally, #1 is still the clearest and simplest method. Shifting each element off will not speed anything up. Even if there is a need to free the array from the reference, I'd just go:

undef @Array;

when done.

  • NOTE: The subroutine containing the scope of the array actually keeps the array and re-uses the space next time. Generally, that should be fine (see comments).
CodeClown42
  • 11,194
  • 1
  • 32
  • 67
  • `@Array = ();` does not free the underlying array. Not even going out of scope would do that. If you wanted to free the underlying array, you'd have use `undef @Array;`. – ikegami May 07 '12 at 20:04
  • 2
    Demo; `perl -MDevel::Peek -e'my @a; Dump(\@a,1); @a=qw( a b c ); Dump(\@a,1); @a=(); Dump(\@a,1); undef @a; Dump(\@a,1);' 2>&1 | grep ARRAY` – ikegami May 07 '12 at 20:06
  • **WHAT???** I had thought the whole point of GC was once a ref count == 0, the memory involved becomes recyclable. – CodeClown42 May 07 '12 at 20:07
  • @ikegami: I see the thing about `()` vs `undef`, but if going out of scope does not release the memory used by an array local to that scope, doesn't that make perl a leaking disaster? That *can't* be true. – CodeClown42 May 07 '12 at 20:11
  • They don't leak either. The sub still owns them, and will reuse them the next time the sub is called. Optimised for speed. – ikegami May 07 '12 at 20:26
  • Ah. Well, that's good to know :). I'll add a paragraph here. – CodeClown42 May 07 '12 at 20:28
  • @goldilocks: think about this as about any cache, space for time trade ... but it would be nice if this behaviour could be controlled at least at build time or even better at compile/run-time, for example something like `use feature 'active_gc'` – ArtMat May 07 '12 at 22:40
  • @ArtM, Completely misnamed. GC is always active in Perl, and everything is collected the instant it is no longer referenced. – ikegami May 08 '12 at 02:42
  • @ikegami: yes, probably misnamed, but I would say the way GC is working now is a *lazy* one, like me, preferring not to put a book back on the shelf, but keeping it on the table because there are chances I'll need it soon, and it doesn't matter if the table looks like a dump :) (that's why I named it *active* as a counterpart of *lazy*) ... I hope the main idea is clear and it would be nice to have the chance to have a *clean table* – ArtMat May 08 '12 at 10:20
  • @ArtM: I disagree. It makes sense to keep the memory from a previous sub invocation as the cache for the next one, because *generally* sub invocations do pretty similar things each time, and the blocks required will be similar (hence, "optimized for speed", which perl is not "optimized for mem usage", lol, but the former is *generally* a much better trade-off than the later). In cases where you see a potential exception to the rule, use `undef`. But I would never set that as a default for the entire process. – CodeClown42 May 08 '12 at 10:30
  • @goldilocks: I didn't say it's better or it should be a default, just to have the possibility to control this aspect if someone requires it – ArtMat May 08 '12 at 10:45
  • Is there a distinction(in terms of performance) when using foreach and while ? – Jean May 08 '12 at 20:25
  • @alertjean, They don't do the same thing, so how can you measure difference in performance of foreach and while in general? The difference in performance of your specific foreach and while loops is explained in my answer. – ikegami May 08 '12 at 20:44
  • @ArtM: I know you didn't say it should be a default...by "I would never set that *as a default*" I meant that it shouldn't be a global option (via `use some feature`) and that *no one should ever set that* globally. There should not be an option to help you down that path. Analogy: maybe a `use feature: make WORST choice by default` should be implemented ;). Deal with potential exceptions, but don't make exceptions the rule. – CodeClown42 May 09 '12 at 01:33