13

Is this the fastest (execution time) way to find the longest element in a list?

#!/usr/bin/env perl
use warnings;
use 5.012;
use List::Util qw(reduce);
use List::Util::XS;

my @array = qw( one two three four five six seven eight nine ten eleven );

my $l = reduce{ length($a) > length($b) ? $a : $b } @array;

say $l;
sid_com
  • It's unnecessary to `use List::Util::XS` unless your program has to die unless the XS version is installed. `use List::Util` automatically loads the XS version if available. – cjm Nov 15 '10 at 07:17
  • By "fastest", are you referring to execution time or coding time? – outis Nov 15 '10 at 07:47
  • Are you trying to optimise your application? Did a profiler actually tell you this one line in your application was taking up a significant amount of the processing time? – rafl Nov 15 '10 at 08:39
  • @cjm: Added `use List::Util::XS` to forestall the objection that the answer depends on whether List::Util::XS is available or not. – sid_com Nov 15 '10 at 09:19
  • @rafl: I didn't use a profiler. But it does no harm to know which way is the fastest. – sid_com Nov 15 '10 at 09:22
  • At least you should be using `>` instead of `gt`, or you'll get incorrect results. – gpvos Nov 15 '10 at 09:40
  • Sounds like premature optimization to me. How big is your list? What’s the real application like? – tchrist Nov 15 '10 at 10:02
  • If you keep the array sorted it's very fast ;) – Øyvind Skaar Nov 15 '10 at 11:43
  • It's for a terminal user interface. I am trying to adapt Term::Clui to my preferences. List length: let's say less than 10_000. My cpan_module.pl shows me 5_001 entries when I search for Perl (I know it doesn't make much sense to search for Perl), but it uses Term::ReadLine and Term::Pager, which I think is good for very long lists. For long lists I prefer Term::UI, so let's not say less than 10_000 but rather less than 150. – sid_com Nov 15 '10 at 15:11
  • Now, after a time of reflection, I feel a little guilty about micro-optimization. But I thought: it's a user interface and it doesn't cost money. I didn't think there would be a noticeable slowdown, but I thought: this time a little here, next time a little there, and another time maybe a new loop around it, and so on. – sid_com Nov 15 '10 at 15:15
  • I tried to benchmark four different routines, but the benchmark never finished. I use the Benchmark module, and as the count argument I used 0. Could someone tell me how long such a benchmark normally runs? – sid_com Nov 15 '10 at 15:16
  • The answer is eleven; you don't need a computer to work this out. – Tom Gullen Nov 15 '10 at 16:52

10 Answers

18

When only trying to find one element of a list, there is no need to construct an N-sized data structure, as many answers here have done. The fastest O(N) way is to walk the array, keeping track of the largest element seen so far. That way you have O(N) accesses of the list and O(1) memory usage.

sub longest {
    my $max = -1;
    my $max_i = 0;
    for (0 .. $#_) {              # for each index
        my $len = length $_[$_];  # only get length once per item
        if ($len > $max) {        # save index and update max if larger
            $max = $len;
            $max_i = $_;
        }
    }
    $_[$max_i]   # return the largest item
}

If you are going to be running the above code many times, I would suggest inlining the body of the subroutine.
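A minimal sketch of what that inlining might look like (illustrative, not part of the original answer; it assumes the strings are in @array):

# same high-water-mark scan, without the subroutine call overhead
my $max   = -1;
my $max_i = 0;
for my $i (0 .. $#array) {
    my $len = length $array[$i];   # get the length once per item
    if ($len > $max) {             # remember the index of the longest so far
        $max   = $len;
        $max_i = $i;
    }
}
my $longest = $array[$max_i];      # the longest element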

EDIT:

drewk's benchmark revealed that the array indexing in the above code is a bit of a bottleneck. Experimenting a little more, I have finally found a method that is faster than the reduce solution:

sub fastest {
    my $max = -1;
    my $max_ref;
    for (@_) {
        if (length > $max) {  # no temp variable, length() twice is faster
            $max = length;
            $max_ref = \$_;   # avoid any copying
        }
    }
    $$max_ref
}

which results in the following benchmark:

           Rate longest   drewk  reduce fastest
longest 44245/s      --    -21%    -30%    -47%
drewk   55854/s     26%      --    -11%    -33%
reduce  63014/s     42%     13%      --    -25%
fastest 83638/s     89%     50%     33%      --
Eric Strom
  • I think that the second call to `length` with the same value of the implicit `$_` is probably cached from the first call in your `fastest`, no? – dawg Nov 15 '10 at 20:00
  • @drewk => I am not exactly sure how the optimization is playing out. From ysth's comments, it seems like Perl maintains a field in the SV that stores the length. So my guess is that even out-of-order lookups will all be constant time once the value has been calculated once. In general, it's always good to benchmark to see if repeated calls to a builtin are faster than making a lexical copy. I have tested `caller`, `wantarray`, `ref`, and others. Most of the time it is much faster to just call them repeatedly. You need 30+ calls before the lexical will give you any advantage. – Eric Strom Nov 15 '10 at 20:49
  • Sometimes people call this the "high water mark" approach. It's what we do in _Learning Perl_. – brian d foy Jun 13 '11 at 16:24
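To check the claim in Eric Strom's comment above (repeated builtin calls versus a lexical copy), a minimal benchmark sketch along these lines could be used; it is illustrative and not from the original thread:

use Benchmark qw(cmpthese);

my $str = 'just a sample string';

cmpthese(-2, {
    # call length() twice each time
    twice   => sub { my $n = length($str) + length($str) },
    # copy length() into a lexical first
    lexical => sub { my $l = length $str; my $n = $l + $l },
});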
6

Here is a slightly modified version of OMG_peanuts' solution, using a for loop and fewer variables:

my $len = length $array[0];
my $longest = 0;
for my $i (1 .. $#array) {
    my $i_len = length $array[$i];
    if($i_len > $len) {
        $longest = $i;
        $len = $i_len;
    }
}
my $l = $array[$longest];

I was playing a bit with benchmarks, getting this for a small number of items (the original array):

           Rate REDUCE TMPVAR TMPFOR
REDUCE 234862/s     --    -0%    -7%
TMPVAR 235643/s     0%     --    -6%
TMPFOR 251326/s     7%     7%     --

and this for a larger number of items (the original array x 100):

         Rate TMPVAR TMPFOR REDUCE
TMPVAR 3242/s     --   -28%   -32%
TMPFOR 4503/s    39%     --    -5%
REDUCE 4750/s    47%     5%     --

Note that the suitability of an algorithm varies heavily with the specifics of the data (I would guess that longer strings increase the weight of the length call in the algorithm).

EDIT: Here is the full code for the benchmark (long-array version; the short one omits the `x 100` in the array definition):

use Benchmark  qw(:all);
use List::Util qw(reduce);

my @array = qw( one two three four five six seven eight nine ten eleven ) x 100;

cmpthese(-2, {
    REDUCE => sub {
        my $l = reduce{ length($a) > length($b) ? $a : $b } @array;
    },
    TMPVAR => sub {
        my $idx = 1;
        my $lastLength = length $array[0];
        my $lastElt = $array[0];
        my $listLength = scalar @array;
        while ($idx < $listLength) {
          my $tmpLength = length $array[$idx];
          if ($tmpLength > $lastLength) {
            $lastElt = $array[$idx];
            $lastLength = $tmpLength
          }
          $idx++
        }
        my $l = $lastElt;
    },
    TMPFOR => sub {
        my $len = length $array[0];
        my $longest = 0;
        for my $i (1 .. $#array) {
            my $i_len = length $array[$i];
            if($i_len > $len) {
                $longest = $i;
                $len = $i_len;
            }
        }
        my $l = $array[$longest];
    },
});
bvr
  • length won't take longer for longer strings, except utf8 strings, and even there I believe it gets cached the first time. Please always show your exact benchmark code when reporting benchmarks. – ysth Nov 15 '10 at 16:10
  • @ysth: just checked, you are right. Apparently `length` is constant time; it is probably stored somewhere in the SV. I supposed that C's `strlen` was used behind the scenes. – bvr Nov 15 '10 at 19:08
  • no, strlen isn't used, since perl strings can contain nul characters. perl just keeps track of how much of the string buffer is used (which is equal to the length except in the case of utf8) – ysth Nov 16 '10 at 01:29
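As an illustrative aside (not from the thread), Devel::Peek makes that bookkeeping visible: Dump prints the SV's CUR field, the tracked buffer length that length() reads:

use Devel::Peek;

my $str = "hello\0world";    # embedded NUL, so C's strlen() would stop early
Dump($str);                  # the CUR field shows 11, the tracked length
print length($str), "\n";    # 11, read straight from the SV, no scan needed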
3

My fastest is:

sub drewk {
    my $len = -1;
    my $longest;    # declare so the code runs under strict
    for (@_) {
        my $tmp=length($_);
        if ( $tmp > $len ) {
            $longest = $_;
            $len = $tmp;
        }
    }
    return $longest;
}

But benchmarking against:

sub strom {
    my $max = -1;
    my $max_i = 0;
    for (0 .. $#_) {              # for each index
        my $len = length $_[$_];  # only get length once per item
        if ($len > $max) {        # save index and update max if larger
            $max = $len;
            $max_i = $_;
        }
    }
    $_[$max_i]   # return the largest item
}

sub red {
    return reduce{ length($a) > length($b) ? $a : $b } @_;
}

Shows that reduce is fastest:

            Rate  strom  drewk reduce
strom  1323455/s     --   -38%   -45%
drewk  2144549/s    62%     --   -10%
reduce 2390707/s    81%    11%     --

The other benchmarked sub, `strom`, is from Eric Strom's answer above.

dawg
  • looks like the array indexing was slowing my first version down. Surprisingly, the creation of the temp length lexical is a bit of a bottleneck. Take a look at my latest edit for a version that just calls length twice. – Eric Strom Nov 15 '10 at 19:19
2

A little golfish:

my @unsorted = qw( one two three four five six seven eight nine ten eleven );
my $longest =  (
    map { $_->[0] } 
    sort { $b->[1] <=> $a->[1] } 
    map { [ $_, length $_ ] } @unsorted 
)[0];

say $longest;

EDIT: the map/sort/map is a Schwartzian transform for anyone unfamiliar with that technique and wondering.

David Precious
  • The OP's looking for the quickest solution, so anything with a sort in it (which is O(N log N)) should be slower than a simple linear search (O(N)), as in bvr's solution. – ishnid Nov 15 '10 at 15:34
  • A Schwartzian transform can only save time if you're sorting on a property that costs some time to compute. The length of a string is already known and costs nothing to compute. So using a Schwartzian transform in this case is futile. – gpvos Nov 16 '10 at 11:40
  • I'm pretty sure that, at the time I answered, the question didn't state whether we were talking quickest in terms of execution time or quickest/easiest to write. – David Precious Nov 16 '10 at 14:30
  • As for whether a Schwartzian transform could save time, yes, you're correct. I was trying to avoid the overhead of lots of calls to length(), but that is indeed cheap, and a quick benchmark shows a straight sort { length($a) <=> length($b) } to be significantly faster. Premature optimisation is indeed the root of all evil. – David Precious Nov 16 '10 at 14:36
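For reference, the straight sort mentioned in that last comment could look like this (a sketch, assuming the same @unsorted as above; sorting descending by length so the longest element lands first, though it is still O(N log N) versus the linear scans elsewhere on this page):

my ($longest) = sort { length($b) <=> length($a) } @unsorted;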
2

Assuming the goal is to just find the longest string, not its index:

my $longest = $array[0];
my $len = length $longest;
for my $str (@array) {
    if ( length($str) > $len ) {
        $longest = $str;
        $len = length($str);
    }
}
ysth
1

This appears to be significantly faster than the other solutions (it is based on fastest_Eric_Strom):

use warnings;
use 5.012;
use Benchmark qw(:all) ;
use List::Util qw(reduce);

my @array = map { ($_) x 50 } qw( one two three four five six seven eight nine ten eleven );

sub list_util_xs {
    my $l = reduce{ length($a) > length($b) ? $a : $b } @array;
    return $l;
}

sub fastest_Eric_Strom {
    my $max = -1; my $max_ref;
    for (@array) {
        if (length > $max) {
            $max = length;
            $max_ref = \$_;
        }
    }
    return $$max_ref;
}

sub ysth {
    my $longest = $array[0];
    my $len = length $longest;
    for my $str (@array) {
        if ( length($str) > $len ) {
            $longest = $str;
            $len = length($str);
        }
    }
    return $longest;
}

sub mpapec {
    my $max = -1;
    my $max_ref;
    length > $max and ($max, $max_ref) = (length, \$_) for @array;
    return $$max_ref;
}

cmpthese( -10, {
    'list_util_xs'  => sub{ list_util_xs() },
    'fastest_Eric_Strom' => sub{ fastest_Eric_Strom() },
    'ysth'      => sub{ ysth() },
    'mpapec'    => sub{ mpapec() },
});

output

                      Rate list_util_xs fastest_Eric_Strom       ysth     mpapec
list_util_xs       13479/s           --               -24%       -24%       -29%
fastest_Eric_Strom 17662/s          31%                 --        -0%        -6%
ysth               17680/s          31%                 0%         --        -6%
mpapec             18885/s          40%                 7%         7%         --
mpapec
1

If you really want to cut down on the number of computed lengths, then take a look at the Schwartzian transform and adapt it to your problem.

EDIT:

I see that no one posted the complete example I meant, so here it is (I haven't benchmarked it yet, as I'm not in front of my personal computer):

use List::Util qw(reduce);   # reduce isn't a builtin, so the import is needed

my @array = qw( one two three four five six seven eight nine ten eleven );
my $longest = (
                reduce { $a->[1] > $b->[1] ? $a : $b } 
                map { [ $_, length $_ ] }
                @array
              )[0];

say $longest;
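Note: as sid_com's benchmark answer below points out, this as written yields the [ string, length ] pair (an array reference) rather than the string itself. A sketch of the dereferencing fix:

my $longest = ( reduce { $a->[1] > $b->[1] ? $a : $b }
                map { [ $_, length $_ ] } @array )->[0];

say $longest;   # now prints the string, not ARRAY(0x...)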
MBO
  • I didn't know it had a name :) – xtofl Nov 15 '10 at 12:07
  • I already benchmarked it, and it is far slower than what his reduce was doing on small lists. – OMG_peanuts Nov 15 '10 at 12:39
  • A Schwartzian transform can only save time if you're sorting on a property that costs some time to compute. The length of a string is already known and costs nothing to compute. So using a Schwartzian transform in this case is futile. – gpvos Nov 16 '10 at 11:38
0

You could use a temp var to avoid calculating the length again and again:

my @unsorted = qw( one two three four five six seven eight nine ten eleven ); 
my $idx = 1;
my $lastLength = length $unsorted[0];
my $lastElt = $unsorted[0];
my $listLength = scalar @unsorted;
while ($idx < $listLength) {
  my $tmpLength = length $unsorted[$idx];
  if ($tmpLength > $lastLength) {
    $lastElt = $unsorted[$idx];
    $lastLength = $tmpLength
  }
  $idx++
}
print "Longest element:$lastElt";

Benchmark results:

           Rate REDUCE TMPVAR
REDUCE 169297/s     --   -29%
TMPVAR 237926/s    41%     --
OMG_peanuts
0

First I tested whether all routines give the right result. MBO's routine didn't pass the first test (it returns an array reference); to give him a second chance, I modified the routine to get the right result.

I ran the benchmark several times and didn't always get the same order. So I would say (as already said here) that ysth's and fastest_Eric_Strom are the fastest, but List::Util's reduce is almost as fast as they are. What's easy to read from the results is that David Precious's sort version is the slowest and MBO's modified reduce version is the second slowest.

My conclusion: List::Util's reduce is the winner on price-performance ratio.
Edit: I was too fast with the prize-giving ceremony: List::Util - reduce - length - encoding - question

                       Rate David_Precious  MBO OMG_peanuts longest_Eric_Strom drewk  bvr list_util_xs fastest_Eric_Strom ysth
David_Precious      64147/s             -- -36%        -73%               -79%  -80% -81%         -85%               -86% -87%
MBO                100195/s            56%   --        -58%               -67%  -69% -70%         -77%               -79% -80%
OMG_peanuts        237772/s           271% 137%          --               -21%  -27% -30%         -45%               -50% -52%
longest_Eric_Strom 300466/s           368% 200%         26%                 --   -8% -11%         -31%               -36% -40%
drewk              325883/s           408% 225%         37%                 8%    --  -4%         -25%               -31% -34%
bvr                338156/s           427% 237%         42%                13%    4%   --         -22%               -28% -32%
list_util_xs       434114/s           577% 333%         83%                44%   33%  28%           --                -8% -13%
fastest_Eric_Strom 471812/s           636% 371%         98%                57%   45%  40%           9%                 --  -5%
ysth               497198/s           675% 396%        109%                65%   53%  47%          15%                 5%   --

Full benchmark code:

#!/usr/bin/env perl
use warnings;
use 5.012;
use Benchmark qw(:all) ;
use List::Util qw(reduce);

my @array = qw( one two three four five six seven eight nine very_long_long ten eleven );

sub list_util_xs {
    my $l = reduce{ length($a) > length($b) ? $a : $b } @array;
    return $l;
}

sub longest_Eric_Strom {
    my $max = -1; my $max_i = 0;
    for (0 .. $#array) {
        my $len = length $array[$_]; 
        if ($len > $max) { 
            $max = $len;
            $max_i = $_;
        }
    }
    return $array[$max_i];
}

sub fastest_Eric_Strom {
    my $max = -1; my $max_ref;
    for (@array) {
        if (length > $max) {
            $max = length;
            $max_ref = \$_;
        }
    }
    return $$max_ref;
}

sub David_Precious {
    my $longest =  ( map { $_->[0] } sort { $b->[1] <=> $a->[1] } map { [ $_, length $_ ] } @array )[0];
    return $longest;
}

sub MBO {
    my $longest = ( reduce { $a->[1] > $b->[1] ? $a : $b } map { [ $_, length $_ ] } @array )[0];
    return $longest->[0];
}

sub drewk {
    my $len = -1; my $longest;
    for (@array) {
        my $tmp=length($_);
        if ( $tmp > $len ) {
            $longest = $_;
            $len = $tmp;
        }
    }
    return $longest;
}

sub ysth {
    my $longest = $array[0];
    my $len = length $longest;
    for my $str (@array) {
        if ( length($str) > $len ) {
            $longest = $str;
            $len = length($str);
        }
    }
    return $longest;
}

sub bvr {
    my $len = length $array[0];
    my $longest = 0;
    for my $i (1 .. $#array) {
        my $i_len = length $array[$i];
        if ($i_len > $len) {
            $longest = $i;
            $len = $i_len;
        }
    }
    return $array[$longest];
}

sub OMG_peanuts {
    my $idx = 1;
    my $lastLength = length $array[0];
    my $lastElt = $array[0];
    my $listLength = scalar @array;
    while ($idx < $listLength) {
        my $tmpLength = length $array[$idx];
        if ($tmpLength > $lastLength) {
            $lastElt = $array[$idx];
            $lastLength = $tmpLength;
        }
        $idx++;
    }
    return $lastElt;
}

cmpthese( -10, {
    'list_util_xs'  => sub{ list_util_xs() },
    'longest_Eric_Strom' => sub{ longest_Eric_Strom() },
    'fastest_Eric_Strom' => sub{ fastest_Eric_Strom() },
    'David_Precious'    => sub{ David_Precious() },
    'MBO'       => sub{ MBO() },
    'drewk'         => sub{ drewk() },
    'ysth'      => sub{ ysth() },
    'OMG_peanuts'   => sub{ OMG_peanuts() },
    'bvr'       => sub{ bvr() },
});
sid_com
-1

You could reduce the number of times you have to calculate the string's length by reducing over a struct or array that stores the length alongside the string itself.

Further, the iteration is already optimized by the reduce algorithm; the length call itself is hardly optimizable.
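A minimal sketch of that idea (illustrative, not from the answer; it is essentially what MBO's answer above does):

use List::Util qw(reduce);

my @array = qw( one two three four five six seven eight nine ten eleven );

# pair each string with its length once, reduce on the cached length,
# then pull the string back out of the winning pair
my $longest = ( reduce { $a->[1] > $b->[1] ? $a : $b }
                map { [ $_, length $_ ] } @array )->[0];

print "$longest\n";   # eleven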

xtofl