13

Is this the fastest (execution time) way to find the longest element in a list?

#!/usr/bin/env perl
use warnings;
use 5.012;
use List::Util qw(reduce);
use List::Util::XS;

my @array = qw( one two three four five six seven eight nine ten eleven );

my $l = reduce{ length($a) > length($b) ? $a : $b } @array;

say $l;
sid_com
  • It's unnecessary to `use List::Util::XS` unless your program has to die unless the XS version is installed. `use List::Util` automatically loads the XS version if available. – cjm Nov 15 '10 at 07:17
  • By "fastest", are you referring to execution time or coding time? – outis Nov 15 '10 at 07:47
  • Are you trying to optimise your application? Did a profiler actually tell you this one line in your application was taking up a significant amount of the processing time? – rafl Nov 15 '10 at 08:39
  • @cjm: Added `use List::Util::XS` to forestall the objection that the answer depends on whether List::Util::XS is available or not. – sid_com Nov 15 '10 at 09:19
  • @rafl: I didn't use a profiler. But it does no harm to know which way is the fastest. – sid_com Nov 15 '10 at 09:22
  • At least you should be using `>` instead of `gt`, or you'll get incorrect results. – gpvos Nov 15 '10 at 09:40
  • Sounds like premature optimization to me. How big is your list? What’s the real application like? – tchrist Nov 15 '10 at 10:02
  • If you keep the array sorted it's very fast ;) – Øyvind Skaar Nov 15 '10 at 11:43
  • It's for a terminal user interface. I am trying to adapt Term::Clui to my preferences. List length: let's say less than 10_000. My cpan_module.pl shows me 5_001 entries when I search for Perl (I know it doesn't make much sense to search for Perl), but it uses Term::ReadLine and Term::Pager, which I think is good for very long lists. For long lists I prefer Term::UI, so let's not say less than 10_000 but rather less than 150. – sid_com Nov 15 '10 at 15:11
  • Now, after a time of reflection, I feel a little guilty about micro-optimization. But I thought: it's a user interface and it doesn't cost money. I didn't think there would be a noticeable slowdown, but I thought: this time a little here, next time a little there, and another time maybe a new loop around it, and so on. – sid_com Nov 15 '10 at 15:15
  • I tried to benchmark four different routines, but the benchmark never finished. I use the Benchmark module, and as the count argument I used 0. Could someone tell me how long such a benchmark normally runs? – sid_com Nov 15 '10 at 15:16
  • The answer is eleven; you don't need a computer to work this out. – Tom Gullen Nov 15 '10 at 16:52

10 Answers

18

When only trying to find one element of a list, there is no need to construct an N-sized data structure, as many answers here have done. The fastest O(N) way is to walk the array, keeping track of the largest element seen so far. That way you have O(N) accesses of the list and O(1) memory usage.

sub longest {
    my $max = -1;
    my $max_i = 0;
    for (0 .. $#_) {              # for each index
        my $len = length $_[$_];  # only get length once per item
        if ($len > $max) {        # save index and update max if larger
            $max = $len;
            $max_i = $_;
        }
    }
    $_[$max_i]   # return the largest item
}

If you are going to be running the above code many times, I would suggest inlining the body of the subroutine.
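A minimal sketch of what that inlining might look like (illustrative, not part of the original answer; it assumes the strings are in @array):

# same high-water-mark scan, without the subroutine call overhead
my $max   = -1;
my $max_i = 0;
for my $i (0 .. $#array) {
    my $len = length $array[$i];   # get the length once per item
    if ($len > $max) {             # remember the index of the longest so far
        $max   = $len;
        $max_i = $i;
    }
}
my $longest = $array[$max_i];      # the longest element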

EDIT:

drewk's benchmark revealed that the array indexing in the above code is a bit of a bottleneck. Experimenting a little more, I have finally found a method that is faster than the reduce solution:

sub fastest {
    my $max = -1;
    my $max_ref;
    for (@_) {
        if (length > $max) {  # no temp variable, length() twice is faster
            $max = length;
            $max_ref = \$_;   # avoid any copying
        }
    }
    $$max_ref
}

which results in the following benchmark:

           Rate longest   drewk  reduce fastest
longest 44245/s      --    -21%    -30%    -47%
drewk   55854/s     26%      --    -11%    -33%
reduce  63014/s     42%     13%      --    -25%
fastest 83638/s     89%     50%     33%      --
Eric Strom
  • I think that the second call to `length` with the same value of the implicit `$_` is probably cached from the first call in your `fastest`, no? – dawg Nov 15 '10 at 20:00
  • @drewk => I am not exactly sure how the optimization is playing out. From ysth's comments, it seems like Perl maintains a field in the SV that stores the length. So my guess is that even out-of-order lookups will all be constant time once the value has been calculated once. In general, it's always good to benchmark to see if repeated calls to a builtin are faster than making a lexical copy. I have tested `caller`, `wantarray`, `ref`, and others. Most of the time it is much faster to just call them repeatedly. You need 30+ calls before the lexical will give you any advantage. – Eric Strom Nov 15 '10 at 20:49
  • Sometimes people call this the "high water mark" approach. It's what we do in _Learning Perl_. – brian d foy Jun 13 '11 at 16:24
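To check the claim in Eric Strom's comment above (repeated builtin calls versus a lexical copy), a minimal benchmark sketch along these lines could be used; it is illustrative and not from the original thread:

use Benchmark qw(cmpthese);

my $str = 'just a sample string';

cmpthese(-2, {
    # call length() twice each time
    twice   => sub { my $n = length($str) + length($str) },
    # copy length() into a lexical first
    lexical => sub { my $l = length $str; my $n = $l + $l },
});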
6

Here is a slightly modified version of OMG_peanuts' solution, using a for loop and fewer variables:

my $len = length $array[0];
my $longest = 0;
for my $i (1 .. $#array) {
    my $i_len = length $array[$i];
    if($i_len > $len) {
        $longest = $i;
        $len = $i_len;
    }
}
my $l = $array[$longest];

I was playing a bit with benchmarks, getting this for a small number of items (the original array):

           Rate REDUCE TMPVAR TMPFOR
REDUCE 234862/s     --    -0%    -7%
TMPVAR 235643/s     0%     --    -6%
TMPFOR 251326/s     7%     7%     --

and this for a larger number of items (the original array x 100):

         Rate TMPVAR TMPFOR REDUCE
TMPVAR 3242/s     --   -28%   -32%
TMPFOR 4503/s    39%     --    -5%
REDUCE 4750/s    47%     5%     --

Note that the suitability of an algorithm varies heavily with the specifics of the data (I would guess that longer strings increase the weight of the length call in the algorithm).

EDIT: Here is the full code for the benchmark (long-array version; the short one omits the `x 100` in the array definition):

use Benchmark  qw(:all);
use List::Util qw(reduce);

my @array = qw( one two three four five six seven eight nine ten eleven ) x 100;

cmpthese(-2, {
    REDUCE => sub {
        my $l = reduce{ length($a) > length($b) ? $a : $b } @array;
    },
    TMPVAR => sub {
        my $idx = 1;
        my $lastLength = length $array[0];
        my $lastElt = $array[0];
        my $listLength = scalar @array;
        while ($idx < $listLength) {
          my $tmpLength = length $array[$idx];
          if ($tmpLength > $lastLength) {
            $lastElt = $array[$idx];
            $lastLength = $tmpLength
          }
          $idx++
        }
        my $l = $lastElt;
    },
    TMPFOR => sub {
        my $len = length $array[0];
        my $longest = 0;
        for my $i (1 .. $#array) {
            my $i_len = length $array[$i];
            if($i_len > $len) {
                $longest = $i;
                $len = $i_len;
            }
        }
        my $l = $array[$longest];
    },
});
bvr
  • length won't take longer for longer strings, except utf8 strings, and even there I believe it gets cached the first time. Please always show your exact benchmark code when reporting benchmarks. – ysth Nov 15 '10 at 16:10
  • @ysth: just checked, you are right. Apparently `length` is constant time; it is probably stored somewhere in the SV. I supposed that C's `strlen` was used behind the scenes. – bvr Nov 15 '10 at 19:08
  • no, strlen isn't used, since perl strings can contain nul characters. perl just keeps track of how much of the string buffer is used (which is equal to the length except in the case of utf8) – ysth Nov 16 '10 at 01:29
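As an illustrative aside (not from the thread), Devel::Peek makes that bookkeeping visible: Dump prints the SV's CUR field, the tracked buffer length that length() reads:

use Devel::Peek;

my $str = "hello\0world";    # embedded NUL, so C's strlen() would stop early
Dump($str);                  # the CUR field shows 11, the tracked length
print length($str), "\n";    # 11, read straight from the SV, no scan needed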
3

My fastest is:

sub drewk {
    my $len = -1;
    my $longest;    # declare so the code runs under strict
    for (@_) {
        my $tmp=length($_);
        if ( $tmp > $len ) {
            $longest = $_;
            $len = $tmp;
        }
    }
    return $longest;
}

But benchmarking against:

sub strom {
    my $max = -1;
    my $max_i = 0;
    for (0 .. $#_) {              # for each index
        my $len = length $_[$_];  # only get length once per item
        if ($len > $max) {        # save index and update max if larger
            $max = $len;
            $max_i = $_;
        }
    }
    $_[$max_i]   # return the largest item
}

sub red {
    return reduce{ length($a) > length($b) ? $a : $b } @_;
}

Shows that reduce is fastest:

            Rate  strom  drewk reduce
strom  1323455/s     --   -38%   -45%
drewk  2144549/s    62%     --   -10%
reduce 2390707/s    81%    11%     --

The other benchmarked sub, `strom`, is from Eric Strom's answer above.

dawg
  • looks like the array indexing was slowing my first version down. Surprisingly, the creation of the temp length lexical is a bit of a bottleneck. Take a look at my latest edit for a version that just calls length twice. – Eric Strom Nov 15 '10 at 19:19
2

A little golfish:

my @unsorted = qw( one two three four five six seven eight nine ten eleven );
my $longest =  (
    map { $_->[0] } 
    sort { $b->[1] <=> $a->[1] } 
    map { [ $_, length $_ ] } @unsorted 
)[0];

say $longest;

EDIT: the map/sort/map is a Schwartzian transform for anyone unfamiliar with that technique and wondering.

David Precious
  • The OP's looking for the quickest solution, so anything with a sort in it (which is O(N log N)) should be slower than a simple linear search (O(N)), as in bvr's solution. – ishnid Nov 15 '10 at 15:34
  • A Schwartzian transform can only save time if you're sorting on a property that costs some time to compute. The length of a string is already known and costs nothing to compute. So using a Schwartzian transform in this case is futile. – gpvos Nov 16 '10 at 11:40
  • I'm pretty sure that, at the time I answered, the question didn't state whether we were talking quickest in terms of execution time or quickest/easiest to write. – David Precious Nov 16 '10 at 14:30
  • As for whether a Schwartzian transform could save time, yes, you're correct. I was trying to avoid the overhead of lots of calls to length(), but that is indeed cheap, and a quick benchmark shows a straight sort { length($a) <=> length($b) } to be significantly faster. Premature optimisation is indeed the root of all evil. – David Precious Nov 16 '10 at 14:36
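For reference, the straight sort mentioned in that last comment could look like this (a sketch, assuming the same @unsorted as above; sorting descending by length so the longest element lands first, though it is still O(N log N) versus the linear scans elsewhere on this page):

my ($longest) = sort { length($b) <=> length($a) } @unsorted;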
2

Assuming the goal is to just find the longest string, not its index:

my $longest = $array[0];
my $len = length $longest;
for my $str (@array) {
    if ( length($str) > $len ) {
        $longest = $str;
        $len = length($str);
    }
}
ysth
1

This appears to be significantly faster than the other solutions (it is based on fastest_Eric_Strom):

use warnings;
use 5.012;
use Benchmark qw(:all) ;
use List::Util qw(reduce);

my @array = map { ($_) x 50 } qw( one two three four five six seven eight nine ten eleven );

sub list_util_xs {
    my $l = reduce{ length($a) > length($b) ? $a : $b } @array;
    return $l;
}

sub fastest_Eric_Strom {
    my $max = -1; my $max_ref;
    for (@array) {
        if (length > $max) {
            $max = length;
            $max_ref = \$_;
        }
    }
    return $$max_ref;
}

sub ysth {
    my $longest = $array[0];
    my $len = length $longest;
    for my $str (@array) {
        if ( length($str) > $len ) {
            $longest = $str;
            $len = length($str);
        }
    }
    return $longest;
}

sub mpapec {
    my $max = -1;
    my $max_ref;
    length > $max and ($max, $max_ref) = (length, \$_) for @array;
    return $$max_ref;
}

cmpthese( -10, {
    'list_util_xs'  => sub{ list_util_xs() },
    'fastest_Eric_Strom' => sub{ fastest_Eric_Strom() },
    'ysth'      => sub{ ysth() },
    'mpapec'    => sub{ mpapec() },
});

output

                      Rate list_util_xs fastest_Eric_Strom       ysth     mpapec
list_util_xs       13479/s           --               -24%       -24%       -29%
fastest_Eric_Strom 17662/s          31%                 --        -0%        -6%
ysth               17680/s          31%                 0%         --        -6%
mpapec             18885/s          40%                 7%         7%         --
mpapec
1

If you really want to cut down on the number of computed lengths, then take a look at the Schwartzian transform and adapt it to your problem.

EDIT:

I see that no one posted the complete example I meant, so here it is (I haven't benchmarked it yet, as I'm not in front of my personal computer):

use List::Util qw(reduce);   # reduce isn't a builtin, so the import is needed

my @array = qw( one two three four five six seven eight nine ten eleven );
my $longest = (
                reduce { $a->[1] > $b->[1] ? $a : $b } 
                map { [ $_, length $_ ] }
                @array
              )[0];

say $longest;
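Note: as sid_com's benchmark answer below points out, this as written yields the [ string, length ] pair (an array reference) rather than the string itself. A sketch of the dereferencing fix:

my $longest = ( reduce { $a->[1] > $b->[1] ? $a : $b }
                map { [ $_, length $_ ] } @array )->[0];

say $longest;   # now prints the string, not ARRAY(0x...)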
MBO
  • I didn't know it had a name :) – xtofl Nov 15 '10 at 12:07
  • I already benchmarked it, and it is far slower than what his reduce was doing on small lists. – OMG_peanuts Nov 15 '10 at 12:39
  • A Schwartzian transform can only save time if you're sorting on a property that costs some time to compute. The length of a string is already known and costs nothing to compute. So using a Schwartzian transform in this case is futile. – gpvos Nov 16 '10 at 11:38
0

You could use a temp var to avoid calculating the length again and again:

my @unsorted = qw( one two three four five six seven eight nine ten eleven ); 
my $idx = 1;
my $lastLength = length $unsorted[0];
my $lastElt = $unsorted[0];
my $listLength = scalar @unsorted;
while ($idx < $listLength) {
  my $tmpLength = length $unsorted[$idx];
  if ($tmpLength > $lastLength) {
    $lastElt = $unsorted[$idx];
    $lastLength = $tmpLength
  }
  $idx++
}
print "Longest element:$lastElt";

Benchmark results:

           Rate REDUCE TMPVAR
REDUCE 169297/s     --   -29%
TMPVAR 237926/s    41%     --
OMG_peanuts
0

First I tested whether all routines give the right result. MBO's routine didn't pass the first test (it returns an array reference); to give him a second chance, I modified the routine to get the right result.

I ran the benchmark several times and didn't always get the same order. So I would say (as already said here) that ysth's and fastest_Eric_Strom are the fastest, but List::Util's reduce is almost as fast as they are. What's easy to read from the results is that David Precious's sort version is the slowest and MBO's modified reduce version is the second slowest.

My conclusion: List::Util's reduce is the winner on price-performance ratio.
Edit: I was too fast with the prize-giving ceremony: List::Util - reduce - length - encoding - question

                       Rate David_Precious  MBO OMG_peanuts longest_Eric_Strom drewk  bvr list_util_xs fastest_Eric_Strom ysth
David_Precious      64147/s             -- -36%        -73%               -79%  -80% -81%         -85%               -86% -87%
MBO                100195/s            56%   --        -58%               -67%  -69% -70%         -77%               -79% -80%
OMG_peanuts        237772/s           271% 137%          --               -21%  -27% -30%         -45%               -50% -52%
longest_Eric_Strom 300466/s           368% 200%         26%                 --   -8% -11%         -31%               -36% -40%
drewk              325883/s           408% 225%         37%                 8%    --  -4%         -25%               -31% -34%
bvr                338156/s           427% 237%         42%                13%    4%   --         -22%               -28% -32%
list_util_xs       434114/s           577% 333%         83%                44%   33%  28%           --                -8% -13%
fastest_Eric_Strom 471812/s           636% 371%         98%                57%   45%  40%           9%                 --  -5%
ysth               497198/s           675% 396%        109%                65%   53%  47%          15%                 5%   --

Full benchmark code:

#!/usr/bin/env perl
use warnings;
use 5.012;
use Benchmark qw(:all) ;
use List::Util qw(reduce);

my @array = qw( one two three four five six seven eight nine very_long_long ten eleven );

sub list_util_xs {
    my $l = reduce{ length($a) > length($b) ? $a : $b } @array;
    return $l;
}

sub longest_Eric_Strom {
    my $max = -1; my $max_i = 0;
    for (0 .. $#array) {
        my $len = length $array[$_]; 
        if ($len > $max) { 
            $max = $len;
            $max_i = $_;
        }
    }
    return $array[$max_i];
}

sub fastest_Eric_Strom {
    my $max = -1; my $max_ref;
    for (@array) {
        if (length > $max) {
            $max = length;
            $max_ref = \$_;
        }
    }
    return $$max_ref;
}

sub David_Precious {
    my $longest =  ( map { $_->[0] } sort { $b->[1] <=> $a->[1] } map { [ $_, length $_ ] } @array )[0];
    return $longest;
}

sub MBO {
    my $longest = ( reduce { $a->[1] > $b->[1] ? $a : $b } map { [ $_, length $_ ] } @array )[0];
    return $longest->[0];
}

sub drewk {
    my $len = -1; my $longest;
    for (@array) {
        my $tmp=length($_);
        if ( $tmp > $len ) {
            $longest = $_;
            $len = $tmp;
        }
    }
    return $longest;
}

sub ysth {
    my $longest = $array[0];
    my $len = length $longest;
    for my $str (@array) {
        if ( length($str) > $len ) {
            $longest = $str;
            $len = length($str);
        }
    }
    return $longest;
}

sub bvr {
    my $len = length $array[0];
    my $longest = 0;
    for my $i (1 .. $#array) {
        my $i_len = length $array[$i];
        if ($i_len > $len) {
            $longest = $i;
            $len = $i_len;
        }
    }
    return $array[$longest];
}

sub OMG_peanuts {
    my $idx = 1;
    my $lastLength = length $array[0];
    my $lastElt = $array[0];
    my $listLength = scalar @array;
    while ($idx < $listLength) {
        my $tmpLength = length $array[$idx];
        if ($tmpLength > $lastLength) {
            $lastElt = $array[$idx];
            $lastLength = $tmpLength;
        }
        $idx++;
    }
    return $lastElt;
}

cmpthese( -10, {
    'list_util_xs'  => sub{ list_util_xs() },
    'longest_Eric_Strom' => sub{ longest_Eric_Strom() },
    'fastest_Eric_Strom' => sub{ fastest_Eric_Strom() },
    'David_Precious'    => sub{ David_Precious() },
    'MBO'       => sub{ MBO() },
    'drewk'         => sub{ drewk() },
    'ysth'      => sub{ ysth() },
    'OMG_peanuts'   => sub{ OMG_peanuts() },
    'bvr'       => sub{ bvr() },
});
sid_com
-1

You could reduce the number of times you have to calculate the string's length by reducing over a struct or array that stores the length alongside the string itself.

Further, the iteration is already optimized by the reduce algorithm; the length call itself is hardly optimizable.
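A minimal sketch of that idea (illustrative, not from the answer; it is essentially what MBO's answer above does):

use List::Util qw(reduce);

my @array = qw( one two three four five six seven eight nine ten eleven );

# pair each string with its length once, reduce on the cached length,
# then pull the string back out of the winning pair
my $longest = ( reduce { $a->[1] > $b->[1] ? $a : $b }
                map { [ $_, length $_ ] } @array )->[0];

print "$longest\n";   # eleven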

xtofl