This code is more or less how a hash works. It should explain well enough why you would want to use an array instead of a hash.
package DIYHash;
use strict;
use warnings;
use Digest::MD5 qw(md5);

sub new {
    my ($class, $buckets) = @_;
    my $self = bless [], $class;
    # Pre-size the array to the requested number of buckets (default 32).
    $#$self = ($buckets || 32) - 1;
    return $self;
}
sub fetch {
    my ($self, $key) = @_;
    my $i  = $self->_get_bucket_index($key);
    my $bo = $self->_find_key_in_bucket($key, $i);
    return $self->[$i][$bo][1];
}
sub store {
    my ($self, $key, $value) = @_;
    my $i  = $self->_get_bucket_index($key);
    my $bo = $self->_find_key_in_bucket($key, $i);
    $self->[$i][$bo] = [$key, $value];
    return $value;
}
sub _find_key_in_bucket {
    my ($self, $key, $index) = @_;
    my $bucket = $self->[$index] || [];
    my $i;
    for (0 .. $#$bucket) {
        next unless $bucket->[$_][0] eq $key;
        $i = $_;
        last;
    }
    # Key not present: use the next free slot in this bucket.
    $i = @$bucket unless defined $i;
    return $i;
}
# This function needs to always return the same index for a given key.
# It can do anything as long as it always does that.
# I use the MD5 hashing algorithm here.
sub _get_bucket_index {
    my ($self, $key) = @_;
    # Get a number from 0 to (bucket count - 1).
    my $index = unpack("I", md5($key)) % @$self;
    return $index;
}
1;
To use this amazing cluster of code:
use feature 'say';

my $hash = DIYHash->new(4);    # This hash has 4 buckets.
$hash->store(mouse => "I like cheese");
$hash->store(cat   => "I like mouse");
say $hash->fetch('mouse');     # I like cheese
Hashes look like they are constant time, rather than O(N), because for a given data set you select a number of buckets that keeps the number of items in any one bucket very small.
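Here's a rough demonstration of that claim (a standalone sketch, not part of DIYHash; the key names and bucket count are made up). It spreads 1000 keys over 64 buckets using the same MD5 trick and reports the fullest bucket, which stays tiny compared to 1000 — that's the per-lookup cost.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my $buckets = 64;
my @count   = (0) x $buckets;

# Hash 1000 distinct keys into 64 buckets, counting occupancy.
for my $n (1 .. 1000) {
    my $index = unpack("I", md5("key$n")) % $buckets;
    $count[$index]++;
}

# The worst-case lookup scans only the largest bucket, not all 1000 items.
my ($max) = sort { $b <=> $a } @count;
print "largest bucket holds $max of 1000 items\n";
```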
A proper hashing system would resize itself as appropriate when the number of collisions gets too high. You don't want to do this often, because it is an O(N) operation.
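A resize could look something like the sketch below (hypothetical — DIYHash above doesn't have this). Every entry has to be rehashed, because its bucket index depends on the bucket count; visiting all N entries is exactly what makes resizing O(N).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Rebuild an array-of-buckets table with a new bucket count.
# Each [key, value] pair is rehashed into its new bucket.
sub rehash {
    my ($old, $new_count) = @_;
    my @new;
    $#new = $new_count - 1;    # pre-size, as in new()
    for my $bucket (@$old) {
        next unless $bucket;
        for my $pair (@$bucket) {
            my $i = unpack("I", md5($pair->[0])) % $new_count;
            push @{ $new[$i] }, $pair;
        }
    }
    return \@new;
}
```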