This code is more or less how a hash works. It should explain well enough why you would want to use an array instead of a hash.
package DIYHash;
use strict;
use warnings;
use Digest::MD5 qw(md5);

sub new {
    my ($class, $buckets) = @_;
    my $self = bless [], $class;
    # Pre-size the array to the requested number of buckets (default 32).
    $#$self = ($buckets || 32) - 1;
    return $self;
}
sub fetch {
    my ($self, $key) = @_;
    my $i  = $self->_get_bucket_index($key);
    my $bo = $self->_find_key_in_bucket($key, $i);
    return $self->[$i][$bo][1];
}
sub store {
    my ($self, $key, $value) = @_;
    my $i  = $self->_get_bucket_index($key);
    my $bo = $self->_find_key_in_bucket($key, $i);
    $self->[$i][$bo] = [$key, $value];
    return $value;
}
sub _find_key_in_bucket {
    my ($self, $key, $index) = @_;
    my $bucket = $self->[$index] || [];
    my $i;
    for (0 .. $#$bucket) {
        next unless $bucket->[$_][0] eq $key;
        $i = $_;
        last;
    }
    # Key not present: use the next free slot in this bucket.
    $i = @$bucket unless defined $i;
    return $i;
}
# This function needs to always return the same index for a given key.
# It can do anything as long as it always does that.
# I use the MD5 hashing algorithm here.
sub _get_bucket_index {
    my ($self, $key) = @_;
    # Get a number from 0 to (bucket count - 1).
    my $index = unpack("I", md5($key)) % @$self;
    return $index;
}
1;
To use this amazing cluster of code:
use feature 'say';

my $hash = DIYHash->new(4);    # This hash has 4 buckets.
$hash->store(mouse => "I like cheese");
$hash->store(cat   => "I like mouse");
say $hash->fetch('mouse');     # I like cheese
Hashes look like they are constant time, rather than O(N), because for a given data set you select a number of buckets that keeps the number of items in any one bucket very small.
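Here's a rough demonstration of that claim (a standalone sketch, not part of DIYHash; the key names and bucket count are made up). It spreads 1000 keys over 64 buckets using the same MD5 trick and reports the fullest bucket, which stays tiny compared to 1000 — that's the per-lookup cost.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my $buckets = 64;
my @count   = (0) x $buckets;

# Hash 1000 distinct keys into 64 buckets, counting occupancy.
for my $n (1 .. 1000) {
    my $index = unpack("I", md5("key$n")) % $buckets;
    $count[$index]++;
}

# The worst-case lookup scans only the largest bucket, not all 1000 items.
my ($max) = sort { $b <=> $a } @count;
print "largest bucket holds $max of 1000 items\n";
```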
A proper hashing system would resize itself as appropriate when the number of collisions gets too high. You don't want to do this often, because it is an O(N) operation.
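A resize could look something like the sketch below (hypothetical — DIYHash above doesn't have this). Every entry has to be rehashed, because its bucket index depends on the bucket count; visiting all N entries is exactly what makes resizing O(N).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Rebuild an array-of-buckets table with a new bucket count.
# Each [key, value] pair is rehashed into its new bucket.
sub rehash {
    my ($old, $new_count) = @_;
    my @new;
    $#new = $new_count - 1;    # pre-size, as in new()
    for my $bucket (@$old) {
        next unless $bucket;
        for my $pair (@$bucket) {
            my $i = unpack("I", md5($pair->[0])) % $new_count;
            push @{ $new[$i] }, $pair;
        }
    }
    return \@new;
}
```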