11

Say you have an array @a = qw/ a b c d/;

and a hash %a = ('a' => 1, 'b' => 1, 'c' => 1, 'd' => 1);

Is there any situation where creating the array version is better than creating the hash (other than when you have to iterate over all the values as in something like

for (@a){
    ....

in which case you would have to use keys %a if you went with the hash)? Because testing whether a specific value is in a hash is always more efficient than doing so in an array, correct?

Tyler
  • 2,579
  • 2
  • 22
  • 32
  • 4
    The hash keys have no predictive ordering, so it's quite different from an array. – leonbloy Aug 14 '14 at 17:09
  • 4
    It would be a shame if this question was closed. I think it's an excellent question with answers that would help many people. – Borodin Aug 14 '14 at 17:32
  • possible duplicate of [Why ever use an array instead of a hash?](http://stackoverflow.com/questions/24558131/why-ever-use-an-array-instead-of-a-hash) – ThisSuitIsBlackNot Aug 14 '14 at 17:37
  • 1
    @ThisSuitIsBlackNot, It's not a duplicate of the linked question. The linked question asks whether `$x{$i}` is truly faster than `$x[$i]`. (It's not.) This has nothing to do with Tyler's question. – ikegami Aug 14 '14 at 18:51
  • Another reason why you should use hash in certain cases https://stackoverflow.com/questions/54833426/fastest-way-to-check-duplicate-columns-in-a-line-in-perl/54833522#54833426 – Sam B Feb 22 '19 at 22:34

4 Answers4

10
    • Arrays are indexed by numbers.
    • Hashes are keyed by strings.
    • All indexes up to the highest index exist in an array.
    • Hashes are sparsely indexed. (e.g. "a" and "c" can exist without "b".)

There are many emergent properties. Primarily,

    • Arrays can be used to store ordered lists.
    • It would be ugly an inefficient to use hashes that way.
    • It's not possible to delete an element from an array unless it's the highest indexed element.
    • You can delete from an ordered list implemented using an array, though it is inefficient to remove elements other than the first or last.
    • It's possible to delete an element from a hash, and it's efficient.
ikegami
  • 367,544
  • 15
  • 269
  • 518
3

Arrays are ordered lists of values. They can contain duplicate values.

@array = qw(a b c a);

Hashes are a mapping between a key (which must be unique) and a value (which can be duplicated). Hashes are (effectively) unordered, which means that keys come out in apparently random order rather than the order in which they are entered.

%hash = (a => 1, b => 2, c => 3);

Hashes can also be used as sets when only the key matters. Sets are unordered and contain only unique "values" (the hash's keys).

%set = (a => undef, b => undef, c => undef);

Which one to use depends on your data and algorithm. Use an array when order matters (particularly if you can't sort to derive the order) or if duplicate values are possible. Use a set (i.e. use a hash as a set) when values must be unique and don't care about order. Use a hash when uniqueness matters, order doesn't (or is easily sortable), and look-ups are based on arbitrary values rather than integers.

You can combine arrays and hashes (via references) to create arbitrarily complex data structures.

@aoa = ([1, 2, 3], [4, 5, 6]);               # array of arrays ("2D" array)
%hoh = (a => { x => 1 }, b => { x => 2 });   # hash of hashes
@aoh = ({a => 1, b => 2}, {a => 3, b => 4}); # array of hashes
%hoa = (a => [1, 2], b => [3, 4]);           # hash of arrays
...etc.
Michael Carman
  • 30,628
  • 10
  • 74
  • 122
  • 1
    Re "Arrays are ordered lists of values." No more so than hashes. e.g. `$a[2] = "a"; $a[0] = "b"; $a[1] = "c"; print values(@a);` won't give `abc` any more than `$h{2} = "a"; $h{0} = "b"; $h{1} = "c"; print values(%h);` will. The values are merely keyed, and you can sort the keys for both arrays and hashes. The actually difference is that *it's more efficient to sort an array's keys*. (Before you mention `push` and `pop`, keep in mind that you could as easily make `push` and `pop` that works for hashes.) – ikegami Aug 14 '14 at 18:54
  • 1
    @ikegami: Of course arrays are ordered. Your `@a` will always print as `bca`. They're ordered by index. I never said that their ordering is based on *insertion*. – Michael Carman Aug 14 '14 at 19:05
  • Either that repeats what I already said (you can sort the indexes), or you're saying that hashes are ordered too (by bucket index). (Don't forget that that at the heart of a hash is an array. The word "hash" comes from the fact that the key is hashed into a number that's used as an index into that array.) – ikegami Aug 14 '14 at 19:10
  • 1
    Yes, hashes are ordered, but that order is hidden from the programmer and subject to change without notice. As of Perl 5.18 the hash order is randomized and won't be the same between two executions of the same program. That's why I said that hashes are *effectively* unordered. – Michael Carman Aug 14 '14 at 19:18
  • 1
    Arrays, on the other hand, are defined to be ordered in perldata: "...arrays are ordered lists of scalars...". That order is guaranteed (without sorting). I'm a bit baffled at why you seem to be arguing otherwise. – Michael Carman Aug 14 '14 at 19:26
  • All I'm saying is that you can get the elements in key order from both hashes and arrays. – ikegami Aug 15 '14 at 02:58
2

This about using numbers as hash keys. It doesn't answer the question directly as it doesn't compare the facilities that arrays provide, but I thought it would be a good place to put the information.

Suppose a hash with ten elements is built using code like this

use strict;
use warnings;

my %hash;
my $n = 1000;
for (1 .. 10) {
  $hash{$n} = 1;
  $n *= 1000;
}

and then we query it, looking for keys that are powers of ten. Of course the easiest way to multiply an integer by ten is to add a zero, so it is fine to write

my $m = '1';

for (1 .. 100) {
  print $m, "\n" if $hash{$m};
  $m .= 0;
}

which has the output

1000
1000000
1000000000
1000000000000
1000000000000000
1000000000000000000

We entered ten elements but this shows only six. What has happened? Let's take a look at what's in the hash.

use Data::Dump;
dd \%hash;

and this outputs

{
  "1000"                => 1,
  "1000000"             => 1,
  "1000000000"          => 1,
  "1000000000000"       => 1,
  "1000000000000000"    => 1,
  "1000000000000000000" => 1,
  "1e+021"              => 1,
  "1e+024"              => 1,
  "1e+027"              => 1,
  "1e+030"              => 1,
}

so the hash doesn't use the keys that we imagined. It stringifies the numbers in a way that it would be foolish to try to emulate.

For a slightly more practical example, say we had some circles and wanted to collect into sets by area. The obvious thing is to use the area as a hash key, like this program which creates 100,000 circles with random integer diameters up to 18 million.

use strict;
use warnings;
use 5.010;

package Circle;

use Math::Trig 'pi';

sub new {
  my $class = shift;
  my $self = { radius => shift };
  bless $self, $class;
}

sub area {
  my $self = shift;
  my $radius = $self->{radius};
  pi * $radius * $radius;
}



package main;

my %circles;

for (1 .. 100_000) {
   my $circle = Circle->new(int rand 18_000_000);
   push @{ $circles{$circle->area} }, $circle;
}

Now let's see how many of those hash keys use scientific notation

say scalar grep /e/, keys %circles;

which says (randomly, of course)

861

so there really isn't a tidy way of know what string perl will use if we specify a number as a hash index.

ikegami
  • 367,544
  • 15
  • 269
  • 518
Borodin
  • 126,100
  • 9
  • 70
  • 144
1

In Perl an @array is an ordered list of values ($v1, $v2, ...) accessed by an integer (both positive and negative), while a %hash is an unordered list of 'key => value' pairs (k1 => $v1, k2 => $v2, ...) accessed by a string.

There are modules on CPAN that implement ordered hashes, like: Hash::Ordered and Tie::IxHash

You might want to use an array when you have ordered 'items' presumably a great number as well, for which using a %hash and sorting the keys and/or the values would be inefficient.