
I was just wondering about the efficiency of using a one-dimensional hash (i.e. only keys, no values - we don't care about them anyway) versus a one-dimensional array.

The main reason I wanted to use a hash for this purpose is so I can use the exists function to see if an 'entry' already exists. Hashes are also great for not duplicating keys, right? With arrays, I would need to set up my own checks involving grep, which I'm led to believe would be slower.

This hash/array will then be iterated through for certain operations.

I'd love to hear any insight on this, and thanks in advance!

Bharat

4 Answers

exists $hash{ $key } 

is a nice, short expression, clear and easy to use. Obviously

!!grep { $_ eq $key } @array

is not quite as short, but

$key ~~ @array # smart match

is even shorter. So since 5.10, it is just as syntactically easy to use the smart match as the exists.

Judging from the general performance difference between arrays and hashes, I would expect the smart match to be faster for a small list of items, but the hash to far outperform the array lookup on a large list of items.

However, you should benchmark the performance anyway (the core Benchmark module makes this easy).


And this is why. On Strawberry Perl, even with a list of size 1, the hash lookup outperforms the array lookup:

                  Rate array_lookup hash_lookup
array_lookup  577701/s           --        -46%
hash_lookup  1068376/s          85%          --

With 2 items in the list:

                  Rate array_lookup hash_lookup
array_lookup  464684/s           --        -57%
hash_lookup  1068376/s         130%          --

And with 20 items:

                  Rate array_lookup hash_lookup
array_lookup  181554/s           --        -83%
hash_lookup  1068376/s         488%          --
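Output of that shape can be produced with something along these lines (a sketch using the core Benchmark module; the list size, iteration count, and key choice here are placeholders, not the exact setup used for the numbers above):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array = ('aa' .. 'at');              # a 20-item list (placeholder size)
my %hash  = map { $_ => 1 } @array;
my $key   = $array[-1];                  # last item: worst case for the scan

cmpthese(500_000, {
    array_lookup => sub { my $found = grep { $_ eq $key } @array },
    hash_lookup  => sub { my $found = exists $hash{$key} },
});
```

cmpthese prints a comparison table like the ones above, with each row's rate and its percentage speedup over the others.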

I would use the hash.

Axeman
  • Thanks a lot for the heads up on smart match and the benchmarks, I was wondering how large 'large' would be, and this makes it clear! – Bharat Dec 01 '10 at 17:53

Yes. To check whether an item exists in an array, you have to loop over every item (an O(n) scan). In a hash table, you simply look up the key, which is O(1) on average. This is much faster for big datasets.
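A minimal sketch of that difference (the names and items here are illustrative, not from the question):

```perl
use strict;
use warnings;

my @seen_array = qw(alpha beta gamma);
my %seen_hash  = map { $_ => 1 } @seen_array;

# Array: every membership test walks the list until a match (or the end).
my $in_array = grep { $_ eq 'beta' } @seen_array;   # linear scan, O(n)

# Hash: the key is located directly from its hash value.
my $in_hash  = exists $seen_hash{'beta'};           # O(1) on average

print "array: $in_array, hash: $in_hash\n";
```

Both tests give the same answer; only the amount of work grows differently as the collection gets bigger.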

Sjoerd

In a mathematical sense, hash keys are sets and arrays are tuples. For example, the tuples ('apple', 'banana') and ('banana', 'apple') are different entities, whereas the sets {'apple', 'banana'} and {'banana', 'apple'} are the same.

You should use sets when you need sets and tuples when you need tuples.

If you need to perform set operations, you might want to use Set::Object, rather than writing the operations from scratch every time.

If you are going to use hash keys to represent a set, setting the values to undef rather than 1 can reduce the memory footprint, which might matter if your sets are large.
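A sketch of the undef-valued idiom (the set contents are made up for illustration; for real set algebra, reach for Set::Object's documented operations instead of rolling your own):

```perl
use strict;
use warnings;

my @fruits = ('apple', 'banana', 'apple');   # the duplicate collapses away

# Build a set: keys are the members, values are undef.
my %fruit_set;
$fruit_set{$_} = undef for @fruits;

print scalar(keys %fruit_set), " distinct members\n";
print "have apple\n" if exists $fruit_set{'apple'};

# Note: test membership with exists, not defined -- the values are all undef.
print "no value stored\n" unless defined $fruit_set{'apple'};
```

The exists/defined distinction is the one trap here: with undef values, a defined check would wrongly report every member as absent.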

Sinan Ünür
  • Thanks for the Set module suggestion, that would've made some stuff so much easier a while ago. It's easy to write my unique checks in this module because I've decided to push stuff to the hash one by one, and I'll definitely be using undef. Thanks! – Bharat Dec 01 '10 at 17:56
  • The memory footprint is no longer different as of the SV reorganizations of 5.10. – ysth Dec 02 '10 at 01:54

This is absolutely a standard technique in Perl. My own scripts are full of code like this:

my @elements = (...);
my %is_element = map { $_ => 1 } @elements;

There are plenty of examples on Stack Overflow, e.g. here and here.

Looking up a key in a hash is O(1) on average.
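In use, the idiom looks like this (a small sketch; the element list is a stand-in for whatever data you actually have):

```perl
use strict;
use warnings;

my @elements   = qw(red green blue);
my %is_element = map { $_ => 1 } @elements;

for my $candidate (qw(green purple)) {
    if ($is_element{$candidate}) {
        print "$candidate: yes\n";
    }
    else {
        print "$candidate: no\n";
    }
}
```

Because the values are all true (1), a plain truth test works here; with undef values you would use exists instead.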

reinierpost