It is well known how to iterate over a hash in Perl (see, e.g., What's the safest way to iterate through the keys of a Perl hash?). However, the order of the keys and values is unspecified and in fact differs from one run of the script to the next.

Is there a way to ensure that every run of the same Perl script on the same input data produces the same iteration order? I only care about reproducibility in this sense; the order need not be predictable by a human being.

EDIT: I formulated the question in terms of iteration, but maybe it is not the iteration over the hash but the hash-building process that is non-deterministic. Can I set some initialisation to build the hash in a deterministic and reproducible way?

Sir Cornflakes

1 Answer

Sort the keys first:

foreach my $key ( sort keys %hash ) {
    # work with $key and $hash{$key} here
}

Note: the default sort compares keys as strings, not as numbers. sort also accepts a custom comparison function, so you can impose almost any order you care to name.
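To illustrate the difference, here is a minimal sketch. The hash %label_for and its contents are made up for the example; only sort, keys, <=> and cmp come from Perl itself.

```perl
use strict;
use warnings;

# Hypothetical sample data: numeric-looking keys mapped to labels.
my %label_for = ( 10 => 'ten', 9 => 'nine', 100 => 'hundred', 1 => 'one' );

# Default sort compares the keys as strings.
my @as_strings = sort keys %label_for;

# A comparator block using <=> compares them as numbers.
my @as_numbers = sort { $a <=> $b } keys %label_for;

# Sort by value instead, falling back to the key to break ties.
my @by_value =
    sort { $label_for{$a} cmp $label_for{$b} or $a <=> $b } keys %label_for;

print "string order:  @as_strings\n";    # 1 10 100 9
print "numeric order: @as_numbers\n";    # 1 9 10 100
print "by value:      @by_value\n";      # 100 9 1 10
```

Whichever comparator you pick, the result depends only on the keys and values, so it is the same on every run.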

Alternatively, record the insertion order in an array and use that array to drive the output order.

my %content_for;
my @ordered_id; 

while ( <$input_filehandle> ) { 
    my ( $id, $content ) = split; 
    push ( @ordered_id, $id ); 
    $content_for{$id} = $content; 
}

print join( "\n", @content_for{@ordered_id} ), "\n";

Or use an ordered-hash module such as Hash::Ordered or Tie::IxHash.
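To show roughly what such a module does for you, here is a core-only sketch of a tied hash that remembers insertion order. The package name InsertionOrderSketch and the sample data are hypothetical, and this is not how Hash::Ordered or Tie::IxHash are actually implemented; it only uses Perl's standard tie interface (TIEHASH, STORE, FETCH, FIRSTKEY, NEXTKEY and friends).

```perl
use strict;
use warnings;

# InsertionOrderSketch is a hypothetical, illustration-only class.
package InsertionOrderSketch {
    sub TIEHASH {
        my ($class) = @_;
        return bless { value_for => {}, order => [], cursor => 0 }, $class;
    }

    sub STORE {
        my ( $self, $key, $value ) = @_;
        # Record the key only the first time it is seen.
        push @{ $self->{order} }, $key unless exists $self->{value_for}{$key};
        $self->{value_for}{$key} = $value;
    }

    sub FETCH  { my ( $self, $key ) = @_; return $self->{value_for}{$key} }
    sub EXISTS { my ( $self, $key ) = @_; return exists $self->{value_for}{$key} }

    sub DELETE {
        my ( $self, $key ) = @_;
        @{ $self->{order} } = grep { $_ ne $key } @{ $self->{order} };
        return delete $self->{value_for}{$key};
    }

    sub CLEAR {
        my ($self) = @_;
        %{ $self->{value_for} } = ();
        @{ $self->{order} }     = ();
    }

    # Iteration walks the recorded order rather than the internal hash.
    sub FIRSTKEY { my ($self) = @_; $self->{cursor} = 0; return $self->NEXTKEY }
    sub NEXTKEY  { my ($self) = @_; return $self->{order}[ $self->{cursor}++ ] }
}

tie my %colour_of, 'InsertionOrderSketch';
$colour_of{pear}   = 'green';
$colour_of{apple}  = 'red';
$colour_of{banana} = 'yellow';
$colour_of{pear}   = 'yellow-green';    # updating keeps the original position

my @seen = keys %colour_of;
print "@seen\n";                        # pear apple banana
```

In real code you would reach for one of the CPAN modules rather than maintain something like this yourself; the sketch is only meant to show that the ordering comes from an extra array kept next to the hash, not from the hash itself.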

> I formulated the question in terms of iteration, but maybe it is not the iteration over the hash but the hash-building process that is non-deterministic. Can I set some initialisation to build the hash in a deterministic and reproducible way?

No. Hashes don't work like that; see perlsec for an explanation of why. Key order got more random with newer versions of Perl (since 5.18 it can differ from run to run), but a hash was always an unordered data structure.

As mentioned in perlsec, you can experiment with the PERL_HASH_SEED and PERL_PERTURB_KEYS environment variables, but relying on them is definitely not good practice.

PERL_HASH_SEED=0 ./somescript.pl 

Bear in mind, though, that hash ordering is still not guaranteed: the sequence of keys may still change. It will be more consistent than before, but this is definitely not something to use in production or to rely on for anything more than debugging.

PLEASE NOTE: The hash seed is sensitive information. Hashes are randomized to protect against local and remote attacks against Perl code. By manually setting a seed, this protection may be partially or completely lost.
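If you want to see the effect for debugging purposes, the following sketch runs the same small program twice in a child perl with the seed pinned and compares the key order. The helper name key_order_with_seed and the test data are made up, and it assumes a perl that honours these environment variables (they are ignored, for example, under taint mode).

```perl
use strict;
use warnings;

# Program for the child perl: build a small hash and print its key order.
my $child_code = q{my %h; $h{$_} = 1 for 'a' .. 'j'; print join(',', keys %h), "\n";};

# Hypothetical helper: run the child with a pinned hash seed.
sub key_order_with_seed {
    my ($seed) = @_;

    # Set for the duration of this call; the child process inherits both.
    local $ENV{PERL_HASH_SEED}    = $seed;
    local $ENV{PERL_PERTURB_KEYS} = 0;

    # $^X is the path of the perl binary running this script.
    open( my $child, '-|', $^X, '-e', $child_code )
        or die "cannot run child perl: $!";
    my $order = <$child>;
    close $child;
    chomp $order;
    return $order;
}

my $first  = key_order_with_seed(0);
my $second = key_order_with_seed(0);

print "run 1: $first\n";
print "run 2: $second\n";
print $first eq $second ? "same order\n" : "different order\n";
```

Even when both runs agree, the order is only stable for this perl build and this exact sequence of insertions, which is why sorting or an explicit order array remains the better tool.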

Sobrique
  • :) you just beat me by a few seconds :) `sort` is probably the only way to achieve this. – Samiron Dec 22 '15 at 11:20
  • It's the easy answer, but there are others, like using `Hash::Ordered`, which ... basically builds in a `sort`. – Sobrique Dec 22 '15 at 11:23
  • This is a good answer, but are there really no alternatives, like setting some initialisation to make the hash order reproducible? – Sir Cornflakes Dec 22 '15 at 12:36
  • No. A hash is an unordered set of key-value pairs. This is because of how it's arranged in memory (for efficient random access). You can work with things that aren't hashes if you want, but you _cannot_ magically make an unordered data structure ordered again. – Sobrique Dec 22 '15 at 12:38
  • With some digging: there is a debug mode that will let you do this to a degree, by setting the environment variable PERL_HASH_SEED=0. This would be a terrible idea to use in production code. – Sobrique Dec 22 '15 at 12:54
  • [Tie::Hash::Indexed](https://metacpan.org/pod/Tie::Hash::Indexed) offers the same functionality as Tie::IxHash but is significantly faster. – ThisSuitIsBlackNot Dec 22 '15 at 15:13