3

Why does the output vary for merged hashes through several times executing this program?

use strict;
use warnings;
my %data1=(a=>'1',b=>'2',c=>'3');
my %data2=(d=>'4',e=>'5',f=>'6');
my %data3=(%data1,%data2);
while(my($key,$value)=each %data3)
{
    print "$key:$value\n";
}
  • I already verified the stack overflow link(What decides the order of keys when I print a Perl hash?), but I am still unable tp find the right solution.
  • Above code explains about the merging of hashes.
  • My question is why the hashes' output changes every time the Perl program is executed.
  • Can anyone explain the random changes in the output?
m7913d
  • 10,244
  • 7
  • 28
  • 56
Sana
  • 25
  • 7
  • The order of keys store in a hash is designed to change (for security reasons). If you want a consistent output order, just iterate over `sort keys %data3` – xxfelixxx Nov 02 '15 at 04:30
  • 2
    Have a look at http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attacks to learn about why the key order is randomized. – xxfelixxx Nov 02 '15 at 04:37

2 Answers2

8

Hashes in Perl have no inherent ordering, this is fundamental to how hashes work.

Data in a hash is stored in a list. Each key is run through a hash function (where it gets its name from) to find out what spot on the list it's stored. "foo" might go in slot 3, "bar" might go in slot 8. Let me give an example.

Let's say your hash stores things in a list 8 elements long (0 through 7). Our very bad hash function adds up the ASCII codes for each character in the key and then takes the modulus of the length of the list. So foo is 102 + 111 + 111 = 324. Then divide by 8 and take the remainder: 4. $hash{"foo"} = 42 is really $hash[4] = ['foo', 42].

bar is 98 + 97 + 114 = 309. 309 mod 8 is 5. $hash{"bar"} = 23 is really $hash[5] = ['bar', 23]. Perl's hash function is more involved, but you get the idea.

It's this way so you can add and remove keys very fast no matter how large the hash is. This is known as "constant time" or O(1) where the speed of the algorithm does not depend on the size of the data.

When you ask for keys %hash or each %hash, Perl will return the keys in the apparently random order they're in the internal list. This is why hashes have no particular order.

In older versions of Perl you would usually get the same ordering for the same hash each time the program was run, but this was deliberately changed for security reasons in Perl 5.8.1. Now the hash function incorporates a bit of randomness so each time a program is run you'll get a different key order. Why? Consider what happens when two keys go into the same slot.

bar and baz both hash to 5. This is called a "hash collision". They're normal, and there are various ways to deal with collisions, but they make hashes less efficient. The more collisions, the more computing power it takes to look up a key in a hash. A good hash doesn't have many collisions.

If your hash function is predictable, it's possible to create a very long list of keys that will all collide. This will make working with the hash use a lot of CPU power. This can be used to create a denial-of-service attack. The simplest is to pass in the pathological key list as HTML form fields. Most Perl programs will take the form fields and put them in a hash. So you could severely slow down a web server by crafting the right URL.

Now that Perl incorporates a bit of randomness in its hash function attackers can no longer create lists of keys which will cause collisions.


If you want order, you either have to add it yourself...

for my $key (sort { $a cmp $b } keys %data3) {
    my $value = $data3{$key};
    print "$key: $value\n";
}

...or use something which is not a hash like Hash::Ordered which will preserve the order items were added to the hash.

use Hash::Ordered;

my $data = Hash::Ordered->new(d=>'4',e=>'5',f=>'6');
$data->merge( a=>'1',b=>'2',c=>'3' );

my $iterator = $data->iterator;
while( my($key, $value) = $iterator->() ) {
    print "$key: $value\n";
}

# d: 4
# e: 5
# f: 6
# c: 3
# b: 2
# a: 1
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • For what security reasons the hashes exclude inherent ordering@Schwern – Sana Nov 02 '15 at 05:13
  • @welcomedata I've added a section on how hashes work, why they're not ordered, and why that impacts security. – Schwern Nov 02 '15 at 08:43
1

As it was mentioned in the previous answers hashes are disgned to have random ordering. There is however a perl core module (that you don't need to install) called "Tie::IxHash" which binds your hash to become a ordered hash like this:

use strict;
use warnings;
use Tie::IxHash;

tie(my %data1, "Tie::IxHash",a=>'1',b=>'2',c=>'3');
@data1{'d','e','f'} = (4,5,6);
$data1{'g'} = 7;

while(my($key,$value)=each %data1)
{
    print "$key:$value\n";
}

Also this module will preserve the order when entries are added like "Hash::Ordered" but it allows you to use normal hashes and it's a core module that ships with any perl installation. In your case would have to bind all 3 hashes of course to keep the ordering. Or else there's also a OO interface wich offers more advanced featues like "SortByKeys","SortByValues","ReOrder", etc.

Marc Elser
  • 146
  • 2
  • 12