0

The following code produces inconsistent results if executed multiple times. I ran it on Debian 8.4 (jessie) with Python-3.5.1.

from heapq import nlargest
from operator import itemgetter

dd = dict([('41', 768.0), ('2', 15275.0), ('9', 1728.0), ('90', 1728.0),
           ('97', 1200.0), ('68', 2904.0), ('98', 4380.0), ('16', 768.0),
           ('37', 768.0), ('17', 1587.0), ('25', 4495.4)])

print(nlargest(5, dd.items(), key=itemgetter(1)))

outputs after multiple execution:

[('2', 15275.0), ('25', 4495.4), ('98', 4380.0), ('68', 2904.0), ('90', 1728.0)]
[('2', 15275.0), ('25', 4495.4), ('98', 4380.0), ('68', 2904.0), ('9', 1728.0)]
[('2', 15275.0), ('25', 4495.4), ('98', 4380.0), ('68', 2904.0), ('90', 1728.0)]
[('2', 15275.0), ('25', 4495.4), ('98', 4380.0), ('68', 2904.0), ('90', 1728.0)]

It looks random, can anybody explain why this is happening?

if replace dd.items() with sorted(dd.items()), then the output becomes deterministic. i.e.

[('2', 15275.0), ('25', 4495.4), ('98', 4380.0), ('68', 2904.0), ('9', 1728.0)]

I also tried above code on OSX and CentOS6.7 with python-2.7, and it always returns

[('2', 15275.0), ('25', 4495.4), ('98', 4380.0), ('68', 2904.0), ('9', 1728.0)]

Can this be a bug in Python3 heapq implementation?

zyxue
  • 7,904
  • 5
  • 48
  • 74

1 Answers1

3

You are asking for the 5 largest items, and there is a tie for 5th place. How Python breaks the tie is arbitrary, and any nondeterministic behavior involved is not a bug.

The reason it's nondeterministic is because dict iteration order is arbitrary, and since Python 3.3, hash randomization for strings and certain other types has been enabled by default. Thus, there is a random component to the order in which heapq.nlargest receives dict items, and the way it breaks ties depends on the order in which it sees items.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • The key idea seems to be `since Python 3.3, hash randomization for strings and certain other types has been enabled by default`, that's why I wonder this randomness doesn't happen in Python 2.7 – zyxue May 24 '16 at 18:05