33

I was coding a Euler problem, and I ran into question that sparked my curiosity. I have two snippets of code. One is with lists the other uses dictionaries.

using lists:

n=100000
num=[]
suma=0
for i in range(n,1,-1):
    tmp=tuple(set([n for n in factors(i)]))
    if len(tmp) != 2: continue
    if tmp not in num:
       num.append(tmp)
           suma+=i

using dictionaries:

n=100000
num={}
suma=0
for i in range(n,1,-1):
   tmp=tuple(set([n for n in factors(i)]))
   if len(tmp) != 2: continue
   if tmp not in num:
      num[tmp]=i
      suma+=i

I am only concerned about performance. Why does the second example using dictionaries run incredibly fast, faster than the first example with lists. the example with dictionaries runs almost thirty-fold faster!

I tested these 2 code using n=1000000, and the first code run in 1032 seconds and the second one run in just 3.3 second,,, amazin'!

user
  • 1,220
  • 1
  • 12
  • 31
froycard
  • 365
  • 2
  • 4
  • 10

4 Answers4

46

In Python, the average time complexity of a dictionary key lookup is O(1), since they are implemented as hash tables. The time complexity of lookup in a list is O(n) on average. In your code, this makes a difference in the line if tmp not in num:, since in the list case, Python needs to search through the whole list to detect membership, whereas in the dict case it does not except for the absolute worst case.

For more details, check out TimeComplexity.

Karin
  • 8,404
  • 25
  • 34
  • many thanks, your comment has just pointed me to the right direction, those tables on TimeComplexity come in handy and must be taken into account when I try to speed things up in my code. Thanks again – froycard Aug 13 '16 at 00:23
  • For small numbers of items, a list may be faster due to constant factors of dict hashing. – qwr Aug 10 '23 at 03:41
5

If it's about speed, you should not create any lists:

n = 100000
factors = ((frozenset(factors(i)), i) for i in range(2, n+1))
num = {k:v for k,v in factors if len(k)==2}
suma = sum(num.values())
Daniel
  • 42,087
  • 4
  • 55
  • 81
1

In a list, the code if tmp not in num: is O(n), while it is O(lgn) in dict.

Edit: The dict is based on hashing, so it is much quicker than liner list search. Thanks @user2357112 for point this.

citaret
  • 416
  • 4
  • 11
  • so, is this it? if so (I wonder why) this tempts me to use dictionary instead of list, for performance concerning I am just trying to find a better way to speed up my coding and this just blew out my mind... – froycard Aug 13 '16 at 00:06
  • 1
    This is false. Dicts are based on hashing, not comparisons. Their lookup performance is not O(log(n)). – user2357112 Aug 13 '16 at 00:10
1

I am almost positive that the "magic sauce" using a dictionary lies in the fact that the dictionary is comprised of key->value pairs.

in a list, youre dealing with arrays, which means the for loop has to start at index 0 inside of your list in order to loop through every record.

the dictionary just has to find the key->value pair in question on the first 'go-round' and return it, hence the speed...

basically, testing for membership in a set of key->value pairs is a lot quicker than searching an entire list for a value. the larger your list gets the slower it will be... but this isnt always the case, there are scenarios where a list will be faster... but i believe this may be the answer youre looking for

lopezdp
  • 1,538
  • 3
  • 21
  • 38
  • Thanks... I guess I need to continue researching about this,,, first time I run into this... I would like to know where I could find more information about this specific situation? – froycard Aug 13 '16 at 00:09
  • i was wondering about this same thing last week and had this link saved. I guess it was meant for you bud: https://wiki.python.org/moin/PythonSpeed – lopezdp Aug 13 '16 at 00:10