-2

I have a dictionary that acts as a counter for different keys, i.e., the value of a key is the number of times a key occurred. Is it faster for me to use a string indexed dictionary or an integer indexed one? Which has a better performance?

Anonynags
  • 79
  • 1
  • 5

2 Answers2

2
# coding=utf-8

import sys
import timeit

print(sys.getsizeof(1000000000))
28

print(sys.getsizeof('aaaaaaa'))
56

print(timeit.timeit('{1:1}', number=10 ** 7))
0.935662218856579

print(timeit.timeit('{"1":1}', number=10 ** 7))
0.8795463330796326

print(timeit.timeit(stmt='a[1]', setup='a = {1:1}', number=10 ** 7))
0.24523148719450227

print(timeit.timeit(stmt='a["1"]',setup='a = {"1":1}', number=10 ** 7))
0.22414418170794992

print(timeit.timeit('{x*x:x for x in range(1000)}', number=1000))
0.10348407957872885

print(timeit.timeit('{"a"*x:x for x in range(1000)}', number=1000))
0.5330044677382393

ints use less memory, but strings are a tiny bit faster when it comes to assigning and accessing from a dictionary.... Unless we are filling a dictionary with strings, in that case ints are faster.

Go with what Ricardo said. I doubt there's significant difference to be made.

If you want fast, use PyPy.

Vasili Syrakis
  • 9,321
  • 1
  • 39
  • 56
  • Interesting. Is it because python's `hash()` function is faster with strings? – Doron Cohen Jun 14 '16 at 11:16
  • I consider this testing a bit unfair, since you are using strings with one character length only, meanwhile for integer you have a full 64 bits representation. And in terms of the problem posted by OP, I doubt the strings will lay on the one character length range. – Ricardo Silveira Jun 14 '16 at 11:21
  • The size of the string will increase by 1 for every extra character added, so for practical purposes you can use either for many, many keys. The int is usually always smaller though. As for why it is faster? I'm not sure. ints are their own hash codes so I don't think it is related to `hash()` but I really can't say. – Vasili Syrakis Jun 14 '16 at 11:24
  • Also, I'm making a dict with a single key:value pair. Things get different when you create a dict with thousands of key:value pairs. – Vasili Syrakis Jun 14 '16 at 11:26
  • Oh, now I checked your last line in which you add more characteres to the string. As I had answered, for string you have a map operation, right? And it will linearly depend on the size of the string, I'd say, so perhaps if you want to check it, do a plot in which Y is time it took, X is the length of the string used as key, and I assume you shall see a positive linear trend. – Ricardo Silveira Jun 14 '16 at 11:55
1

Your answer is: it makes 'almost' no difference.

Q: Why almost?

A: Strings, depending on their size, can be more time-expensive than integers, because you need the interpreter to map the string to a number, meanwhile the integer is a number itself already. But it may vary depending on the size of the string.

But it will make almost no difference to your case. As results were shown in the answer provided by Basili Syrakis.

What you need to understand is that dictionaries are based on hash tables, therefore it will cost asymptotically O(1) to return a value for a specified key. Then the type should not be a big difference for your case.

Ricardo Silveira
  • 1,193
  • 1
  • 8
  • 16