I have a dictionary that acts as a counter for different keys, i.e., the value of a key is the number of times a key occurred. Is it faster for me to use a string indexed dictionary or an integer indexed one? Which has a better performance?
-
4Have you thought about _measuring it_? – Łukasz Rogalski Jun 14 '16 at 11:03
-
3How much data do you expect? Will performance really matter here? My instinct would be to go with whatever is easiest to program. – Matthew Jun 14 '16 at 11:05
2 Answers
# coding=utf-8
import sys
import timeit
print(sys.getsizeof(1000000000))
28
print(sys.getsizeof('aaaaaaa'))
56
print(timeit.timeit('{1:1}', number=10 ** 7))
0.935662218856579
print(timeit.timeit('{"1":1}', number=10 ** 7))
0.8795463330796326
print(timeit.timeit(stmt='a[1]', setup='a = {1:1}', number=10 ** 7))
0.24523148719450227
print(timeit.timeit(stmt='a["1"]',setup='a = {"1":1}', number=10 ** 7))
0.22414418170794992
print(timeit.timeit('{x*x:x for x in range(1000)}', number=1000))
0.10348407957872885
print(timeit.timeit('{"a"*x:x for x in range(1000)}', number=1000))
0.5330044677382393
ints use less memory, but strings are a tiny bit faster when it comes to assigning and accessing from a dictionary.... Unless we are filling a dictionary with strings, in that case ints are faster.
Go with what Ricardo said. I doubt there's significant difference to be made.
If you want fast, use PyPy.

- 9,321
- 1
- 39
- 56
-
Interesting. Is it because python's `hash()` function is faster with strings? – Doron Cohen Jun 14 '16 at 11:16
-
I consider this testing a bit unfair, since you are using strings with one character length only, meanwhile for integer you have a full 64 bits representation. And in terms of the problem posted by OP, I doubt the strings will lay on the one character length range. – Ricardo Silveira Jun 14 '16 at 11:21
-
The size of the string will increase by 1 for every extra character added, so for practical purposes you can use either for many, many keys. The int is usually always smaller though. As for why it is faster? I'm not sure. ints are their own hash codes so I don't think it is related to `hash()` but I really can't say. – Vasili Syrakis Jun 14 '16 at 11:24
-
Also, I'm making a dict with a single key:value pair. Things get different when you create a dict with thousands of key:value pairs. – Vasili Syrakis Jun 14 '16 at 11:26
-
Oh, now I checked your last line in which you add more characteres to the string. As I had answered, for string you have a map operation, right? And it will linearly depend on the size of the string, I'd say, so perhaps if you want to check it, do a plot in which Y is time it took, X is the length of the string used as key, and I assume you shall see a positive linear trend. – Ricardo Silveira Jun 14 '16 at 11:55
Your answer is: it makes 'almost' no difference.
Q: Why almost?
A: Strings, depending on their size, can be more time-expensive than integers, because you need the interpreter to map the string to a number, meanwhile the integer is a number itself already. But it may vary depending on the size of the string.
But it will make almost no difference to your case. As results were shown in the answer provided by Basili Syrakis.
What you need to understand is that dictionaries are based on hash tables, therefore it will cost asymptotically O(1) to return a value for a specified key. Then the type should not be a big difference for your case.

- 1,193
- 1
- 8
- 16