0

I recently asked about the fastest way to create powers of ten, and it turned out that the fastest way is actually a bit of a sneaky workaround, where you create all the possible values first, and then simply look them up whenever you need them.

In the solution, a list was used as the lookup-table, however, I've just learned that dicts should be much faster when it comes to lookup operations (see e.g. also here). But when I tried out a dict as the lookup table, the process was actually slower:

n = 200
 18 ns   18 ns   18 ns  f[n]  # list
 22 ns   22 ns   22 ns  g[n]  # dict

n = -200
 18 ns   18 ns   18 ns  f[n]  # list
 29 ns   29 ns   29 ns  g[n]  # dict

Why is that? Does it have to do with the fact that the keys are integers and not strings? (Also I guess sets cannot be used in this case?)

Here is the code I ran:

from timeit import repeat


solutions = [
    'f[n]  # list',
    'g[n]  # dict',
]

for n in 200, -200:
    print(f'n = {n}')
    setup = f'''
n = {n}
f = [10.0 ** i for i in [*range(309), *range(-323, 0)]]
g = {{i: 10.0 ** i for i in range(-323, 309)}}
'''
    for solution in solutions:
        try:
            ts = sorted(repeat(solution, setup, repeat=50))[:3]
        except OverflowError:
            ts = [None] * 3
        print(
            *('%3d ns ' % (t * 1e3) if t else ' error ' for t in ts), solution
        )
    print()
mapf
  • 1,906
  • 1
  • 14
  • 40
  • 1
    For me, the interesting thing here is that dict becomes significantly slower for the negative value, while the list is unaffected by that. If you add `hash(n)` to the tested expressions, you'll likely see that that gets much slower for the negative value as well. Don't yet know why. – Kelly Bundy Nov 04 '21 at 11:47
  • @KellyBundy I think that is likely due to Python's caching of small integers (in the range of something like -1 to 256, IIRC). Integers outside of this range have to be created as new GC'd objects. It's slightly faster to index keys that are identical rather than merely equal, and I imagine if the indexing is the only operation performed, that difference might actually be fairly significant. – Jasmijn Nov 04 '21 at 11:59
  • @KellyBundy In CPython 3.10, the cached integers are `-5 <= n <= 256` . You can check it by running e.g. `[a for a, b in zip(range(-300, 300), range(-300, 300)) if a is not b]` – Jasmijn Nov 04 '21 at 12:04
  • Yes, that seems to be it. That was my first suspicion, but then a test showed no slowing down for numbers larger than 256, but seems that was because I had only used the default `repeat` value as well or had low CPU share. Closer longer inspection at the borders ... – Kelly Bundy Nov 04 '21 at 12:21
  • ... [shows the effect](https://tio.run/##hVLLasMwELz7K@YSJAW7NHbsPCCnQs89pKeQg9soscCRjaRQQum3uyvZuKQPanaZ1c5otSu5vbqq0dmyNV3HGMP22qrXsoaR9lI7i7dKajw8PcNWpZFQFpU6VeCUp6VTZ2nhCevKl1qKKEIxg7YAivQGk4W3x7K28m9N4W3QzOc9N2Lea3JvW3P5TdJjMvc2SLKBusU0z4P/IyqC96Kx1@wG03wR/Ptkt5jmy@C9arzPNVarO9o8iejuo@homnO4U@Wgzm1jHL1DK0tHVGOgoTR2U1Pqk@TJIkaSiRjDmqaJ6YiV2K8jOhf0dhtYKiEPvC/CWVXaimvBYhyZJvpdf1Dcs5v8XojdOtuH3a1R2vEQ@m/K2SQ7@EkYJuAOU8xkJqCOcJA0EuWlMdQjg@/U@U6dFfFYQX@FQxc/Ev7n0iEpuu4T) there. – Kelly Bundy Nov 04 '21 at 12:21
  • 1
    Hmm, actually only the `hash` function seems affected, [not dict access](https://tio.run/##hVJNb8IwDL33V7wLSoLKBO3KRydOk3begZ0Qh24NNFJJqyRoQojf3jmBMY0JVtmK4/fkZztt965qdDptTdcxxrDYt@qjqGGk3dXO4rOSGs@vb7BVYSSURaU2FTjl6erUVlp4wLrivZYijyKkCbQFrs/BxNtLUVt5mzP29g8n87YwuzuUR29nyugM/T6TLAt@RfpdJ8nGwe@KJdkk@HfXt/SmwU@s6LLRHLPZA0n0Itp@FK1Nsw1bVQ5q2zbG0Uu0snAENQYaSmPZN4XeSD6YxBikIsb5TuPEpDETqzwiYdDrzWGphCz5qQhn5VKvWIw104Qd9PEJpQ8OFOYYHo@Enajz0VCIZZ6uQqnWKO14CP3X56yXln4uhh64Qx8jmQqoNRwkDUh5aQw1zODbdr5tZ0V8qaB/wqqwFdfiT8L/azokRdd9AQ). – Kelly Bundy Nov 04 '21 at 12:43
  • 1
    For dict access it [seems to be due to collisions](https://tio.run/##fVFBbtswELzzFQMUBkVDDmQzAWIBPgW9tj2kJ0EH1aYsAjIpkHRTI/DbHa7kqFYaVAcNuDs7yxl2p9BYIx87d7lwzvF86vS2auGUP7bB46VRBk8/fsI3lVPQHo3eN0hiPR6DPigPavhQ/WqVyBkz2GCVZQxyBeMBSDnBujAl8AUHbTR2ehtumFPcX5k9qdddTIWn@Inw/ePA@IBT4dFdjvX67mGJGYtJMFY7e@gd6gB96KwLMZVOVXGEedseg7ZRbIOCRWXwf9bzdGjcbou1MipbBwNtKKe0N5X31M5pE5Kak9VXc@air3oVjl2s1HSra4vVhK86xzK7yzCfQ4NUNakWi143/srzme3/w3SV2atkIVcyhczWItJpCW0lzrtLoo6Oh6vSF9zp76EvUBw@BqV2yRBV8j6WDjbSa4SbVSZEkctynFd/tqoL@P5bubq1L1@ds@4T9eKbNarEHHLsDalNqPOEz@SO3ppjhiRE/lJJAV0jQLVexbqiDRHJaCCHwYt0tDnKiZuXEZfLGw) – Kelly Bundy Nov 04 '21 at 12:50
  • 1
    @Jasmijn Argh, testing the whole range `for n in range(-323, 309):` with the full dict on my PC (so I can run it longer) does seem affected by that [-5, 256] range again. I get pretty consistent 137 ns for range [-323,-6], then 109 ns for [-5,0] except 183 ns for -1, then 108 ns for [1,256], then 136 ns for [257,308]. – Kelly Bundy Nov 04 '21 at 13:35
  • @KellyBundy Yeah, I think small dictionaries might also have some optimizations that interfere with your earlier results. – Jasmijn Nov 04 '21 at 15:22

1 Answers1

5

collection[key_or_index] is O(1) for both list and dict. What is different is the performance of key_or_value in collection.

It's the difference between "What is the ith power of ten in my list?" versus "Is x a power of ten that is in my list?"

Indexing is slightly faster for list because dictionaries need to calculate the hash of their key, as well as check for collisions.


The confusion is because "lookup" can refer both to the indexing operation as well as checking for presence, depending on the context.


Here is an overview of how to do the equivalent operation in both lists and dictionaries:

List Dictionary
Indexing lst[i] dct[key]
Checking for presence of key/index -len(lst) <= i < len(lst) key in dct
Checking for presence of value value in lst value in dct.values()
Looping over values for value in lst for value in dct.values()
Looping over keys/indexes for i in range(len(lst)) for key in dct
Looping over both for i, value in enumerate(lst) for key, value in dct.items()
Jasmijn
  • 9,370
  • 2
  • 29
  • 43
  • Thanks! That makes a lot of sense. I wasn't aware that the term "lookup" is ambiguous. Unfortunately, I'm getting downvoted for some reason, so I might delete the post... – mapf Nov 04 '21 at 11:37
  • 1
    I hope you don't delete the post, it's a good question, and I don't think you'll be the last person who had this misunderstanding. I hope you also check out the table I've just added, listing how the ways to use lists and dictionaries differ! – Jasmijn Nov 04 '21 at 11:41
  • @mapf don't delete this question, because it would also delete this answer. The table is really useful for beginners too. – PCM Nov 04 '21 at 11:45