0

Very similar to this question I was wondering what is both more idiomatic and faster when using Cython: A try/except KeyError block or alternative solutions based on pop and/or in.


Test data:

from random import shuffle, randint

datal = list(map(float, range(1_000)))
shuffle(datal)
data = {
    a * 10 ** (randint(0, 2) * 3): b
    for a, b in zip(datal[:-1], datal[1:])
}
data[datal[-1] * 10 ** (randint(0, 2) * 3)] = datal[0]

Three variations in Cython (language level 3):

  1. Mostly dynamic typing, pop and None as a default return value:
def test_none(dict raw):
    _, last_value = raw.popitem()
    cdef list out = [last_value]
    while len(raw) > 0:
        next_value = raw.pop(last_value, None)
        if next_value is not None:
            out.append(next_value)
            last_value = next_value
            continue
        next_value = raw.pop(last_value * 1_000, None)
        if next_value is not None:
            last_value = next_value
            out.append(next_value)
            continue
        next_value = raw.pop(last_value * 1_000_000)
        out.append(next_value)
        last_value = next_value
    return out
  1. With in and pop:
def test_in(dict raw):
    cdef double last_value, next_value, _
    _, last_value = raw.popitem()
    cdef list out = [last_value]
    while len(raw) > 0:
        if last_value in raw:
            next_value = raw.pop(last_value)
        elif last_value * 1_000 in raw:
            next_value = raw.pop(last_value * 1_000)
        else:
            next_value = raw.pop(last_value * 1_000_000)
        out.append(next_value)
        last_value = next_value
    return out
  1. With pop and try/except KeyError:
def test_te(dict raw):
    cdef double last_value, next_value, _
    _, last_value = raw.popitem()
    cdef list out = [last_value]
    while len(raw) > 0:
        try:
            next_value = raw.pop(last_value)
        except KeyError:
            try:
                next_value = raw.pop(last_value * 1_000)
            except KeyError:
                next_value = raw.pop(last_value * 1_000_000)
        out.append(next_value)
        last_value = next_value
    return out

With timeit, I get the following results:

%timeit test_te(data.copy())
471 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit test_in(data.copy())
219 µs ± 9.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit test_none(data.copy())
175 µs ± 401 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
s-m-e
  • 3,433
  • 2
  • 34
  • 71
  • 1
    I do not really understand the question. You answered your own question regarding performance of the different approach. That being said, `test_none` and `test_in` looks fine. The second is likely more idiomatic but people may disagree. Anyway, using exception in the critical path of a hot loop is never a good idea. It is generally slow and not idiomatic here as exception are supposed to be rare and not be the usual case (hence the name). – Jérôme Richard Jun 13 '21 at 17:13
  • @JérômeRichard Thanks. In some sense, I answered it myself. And then I see vastly different behavior in a larger code base. I isolated the relevant code path and implemented all three solutions as above. They're all almost similarly fast with `try`/`except` actually being the fastest (!). I am using tuples of integers as key and objects coming from an extension class as values - those are the differences. Hence I am not sure what is actually best - of specifically, why, under which circumstances. – s-m-e Jun 13 '21 at 17:49
  • 1
    Would it not depend on the the frequency of matches? For example, I'd guess (I definitely haven't measured anything here) that the try/except version would probably be good where failed lookups are very rare, but poor when they're common. – DavidW Jun 13 '21 at 18:22
  • @DavidW Good point. In my example above, all three "branches" are equally distributed. In my larger code base, I am going through them from very likely to very unlikely. So I am not hitting as many `except` blocks as in my little test here (relatively speaking). Makes sense. – s-m-e Jun 13 '21 at 18:35
  • @DavidW indeed, someone documented it: https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Initializing_Dictionary_Elements – Oppen Jun 30 '21 at 01:41

0 Answers0