I have a dictionary of items. I want to process all items except the ones whose keys start with "_".

Is there a performance difference between doing:

for key in items:
    if key.startswith("_"):
        continue
    <code that should run for items whose keys do not start with "_">

vs.

for key in items:
    if not key.startswith("_"):
        <code that should run for items whose keys do not start with "_">
Muhammad Lukman Low
  • yep, 2nd would be the best. – Avinash Raj Apr 12 '15 at 16:06
  • There is probably not a meaningful performance difference. Pick whichever one is most readable (the first choice prevents arrow code, if that is an issue). – Brian Apr 12 '15 at 16:13
  • Do whatever reads best, unless you've shown it's a bottleneck. – Peter Wood Apr 12 '15 at 16:14
  • If you're curious you could test it with the [timeit](https://docs.python.org/3/library/timeit.html) module. – wwii Apr 12 '15 at 16:20
  • Don't ask us, see [_How can you profile a Python script?_](http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script) – martineau Apr 12 '15 at 17:12
  • Take a look at [premature optimization](http://c2.com/cgi/wiki?PrematureOptimization) and stop worrying about details like this. Look at your code; there are probably many other places where your effort would be better spent. Optimization comes last. – Michele d'Amico Apr 13 '15 at 21:13

1 Answer

I came up with a simple test program for this using the timeit module as per the advice of wwii. It's a useless script; all it does is store each key of interest (i.e. the ones that don't start with '_') in a variable, which is overwritten each time.

import timeit

trials = 1000000

# Setup code: runs once before each timed repetition and is excluded from the measurement.
setup = """
foo = {'key0': 'val0', '_key1': 'val1', '_key2': 'val2', 'key3': 'val3', 'key4': 'val4'}
"""

# Variant 1: skip keys that start with '_' using continue.
run = """
for key in foo:
    if key.startswith('_'):
        continue
    bar = key
"""
t = timeit.Timer(run, setup)
print('Using continue: %f' % min(t.repeat(3, trials)))

# Variant 2: process keys that do not start with '_' inside an if not block.
run = """
for key in foo:
    if not key.startswith('_'):
        bar = key
"""
t = timeit.Timer(run, setup)
print('Using if not: %f' % min(t.repeat(3, trials)))

This runs each block 1,000,000 times, repeats that measurement three times, and prints the minimum total time in seconds. Here are the results:

Using continue: 1.880194
Using if not: 1.767904

These results vary slightly between runs, but the trend is always the same: the second structure takes around 100 ms less than the first over 1,000,000 runs. That puts the difference on the order of 100 ns per pass over the five-key dictionary. I doubt anyone would notice that.
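
As a rough check of that arithmetic, the sketch below converts the two timings reported above into a per-run and a per-key difference. The variable names are illustrative, and the count of five keys per run is taken from the `foo` dictionary in the setup string; the numbers only describe this one measurement on one machine.

```python
# Timings reported above, in seconds, for 1,000,000 runs of each block.
continue_seconds = 1.880194
if_not_seconds = 1.767904

trials = 1000000
keys_per_run = 5  # foo has five keys, so each run makes five loop iterations

# Total difference over all trials, then scaled down to a single run.
difference_seconds = continue_seconds - if_not_seconds
per_run_ns = difference_seconds / trials * 1e9

# Spread the per-run difference over the keys visited in one run.
per_key_ns = per_run_ns / keys_per_run

print('Difference per run: %.0f ns' % per_run_ns)
print('Difference per key: %.1f ns' % per_key_ns)
```

That works out to roughly 112 ns per run, or a little over 20 ns per key visited.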

At this point it's really a question of readability. For such a small block of code, it probably doesn't matter either way. Anyone who knows a little Python should be able to tell what both of those mean. Personally, I would choose the second option, but I don't see a problem with either.
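
To make the readability comparison concrete, here is a minimal sketch of both structures written as functions. The function names and the `processed.append(key)` stand-in for the real work are hypothetical, and the sample dictionary simply mirrors the one from the timing test.

```python
# Sample data mirroring the dictionary used in the timing test.
items = {'key0': 'val0', '_key1': 'val1', '_key2': 'val2', 'key3': 'val3', 'key4': 'val4'}


def process_with_continue(items):
    """Skip unwanted keys early, so the main body stays at loop level."""
    processed = []
    for key in items:
        if key.startswith('_'):
            continue
        processed.append(key)  # stand-in for the real processing
    return processed


def process_with_if_not(items):
    """Nest the main body inside the condition instead."""
    processed = []
    for key in items:
        if not key.startswith('_'):
            processed.append(key)  # stand-in for the real processing
    return processed


# Both structures select exactly the same keys.
print(process_with_continue(items) == process_with_if_not(items))
```

With a one-line body the two read about the same. The `continue` form may become more attractive as the body grows, because it saves one level of indentation.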

dpwilson