0

Consider:

somedict = {'foo':[4], 'mer':[2, 9, 0]}

key = 'bar'
value = 5

We could safely append to a list stored in a dictionary in either of the following ways:

  1. Being cautious, we always check whether the list exists before appending to it.

    if not somedict.has_key(key):
        somedict[key] = []
    somedict[key].append(value)
    
  2. Being direct, we simply clean up if there is an exception.

    try:
        somedict[key].append(value)
    except KeyError:
        somedict[key] = [value]
    

In both cases, the result would be:

{'foo':[4], 'mer':[2, 9, 0], 'bar':[5]}
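
For readers on Python 3, where `dict.has_key` no longer exists, the two options can be sketched as runnable functions. The names `append_cautious` and `append_direct` are made up for this illustration, and the membership test uses `in` in place of `has_key`:

```python
def append_cautious(somedict, key, value):
    """Option 1: check for the key before appending."""
    if key not in somedict:
        somedict[key] = []
    somedict[key].append(value)


def append_direct(somedict, key, value):
    """Option 2: append, and create the list only if the lookup fails."""
    try:
        somedict[key].append(value)
    except KeyError:
        somedict[key] = [value]


# Start from two identical dictionaries and apply one option to each.
cautious = {'foo': [4], 'mer': [2, 9, 0]}
direct = {'foo': [4], 'mer': [2, 9, 0]}
append_cautious(cautious, 'bar', 5)
append_direct(direct, 'bar', 5)

print(cautious == direct)  # True
```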

To restate my question: In simple instances like this, is it better (in terms of style, efficiency, & philosophy) to be cautious or direct?

f06
  • 1
  • A rule of thumb is that exceptions shouldn't be part of the normal program flow. They should represent (surprise) exceptional cases. In this example, I assume that somedict.has_key(key) is a valid state, thus not an exception. (this case is quite simple, but often this is a question of personal taste). – abesto Jan 30 '11 at 22:23
  • 2
    @abesto: That's incorrect; you're applying a guideline for languages like C++ to Python that doesn't apply at all. See my answer at http://stackoverflow.com/questions/4717484#4718382. – Glenn Maynard Jan 30 '11 at 22:43
  • Your question is one of philosophy, dogma, and personal opinions. You will see many heated words and catch phrases. I doubt there will be very much thinking ... – nate c Jan 30 '11 at 22:55
  • has_key has been deprecated for some time now. You should use `key in somedict`. – John La Rooy Jan 30 '11 at 22:59
  • @nate: Speak for yourself, please. – Glenn Maynard Jan 30 '11 at 23:00
  • So much arguing and nobody clicked close for being argumentative? Not a real question either because there is no definite answer for all the undefined "simple cases". Anyways, when doing actual programming, this question doesn't ask itself because it's either obvious or there is a better alternative in the libraries (ie `defaultdict` here). – Jochen Ritzel Jan 30 '11 at 23:34
  • @Jochen: Amusingly, the only "arguing" I see is people claiming people are going to argue. This isn't an "argumentative" question in the slightest. – Glenn Maynard Jan 31 '11 at 00:41

7 Answers

3

What you'll find is that your option 1, "being cautious", is often remarkably slow. It's also subject to obscure errors when the test you wrote to "avoid" the exception is itself incorrect.

Your option 2, "being direct", is often much faster. It's also more likely to be correct, and easier for people to read.

Why? Internally, Python often implements things like "contains" or "has_key" as an exception test.

def has_key(self, some_key):
    try:
        self[some_key]
    except KeyError:
        return False
    return True

Since this is typically how a has_key type of method is implemented, there's no reason for your code to waste time doing this in addition to what Python will already do.

More fundamentally, there's a correctness issue. Many attempts to prevent or avoid an exception are incomplete or incorrect.

For example, trying to establish whether a string is potentially a floating-point number is fraught with numerous exceptions and special cases. About the only way to do it correctly is this.

try:
    x = float(some_string)
except ValueError:
    # not a floating-point value
    pass

Just do the algorithm without worrying about "preventing" or "avoiding" exceptions.
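
As a rough illustration of that point, the conversion can be wrapped in a small helper. The name `parse_float` and the choice of returning `None` on failure are assumptions made for this sketch, not part of the answer above:

```python
def parse_float(some_string):
    """Return some_string converted to float, or None if it is not a float."""
    try:
        return float(some_string)
    except ValueError:
        # not a floating-point value
        return None


# float() itself decides what counts as valid; no hand-written pre-check needed.
for text in ["3.14", "1e5", "  -2.5  ", "inf", "abc", "", "0x10"]:
    print(repr(text), "->", parse_float(text))
```

A hand-written check would have to anticipate exponents, surrounding whitespace, signs, and special values such as `inf` on its own. Note that a non-string argument such as `None` raises `TypeError` rather than `ValueError`, which this sketch deliberately does not catch.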

S.Lott
  • 1
    Don't measure first unless performance is actually the issue--this question is about general code style, not inner-loop optimization, after all. People should be writing code, by default, based on what's clean, not based on what's fastest. – Glenn Maynard Jan 30 '11 at 23:12
2

In the general case, EAFP ("easier to ask for forgiveness than permission") is preferred in Python. Of course, the rule of thumb "exceptions should be for exceptional cases" still holds: if you expect an exception to occur frequently, you probably should "look before you leap". In other words, it depends. Efficiency-wise, it shouldn't make much of a difference in most cases. When it does, keep in mind that a try block that raises no exception is cheap, whereas a condition is checked every time.

Note that in some cases, including this example, neither is necessary (at least, you don't have to do it yourself explicitly). Here, you should just use collections.defaultdict.
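
A minimal sketch of the `collections.defaultdict` approach, reusing the data from the question (the extra append to `'foo'` is added here only to show the existing-key case):

```python
from collections import defaultdict

# list is the factory called to build a value for any missing key.
somedict = defaultdict(list, {'foo': [4], 'mer': [2, 9, 0]})

somedict['bar'].append(5)  # missing key: an empty list is created first
somedict['foo'].append(7)  # existing key: plain append

print(dict(somedict))
```

One thing to keep in mind with this design: merely reading `somedict[key]` for a missing key also inserts an empty list under that key.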

1

You don't need a strong, compelling reason to use exceptions--they're not very expensive in Python. Here are some possible reasons to prefer one or the other, for your particular example:

  • The exception version requires a simpler API. Any container that supports item lookup and assignment (__getitem__ and __setitem__) will work. The non-exception version additionally requires that has_key be implemented.
  • The exception version may be slightly faster if the key usually exists, since it only requires a single dict lookup. The has_key version requires at least two--one for has_key and one for the actual lookup.
  • The non-exception version has a more consistent code path: it always puts the value in the list in the same place. By comparison, the exception version has a separate code path for each case.

Unless performance is particularly important (in which case you'd be benchmarking and profiling), none of these are very strong reasons; just use whichever seems more natural.
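
To illustrate the first point, here is a sketch with a hypothetical container, `MinimalStore`, that implements only `__getitem__` and `__setitem__`. The class and the helper name `append_direct` are invented for this example:

```python
class MinimalStore:
    """Hypothetical container exposing only item lookup and assignment."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]  # raises KeyError for a missing key

    def __setitem__(self, key, value):
        self._data[key] = value


def append_direct(container, key, value):
    """Exception version: needs nothing beyond lookup and assignment."""
    try:
        container[key].append(value)
    except KeyError:
        container[key] = [value]


store = MinimalStore()
append_direct(store, 'bar', 5)
append_direct(store, 'bar', 6)
print(store['bar'])  # [5, 6]
```

The exception version works here even though the container offers no `has_key` method at all.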

Glenn Maynard
1

try is fast enough, except (if it happens) may not be. If the average length of those lists is going to be 1.1, use the check-first method. If it's going to be in the thousands, use try/except. If you are really worried, benchmark the alternatives.
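
One possible way to benchmark the alternatives is with the standard `timeit` module. This is only a sketch: the function names, the key counts, and the repeat count are arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import timeit


def check_first(d, key, value):
    if key in d:
        d[key].append(value)
    else:
        d[key] = [value]


def try_except(d, key, value):
    try:
        d[key].append(value)
    except KeyError:
        d[key] = [value]


def run(func, n_keys, n_items):
    """Fill a fresh dict with n_items values spread over n_keys keys."""
    d = {}
    for i in range(n_items):
        func(d, i % n_keys, i)
    return d


n_items = 10_000
# n_keys == n_items: every insert misses (lists of length 1).
# n_keys == 10: almost every insert hits (lists of length 1000).
for n_keys in (n_items, 10):
    for func in (check_first, try_except):
        seconds = timeit.timeit(lambda: run(func, n_keys, n_items), number=20)
        print(f"{func.__name__:12} keys={n_keys:6}: {seconds:.4f}s")
```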

Ensure that you are benchmarking the best alternatives. d.has_key(k) is a slow old has_been; you don't need the attribute lookup and the function call. Use k in d instead. Also use else to save a wasted append on the first trip:

Instead of:

if not somedict.has_key(key):
    somedict[key] = []
somedict[key].append(value)

do this:

if key in somedict:
    somedict[key].append(value)
else:
    somedict[key] = [value]

John Machin
0

You can use setdefault for this specific case:

somedict.setdefault(key, []).append(value)

See here: http://docs.python.org/library/stdtypes.html#mapping-types-dict
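
A short sketch of `setdefault` applied to the question's data; the list of key/value pairs is made up here to cover both a missing and an existing key:

```python
somedict = {'foo': [4], 'mer': [2, 9, 0]}

for key, value in [('bar', 5), ('foo', 7), ('bar', 6)]:
    # Returns the existing list, or stores and returns the new empty one.
    somedict.setdefault(key, []).append(value)

print(somedict)
```

Note that the `[]` argument is constructed on every call, even when the key already exists and the new list is thrown away.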

sinelaw
  • I'm less asking for a nifty python solution, and more asking _which is the better habit for simple cases?_ – f06 Jan 30 '11 at 22:23
0

It depends. For example, if the key is a parameter of a function that will be used by another programmer, I would use the second approach, because I can't control the input and the exception information is actually useful to a programmer. But if it's just a step inside a function and the key is just some input from a database, for example, the first approach is better, because if something goes wrong, showing the exception information may not be helpful at all. Use the exception approach if you want to do something with the exception information.

ramontiveros
-1

EAFP is a good habit to get into for Python.

One reason is that it avoids the race condition if someone wants to use your code in a multithreaded app.

John La Rooy
  • @Glenn, please provide a counterexample. – John La Rooy Jan 30 '11 at 23:06
  • 1
    Thread 1 executes `x[y].append(z1)`, fires KeyError and is preempted. Thread 2 executes `x[y].append(z2)`, fires KeyError and is preempted. Thread 1 executes `x[y] = [z1]` in the exception handler. Thread 2 executes `x[y] = [z2]` in the exception handler. One of the values is lost. – Glenn Maynard Jan 30 '11 at 23:20
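
The interleaving described above can be avoided by serializing the lookup and the update. A minimal sketch, assuming a single shared `threading.Lock` is acceptable for the application; the names `_lock` and `append_threadsafe` are hypothetical and do not come from this thread:

```python
import threading

_lock = threading.Lock()  # hypothetical lock guarding the shared dict


def append_threadsafe(d, key, value):
    """Create-or-append as one step, so no other thread can interleave."""
    with _lock:
        try:
            d[key].append(value)
        except KeyError:
            d[key] = [value]


shared = {}
threads = [
    threading.Thread(target=append_threadsafe, args=(shared, 'bar', i))
    for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared['bar']))  # 100
```

The same lock would be needed around the check-first version, since checking for the key and then assigning to it is also two separate steps.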