An efficient Pythonic way of mapping values?

Question

I have some values to remap based on there being two ways to specify a rule. The simple if / else method appears the more efficient variant, but I wonder if there is an equally as efficient but more pythonic approach?

 if mod == "I": mod = "+"
 elif mod == "E": mod = "-"
 elif mod == "D": mod = ":"
 elif mod == "M": mod = "."

Not so efficient mapping approach:

 mod = { "I":"+", "E":"-", "D":":", "M":"." }.get(mod, mod)

Efficient in what sense? What is inefficient about the approach you came up with (which is quite Pythonic)? — Thijs van Dien, Oct 18 '13 at 20:06
I agree with Thijs: if you store the dict as a separate reference, you only need construct it once and use it from that point on. — bedwyr, Oct 18 '13 at 20:06
Well, with the hash, there is stack modification through the `get` function call, then another internal call to the hashing algorithm to hash the one character I pass in, before evaluating if the key matches anything in the table and returning the matched entry or the default I passed in. With the `if` and `else` approach, it's a simple string comparison that compares one byte each time. But it just looks a bit amateurish. Nothing like a good switch statement to clean things up. — Craig, Oct 18 '13 at 20:11
I smell premature optimization. Python's `dict` is crazy fast. If the things you mention really matter (measure!), Python might not be the tool for the job. — Thijs van Dien, Oct 18 '13 at 20:13
I did measure: `if_else took 12474 ms for 100000 operations, translation_table took 81650 ms for 100000 operations, dict_lookup took 66385 ms for 100000 operations` — Craig, Oct 18 '13 at 21:00
This is not a great way of doing timing on Python. Have a look at `timeit`. I found that `dict` and `if_else` perform about the same, depending on the amount of "luck" `if_else` has for its input. The fastest thing I could find so far is: `translate_mod = {"I":"+", "E":"-", "D":":", "M":"." }.get` which is then callable. It consistently performs about 15% better than `if_else`. The reasons it's faster, is that you have one lookup (the dot) less, and of course that the `dict` is created only once. — Thijs van Dien, Oct 18 '13 at 22:00
You're right, the best was to time something is using `time`. It will tell you how much CPU time something costs. Profiling using language specific tools is always bad, since the language will almost certainly optimise the profiling tool itself. It's better just to ask the kernel how long something took. — Craig, Oct 18 '13 at 22:08

Demian Brecht · Answer 1 · 2013-10-18T20:23:49.447

2

Mapping will result in an O(1) lookup. Conditionals will result in O(N). Of course, there's the additional function call in your map implementation to take into account as well if you want to get all nit-picky about it.

Rather than theorizing, I ran a quick benchmark. The results (removing the get call and strictly using an array accessor, the results:

   2 function calls in 0.025 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.025    0.025    0.025    0.025 {range}


   4 function calls in 0.035 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.035    0.018    0.035    0.018 {range}

Now admittedly, this only uses the cases given. My assumption is that if runtime of the conditionals will increase linearly with the number of cases, while the map lookup will remain constant.

edited Oct 18 '13 at 20:23

answered Oct 18 '13 at 20:09

Demian Brecht

21,135
5
42
46

1

How would conditionals result in anything *other* than `O(N)`? Why do you say *worst case*? – kojiro Oct 18 '13 at 20:13
I know, the map solution was just an example of what I consider to be something more pythonic. I wasn't actually asking which of the two solutions I provided is the best choice. – Craig Oct 18 '13 at 20:13
@kojiro: Guess that's a little redundant, fixed. – Demian Brecht Oct 18 '13 at 20:16
@kojiro The best case scenario for minor if-else's is an O(0). You try the first thing and are correct. – cwallenpoole Oct 18 '13 at 20:17
@cwallenpoole: Arguably however, O-notation should reflect the worst case complexity of the algorithm. Therefore, O(N) is the running time of the conditional version. – Demian Brecht Oct 18 '13 at 20:24
@DemianBrecht SOrry, I meant Omega, not O. – cwallenpoole Oct 19 '13 at 14:09
@cwallenpoole You mean Ω(1). Ω(0) is for code that doesn't run. – kojiro Oct 20 '13 at 01:19

score 1 · Answer 2 · answered Oct 18 '13 at 20:12

If you have a fixed and small number of cases, the if-elif-else approach is the fastest (and I can't see why it would be un-Pythonic); for a sufficiently large number of cases, a dict lookup will be better.

If the set of cases is dynamic, the dict approach is of course the only viable one.

Also, a third, non-orthodox approach, would be to use macro metaprogramming. This is not possible with vanilla python, however there are libraries e.g. https://github.com/lihaoyi/macropy that allow you to do this in an (arguably) clean manner (most probably not approved by the Python community or Guido).

P.S. on second thought, the macro approach probably wouldn't work in Python as it would in Lisp in the case of most macro implementations for Python, which try to stick to the syntax of vanilla Python; i.e. generating an if-elif-else block from within a macro wouldn't be possible.

score 1 · Answer 3 · answered Oct 18 '13 at 20:41

Keep in mind that a multi-level if statement generates a lot of byte code and is primarily evaluated at the Python level, while a dictionary access happens in the optimized C code of the interpreter. Also, the if statement requires an average of n/2 comparisons if there are n possibilities: it has to check each possibility in order.

The moral of the story: dicts are probably fast enough, but when in doubt, profile to find your real bottleneck, not the one you suspect.

Compare:

def f(mod):
    if mod == "I": return "+"
    elif mod == "E": return "-"
    elif mod == "D": return ":"
    elif mod == "M": return "."

dis.dis(f)
  4           0 LOAD_FAST                0 (mod)
              3 LOAD_CONST               1 ('I')
              6 COMPARE_OP               2 (==)
              9 POP_JUMP_IF_FALSE       16
             12 LOAD_CONST               2 ('+')
             15 RETURN_VALUE

  5     >>   16 LOAD_FAST                0 (mod)
             19 LOAD_CONST               3 ('E')
             22 COMPARE_OP               2 (==)
             25 POP_JUMP_IF_FALSE       32
             28 LOAD_CONST               4 ('-')
             31 RETURN_VALUE

  6     >>   32 LOAD_FAST                0 (mod)
             35 LOAD_CONST               5 ('D')
             38 COMPARE_OP               2 (==)
             41 POP_JUMP_IF_FALSE       48
             44 LOAD_CONST               6 (':')
             47 RETURN_VALUE

  7     >>   48 LOAD_FAST                0 (mod)
             51 LOAD_CONST               7 ('M')
             54 COMPARE_OP               2 (==)
             57 POP_JUMP_IF_FALSE       64
             60 LOAD_CONST               8 ('.')
             63 RETURN_VALUE
        >>   64 LOAD_CONST               0 (None)
             67 RETURN_VALUE


d={"I": "+", "E": "-", "D": ":", "M": "."}
def g(mod):
    return d.get(mod, mod)

12           0 LOAD_GLOBAL              0 (d)
             3 LOAD_ATTR                1 (get)
             6 LOAD_FAST                0 (mod)
             9 LOAD_FAST                0 (mod)
            12 CALL_FUNCTION            2
            15 RETURN_VALUE

score 0 · Answer 4 · answered Oct 18 '13 at 20:18

0

The string methods have a translation function available. You have to build a translation table of 256 characters long. Here is the code snippet I used:

translationTable = ' '*256
translationTable = translationTable[:68]+':'+translationTable[69:] # D to :
translationTable = translationTable[:69]+'-'+translationTable[70:] # E to -
translationTable = translationTable[:73]+'+'+translationTable[74:] # I to +
translationTable = translationTable[:77]+'.'+translationTable[78:] # M to .
print 'EIDM'.translate(translationTable)

Output:

-+:.

answered Oct 18 '13 at 20:18

Bruce Peterson

344
3
8

2

`from string import maketrans; translation_table = maketrans('IEDM', '+-:.')` - didn't think of it, but it is a very reasonable solution indeed. Not sure how it works under the hood, though; might be no faster than a `dict`. – Thijs van Dien Oct 18 '13 at 20:22
Let me run some timings and I'll let you know. – Craig Oct 18 '13 at 20:35
`if_else took 12474 ms for 100000 operations, translation_table took 81650 ms for 100000 operations, dict_lookup took 66385 ms for 100000 operations` – Craig Oct 18 '13 at 20:58

score 0 · Answer 5 · answered Oct 18 '13 at 21:03

Not really an answer as such, but a lot of the comments centred on performance, since I did ask. So I ran some performance tests on the answers so far:

from datetime import datetime
from string import maketrans
tr_table = maketrans('IEDM', '+-:.')
dictionary = { "I":"+", "E":"-", "D":":", "M":"." }
if_else_val = "E"

N_OPS = 100000

now = datetime.now

def time(func):
    s = now()
    func()
    print "%s took %d ms for %d operations" % (func.__name__, (now() - s).microseconds, N_OPS)

def translation_table():
    for i in xrange(N_OPS):
        "I".translate(tr_table)
        "E".translate(tr_table)
        "D".translate(tr_table)
        "M".translate(tr_table)

def dict_lookup():
    for i in xrange(N_OPS):
        dictionary.get("I")
        dictionary.get("E")
        dictionary.get("D")
        dictionary.get("M")

def if_else():
    for i in xrange(N_OPS):
        if if_else_val == "I": pass
        elif if_else_val == "E": pass
        elif if_else_val == "D": pass
        elif if_else_val == "M": pass

time(if_else)
time(translation_table)
time(dict_lookup)

Here are the results:

if_else took 12474 ms for 100000 operations
translation_table took 81650 ms for 100000 operations
dict_lookup took 66385 ms for 100000 operations

This is not a great way of doing timing on Python. Have a look at `timeit`. — Thijs van Dien, Oct 18 '13 at 21:25

An efficient Pythonic way of mapping values?

5 Answers5