How can I find the missing value more concisely?

Question

The following code checks if x and y are distinct values (the variables x, y, z can only have values a, b, or c) and if so, sets z to the third character:

if x == 'a' and y == 'b' or x == 'b' and y == 'a':
    z = 'c'
elif x == 'b' and y == 'c' or x == 'c' and y == 'b':
    z = 'a'
elif x == 'a' and y == 'c' or x == 'c' and y == 'a':
    z = 'b'

Is is possible to do this in a more, concise, readable and efficient way?

The short answer is "Yes!" Python's sets are excellent for checking distinctness and for computing unused elements. — Raymond Hettinger, Jan 09 '12 at 19:22
Thanks all for the answers ,I guess i'll use the solution using set as its reasonably fast and readable,The lookup table based answer by Óscar López is also intresting. — Bunny Rabbit, Jan 11 '12 at 08:46

Sven Marnach · Accepted Answer · 2012-01-09T20:36:48.283

62

z = (set(("a", "b", "c")) - set((x, y))).pop()

I am assuming that one of the three cases in your code holds. If this is the case, the set set(("a", "b", "c")) - set((x, y)) will consist of a single element, which is returned by pop().

Edit: As suggested by Raymond Hettinger in the comments, you could also use tuple unpacking to extract the single element from the set:

z, = set(("a", "b", "c")) - set((x, y))

edited Jan 09 '12 at 20:36

answered Jan 09 '12 at 17:23

Sven Marnach

574,206
118
941
841

26

If you're using Python 2.7/3.1 or later, you can write it even more concisely using set literals, like so: `z = ({'a', 'b', 'c'} - {x, y}).pop()` – Taymon Jan 09 '12 at 17:43
7

The *pop()* is unnecessary and slow. Use tuple unpacking instead. Also, the ``set(("a", "b", "c"))`` is invariant so it can be precomputed one time, leaving only the set differencing to be used in a loop (if it isn't used in a loop, then we don't care as much about speed). – Raymond Hettinger Jan 09 '12 at 19:19
OP Bunny made it clear, by "checks if distinct values" and "if so", that you can't count on x and y being different. – Ed Staub Jan 09 '12 at 21:43
3

@Ed: I know, but the OP did not specify what to do when `x == y`, so I omitted the test. It's easy enough to add `if x != y:` if needed. – Sven Marnach Jan 09 '12 at 21:50
4

Personally, I would opt for the first one as it's more obvious than a random comma. – John Jan 09 '12 at 22:24
@EdStaub: I might actually have strengthened the question slightly since I added the "if so" part in my first edit. – Martin Geisler Jan 10 '12 at 09:01
1

You may want to replace `set(("a", "b", "c"))` by `set("abc")`. – kasyc Jan 16 '12 at 09:54

score 47 · Answer 2 · answered Jan 09 '12 at 19:15

47

The strip method is another option that runs quickly for me:

z = 'abc'.strip(x+y) if x!=y else None

answered Jan 09 '12 at 19:15

chepner

497,756
71
530
681

2

+1 It's also very transparent, and unlike most of the answers, it deals with x==y. – Ed Staub Jan 09 '12 at 21:50
1

Nice idea, +1; though I actually think that `"a"`, `"b"` and `"c"` in the original post are just placeholders for the real values. This solution does not generalise to any other value type than strings of length 1. – Sven Marnach Jan 10 '12 at 23:33
@chepner thats creative! Thanks for answering chepner. – Bunny Rabbit Jan 11 '12 at 08:32

Raymond Hettinger · Answer 3 · 2012-01-10T20:26:08.433

Sven's excellent code did just a little too much work and chould have used tuple unpacking instead of pop(). Also, it could have added a guard if x != y to check for x and y being distinct. Here is what the improved answer looks like:

# create the set just once
choices = {'a', 'b', 'c'}

x = 'a'
y = 'b'

# the main code can be used in a loop
if x != y:
    z, = choices - {x, y}

Here are the comparative timings with a timing suite to show the relative performance:

import timeit, itertools

setup_template = '''
x = %r
y = %r
choices = {'a', 'b', 'c'}
'''

new_version = '''
if x != y:
    z, = choices - {x, y}
'''

original_version = '''
if x == 'a' and y == 'b' or x == 'b' and y == 'a':
    z = 'c'
elif x == 'b' and y == 'c' or x == 'c' and y == 'b':
    z = 'a'
elif x == 'a' and y == 'c' or x == 'c' and y == 'a':
    z = 'b'
'''

for x, y in itertools.product('abc', repeat=2):
    print '\nTesting with x=%r and y=%r' % (x, y)
    setup = setup_template % (x, y)
    for stmt, name in zip([original_version, new_version], ['if', 'set']):
        print min(timeit.Timer(stmt, setup).repeat(7, 100000)),
        print '\t%s_version' % name

Here are the results of the timings:

Testing with x='a' and y='a'
0.0410830974579     original_version
0.00535297393799    new_version

Testing with x='a' and y='b'
0.0112571716309     original_version
0.0524711608887     new_version

Testing with x='a' and y='c'
0.0383319854736     original_version
0.048309803009      new_version

Testing with x='b' and y='a'
0.0175108909607     original_version
0.0508949756622     new_version

Testing with x='b' and y='b'
0.0386209487915     original_version
0.00529098510742    new_version

Testing with x='b' and y='c'
0.0259420871735     original_version
0.0472128391266     new_version

Testing with x='c' and y='a'
0.0423510074615     original_version
0.0481910705566     new_version

Testing with x='c' and y='b'
0.0295209884644     original_version
0.0478219985962     new_version

Testing with x='c' and y='c'
0.0383579730988     original_version
0.00530385971069    new_version

These timings show that the original-version performance varies quite a bit depending on which if-statements are triggered by the various the input values.

Your test seems biased. The so-called "set_version" is only sometimes faster because it is guarded by an additional `if` statement. — ekhumoro, Jan 09 '12 at 19:42
@ekhumoro That was what the problem specification called for: "checks if *x* and *y* are distinct values and if so, sets *z* to the third character". The fastest (and most straight-forward) way to check whether values are distinct is ``x != y``. Only when they are distinct do we the set-difference to determine the third-character :-) — Raymond Hettinger, Jan 09 '12 at 19:48
The point I was making is that your tests don't show that the `set_version` performs better _because it is based on sets_; it only performs better because of the guarding `if` statement. — ekhumoro, Jan 09 '12 at 20:02
@ekhumoro That's a weird reading of the test results. The code **does** what the OP asked for. The timings show the comparative performance with all possible groups of inputs. It's up to you how you want to interpret those. The timings for the version using ``if x != y: z, = choices - {x, y}`` fares reasonably well compared to the OP's original code. I don't know where your notion of bias comes in -- the timings are what they are, and AFAICT, this is still the best of the answers that has been posted. It is both clean and fast. — Raymond Hettinger, Jan 09 '12 at 23:15
Several optimisations were added to Sven's "set-version", but the same was not done for the "if-version". Adding an `if x != y` guard to the "if-version" would probably make it more consistent and better performing than all of the other solutions that have been offered so far (although obviously not as readable and concise). Your "set_version" is a very good solution - it's just not _quite_ as good as the tests make it seem ;-) — ekhumoro, Jan 09 '12 at 23:42
@All I already had a if x != y in my code , I think i should have included that in the question — Bunny Rabbit, Jan 11 '12 at 08:37

score 18 · Answer 4 · answered Jan 09 '12 at 17:24

z = (set('abc') - set(x + y)).pop()

Here are all of the scenarios to show that it works:

>>> (set('abc') - set('ab')).pop()   # x is a/b and y is b/a
'c'
>>> (set('abc') - set('bc')).pop()   # x is b/c and y is c/b
'a'
>>> (set('abc') - set('ac')).pop()   # x is a/c and y is c/a
'b'

Sven Marnach · Answer 5 · 2012-01-09T18:07:36.517

15

If the three items in question weren't "a", "b" and "c", but rather 1, 2 and 3, you could also use a binary XOR:

z = x ^ y

More generally, if you want to set z to the remaining one of three numbers a, b and c given two numbers x and y from this set, you can use

z = x ^ y ^ a ^ b ^ c

Of course you can precompute a ^ b ^ c if the numbers are fixed.

This approach can also be used with the original letters:

z = chr(ord(x) ^ ord(y) ^ 96)

Example:

>>> chr(ord("a") ^ ord("c") ^ 96)
'b'

Don't expect anyone reading this code to immediately figure out what it means :)

edited Jan 09 '12 at 18:07

answered Jan 09 '12 at 18:02

Sven Marnach

574,206
118
941
841

+1 This solution seems nice and elegant; and if you make it its own function and give the magic number 96 a name the logic is pretty easy to follow/maintain (`xor_of_a_b_c = 96 # ord('a') ^ ord('b') ^ ord('c') == 96`). However, in terms of raw speed this is about 33% slower than the long chain of `if / elif`s; but 500% faster than the `set` method. – dr jimbob Jan 09 '12 at 19:52
@sven Thanks for introducing the XOR operator to me,Your solution is clean and elegant, I think this example will make it stick to my brain,Thanks again :) – Bunny Rabbit Jan 11 '12 at 08:41

Martin Geisler · Answer 6 · 2012-01-10T08:56:19.933

13

I think the solution by Sven Marnach and F.J is beautiful, but it's not faster in my little test. This is Raymond's optimized version using a pre-computed set:

$ python -m timeit -s "choices = set('abc')" \
                   -s "x = 'c'" \
                   -s "y = 'a'" \
                      "z, = choices - set(x + y)"
1000000 loops, best of 3: 0.689 usec per loop

This is the original solution:

$ python -m timeit -s "x = 'c'" \
                   -s "y = 'a'" \
                      "if x == 'a' and y == 'b' or x == 'b' and y == 'a':" \
                      "    z = 'c'" \
                      "elif x == 'b' and y == 'c' or x == 'c' and y == 'b':" \
                      "    z = 'a'" \
                      "elif x == 'a' and y == 'c' or x == 'c' and y == 'a':" \
                      "    z = 'b'"
10000000 loops, best of 3: 0.310 usec per loop

Note that this is the worst possible input for the if-statements since all six comparisons will have to be tried out. Testing with all values for x and y gives:

x = 'a', y = 'b': 0.084 usec per loop
x = 'a', y = 'c': 0.254 usec per loop
x = 'b', y = 'a': 0.133 usec per loop
x = 'b', y = 'c': 0.186 usec per loop
x = 'c', y = 'a': 0.310 usec per loop
x = 'c', y = 'b': 0.204 usec per loop

The set-based variant shows the same performance for different inputs, but it is consistently between 2 and 8 times slower. The reason is that the if-based variant runs much simpler code: equality tests compared to hashing.

I think both types of solutions are valuable: it's important to know that creating "complicated" data structures like sets cost you something in performance — while they give you a lot in readability and development speed. The complex data types are also much better when the code change: it's easy to extend the set-based solution to four, five, ... variables whereas the if-statements quickly turn into a maintenance nightmare.

edited Jan 10 '12 at 08:56

answered Jan 09 '12 at 17:39

Martin Geisler

72,968
25
171
229

1

@martinGeisler thanks a lot for your reply i'd had no idea that we can time things like this in python.I have a gut feeling that Chessmasters solution would just work fine and efficient, I'll try to test it like you did with other answers and let you know. – Bunny Rabbit Jan 09 '12 at 17:46
agreed that readability counts , but i am just trying to solve a programming challenge that might land me job somewhere , so for now its efficiency first readability second me for now :P – Bunny Rabbit Jan 09 '12 at 17:49
1

The set-based solution optimize *conciseness* and *readability* (and *elegance*). But *efficiency* was also mentioned, so I went and investigated the performance of the proposed solutions. – Martin Geisler Jan 09 '12 at 17:53
1

@MartinGeisler: Yes, when I noticed this, I removed my comment. And I usually do find it interesting at least to know what is faster. – Sven Marnach Jan 09 '12 at 17:55
1

@BunnyRabbit: the timeit module is great for micro-benchmarks like this. You should of course *profile your overall program first* to determine where the bottlenecks are, but when they're identified, then timeit can be a great way to quickly try different implementations against each other. – Martin Geisler Jan 09 '12 at 17:55
1

+1 -- benchmark test that proofs simple series of compares is logical and fast. – Jeff Ferland Jan 09 '12 at 18:19
I can't imagine the performance difference would make any noticeable difference, unless this is being called in a tight-loop. – BlueRaja - Danny Pflughoeft Jan 09 '12 at 19:46

ChessMaster · Answer 7 · 2012-01-09T19:06:37.817

8

z = 'a'*('a' not in x+y) or 'b'*('b' not in x+y) or 'c'

or less hackish and using Conditional Assignment

z = 'a' if ('a' not in x+y) else 'b' if ('b' not in x+y) else 'c'

but probably the dict solution is faster... you'd have to time it.

edited Jan 09 '12 at 19:06

answered Jan 09 '12 at 17:44

ChessMaster

529
1
4
12

Óscar López · Answer 8 · 2012-01-09T18:07:27.840

8

Try this option, using dictionaries:

z = {'ab':'c', 'ba':'c', 'bc':'a', 'cb':'a', 'ac':'b', 'ca':'b'}[x+y]

Of course, if the x+y key is not present in the map, it'll produce a KeyError which you'll have to handle.

If the dictionary is precomputed a single time and stored for future use, the access will be much faster, since no new data structures will have to be created for each evaluation, only a string concatenation and a dictionary lookup are needed:

lookup_table = {'ab':'c', 'ba':'c', 'bc':'a', 'cb':'a', 'ac':'b', 'ca':'b'}
z = lookup_table[x+y]

edited Jan 09 '12 at 18:07

answered Jan 09 '12 at 17:45

Óscar López

232,561
37
312
386

2

Just for fun, here is another dict option: `{1: 'c', 2: 'b', 3: 'a'}[ord(x)+ord(y)-ord('a')*2]`, extra complexity probably isn't worth the saved space though. – Andrew Clark Jan 09 '12 at 18:09
2

@F.J: `z = {1: 'a', 2: 'b', 3: 'c'}[2*('a' in x+y)+('b' in x+y)]` this is fun... – ChessMaster Jan 09 '12 at 18:55
Is it faster than the OP original code? if so, why? How can calculation of hash values be faster than simple comparison? – max Mar 23 '12 at 19:46
@max it a single hash calculation, not a whole bunch of comparisons and conditional expressions – Óscar López Mar 23 '12 at 20:13
Cool.. Didn't realize how fast the hash function is! – max Mar 23 '12 at 20:45

selfnamed · Answer 9 · 2012-01-09T17:48:56.897

2

I think it should looks like that:

z = (set(("a", "b", "c")) - set((x, y))).pop() if x != y else None

edited Jan 09 '12 at 17:48

answered Jan 09 '12 at 17:38

selfnamed

176
6

12

`len(set((x, y))) == 2` is the most unreadable way to write `x != y` I've ever seen :) – Sven Marnach Jan 09 '12 at 17:40
Yes, Sven))) Thanks about your comment. This script had an another basic idea when I started write it)) An finally I forgot to edit that. – selfnamed Jan 09 '12 at 17:45

john · Answer 10 · 2012-01-26T09:27:21.400

1

Using list comprehension, assuming like others that one of the three cases in your code holds:

l = ['a', 'b', 'c']
z = [n for n in l if n not in [x,y]].pop()

Or, like in the accepted answer, taking advantage of the tuple to unpack it,

z, = [n for n in l if n not in [x,y]]

edited Jan 26 '12 at 09:27

answered Jan 09 '12 at 18:57

john

3,043
5
27
48

score 0 · Answer 11 · answered Jan 10 '12 at 00:19

0

See if this works

if a not in xy
    z= 'a'
if b not in xy
    z='b'
if c not in xy
    z='c'

answered Jan 10 '12 at 00:19

Sundhar

149
1
7
21

How can I find the missing value more concisely?

11 Answers11

Linked

Related