27

I am writing some code and I need to compare some values. The point is that none of the variables should have the same value as another. For example:

a=1
b=2
c=3
if a != b and b != c and a != c:
    #do something 

Now, it is easy to see that with more variables, the `if` statement becomes very long and full of `and`s. Is there a short way to tell Python that no two variables should have the same value?

Mars
Bunny Davis
  • I can't think of any way to do this easily in a 1-liner. But if this is something I needed to do often, I'd write a short function called something like `all_different` which takes an arbitrary number of arguments and tells you if they're all different or not. Then you can just do `if all_different(a, b, c):` – Robin Zigmond Jun 22 '19 at 14:59
  • 7
    Try using length of set. That would make it fairly easy. – susenj Jun 22 '19 at 15:00
  • 3
    If you make a list `[a, b, c]`, then this is a duplicate of [How do I check if there are duplicates in a flat list?](https://stackoverflow.com/q/1541797/4518341) – wjandrea Jun 23 '19 at 01:31
  • 1
    What else do you know about the values? I don't believe it's even mathematically possible if equality testing is the only operation allowed. – user541686 Jun 23 '19 at 03:21
  • Beware trading understandable for clever — creating a set just to throw it away for the purpose of (IMO) abusing its uniqueness properties means when you go look at it in a few months you’ll wonder “why on earth did I make this set?” – thehole Jun 25 '19 at 16:45
  • @thehole Or just comment your code... – Michael Kolber Jun 26 '19 at 03:42

5 Answers

48

You can try making sets.

a, b, c = 1, 2, 3
if len({a, b, c}) == 3:
    # Do something

If your variables are kept in a list, it becomes even simpler:

a = [1,2,3,4,4]
if len(set(a)) == len(a):
    # Do something

Here is the official documentation of Python sets.

This works only for hashable objects, such as the integers given in the question. For non-hashable objects, see @chepner's more general solution.

This is definitely the way to go for hashable objects, since it takes O(n) time for n objects. The combinatorial method for non-hashable objects takes O(n^2) time.
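As a quick illustration of the hashability caveat (my own sketch, not part of the original answer): trying the set trick on unhashable values such as lists raises a `TypeError`:

```python
a, b, c = [1], [2], [3]

try:
    # Lists are unhashable, so building a set from them fails.
    unique = len({a, b, c}) == 3
except TypeError as exc:
    print("unhashable:", exc)  # unhashable: unhashable type: 'list'
```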

Ignatius
  • 1
    For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this https://stackoverflow.com/questions/2710651/javas-treeset-equivalent-in-python. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity. – Ismael EL ATIFI Jun 22 '19 at 18:47
  • 6
    @IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet. – user2357112 Jun 23 '19 at 00:15
  • 1
    @user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation. – Ismael EL ATIFI Jun 23 '19 at 14:57
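The sort-based idea from the comments above can be sketched like this (the helper name `all_distinct_sorted` is mine); it runs in O(n log n) and works for values that are orderable but not hashable:

```python
def all_distinct_sorted(values):
    # After sorting, any duplicates sit next to each other,
    # so one pass over adjacent pairs suffices.
    s = sorted(values)
    return all(x != y for x, y in zip(s, s[1:]))

print(all_distinct_sorted([[1], [3], [2]]))  # True
print(all_distinct_sorted([[1], [2], [1]]))  # False
```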
26

Assuming hashing is not an option, use itertools.combinations and all.

from itertools import combinations

if all(x != y for x, y in combinations([a,b,c], 2)):
    # All values are unique
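For example, wrapped in a small helper (the name `all_distinct_pairs` is mine), the same pairwise check handles unhashable values such as lists:

```python
from itertools import combinations

def all_distinct_pairs(values):
    # Compare every pair once; no hashing needed, O(n^2) comparisons.
    return all(x != y for x, y in combinations(values, 2))

print(all_distinct_pairs([[1], [2], [3]]))  # True
print(all_distinct_pairs([[1], [2], [1]]))  # False
```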
chepner
  • 8
    worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear – Ant Jun 23 '19 at 18:38
  • 4
    Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation of `combinations`). – Aasmund Eldhuset Jun 23 '19 at 19:30
  • 1
    @AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code. – insert_name_here Jun 24 '19 at 05:48
  • 1
    If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double. – Falc Jun 24 '19 at 07:41
  • 2
    Doesn't `all` also short-circuit? – Mees de Vries Jun 24 '19 at 09:18
  • 1
    Assuming `combinations` is a generator, this won't have to allocate the full cross product. And `all()` should stop as soon as a match is found. So this will only be slow if duplicates are rare. – Barmar Jun 25 '19 at 17:41
  • Sorry I didn't see until now that you've added "Assuming hashing is not an option". Removing my downvote. – Aasmund Eldhuset Dec 29 '19 at 22:12
20

It depends a bit on the kind of values that you have.

If they are well-behaved and hashable, then you can (as others have already pointed out) simply use a set to find out how many unique values you have; if that doesn't equal the total number of values, at least two of them are equal.

def all_distinct(*values):
    return len(set(values)) == len(values)

all_distinct(1, 2, 3)  # True
all_distinct(1, 2, 2)  # False

Hashable values, evaluated lazily

In case you really have lots of values and want to abort as soon as one match is found, you can also build the set lazily. It's more complicated and probably slower if all values are distinct, but it short-circuits as soon as a duplicate is found:

def all_distinct(*values):
    seen = set()
    seen_add = seen.add
    last_count = 0
    for item in values:
        seen_add(item)
        new_count = len(seen)
        if new_count == last_count:
            return False
        last_count = new_count
    return True

all_distinct(1, 2, 3)  # True
all_distinct(1, 2, 2)  # False

However, if the values are not hashable, this will not work, because `set` requires hashable values.

Unhashable values

In case you don't have hashable values you could use a plain list to store the already processed values and just check if each new item is already in the list:

def all_distinct(*values):
    seen = []
    for item in values:
        if item in seen:
            return False
        seen.append(item)
    return True

all_distinct(1, 2, 3)  # True
all_distinct(1, 2, 2)  # False

all_distinct([1, 2], [2, 3], [3, 4])  # True
all_distinct([1, 2], [2, 3], [1, 2])  # False

This will be slower, because checking whether a value is in a list requires comparing it to each item already in the list.

A (3rd-party) library solution

In case you don't mind an additional dependency, you could also use one of my libraries (available on PyPI and conda-forge) for this task: iteration_utilities.all_distinct. This function can handle both hashable and unhashable values (and a mix of the two):

from iteration_utilities import all_distinct

all_distinct([1, 2, 3])  # True
all_distinct([1, 2, 2])  # False

all_distinct([[1, 2], [2, 3], [3, 4]])  # True
all_distinct([[1, 2], [2, 3], [1, 2]])  # False

General comments

Note that all of the above-mentioned approaches rely on the fact that equality means "not not-equal", which is the case for (almost) all built-in types but doesn't necessarily have to be the case!

However, I want to point out chepner's answer, which doesn't require hashability of the values and doesn't rely on "equality means not not-equal", because it explicitly checks with !=. It also short-circuits, so it behaves like your original and approach.
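A concrete illustration of why that distinction matters (my own example, CPython behavior): `float('nan')` compares unequal to itself, yet a set can still deduplicate the very same nan object, because sets check identity before equality:

```python
nan = float('nan')

print(nan != nan)       # True: NaN is unequal even to itself
print(len({nan, nan}))  # 1: the set sees the identical object once

# Two distinct NaN objects, however, are kept as separate elements:
print(len({float('nan'), float('nan')}))  # 2
```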

Performance

To get a rough idea of the performance, I'm using another of my libraries (simple_benchmark).

I used distinct hashable inputs (left) and unhashable inputs (right). For hashable inputs the set-approaches performed best, while for unhashable inputs the list-approaches performed better. The combinations-based approach seemed slowest in both cases:

[benchmark plot: distinct hashable inputs (left) and unhashable inputs (right)]

I also tested the performance in case there are duplicates, for convenience I regarded the case when the first two elements were equal (otherwise the setup was identical to the previous case):

[benchmark plot: inputs where the first two elements are equal]

from iteration_utilities import all_distinct
from itertools import combinations
from simple_benchmark import BenchmarkBuilder

# First benchmark
b1 = BenchmarkBuilder()

@b1.add_function()
def all_distinct_set(values):
    return len(set(values)) == len(values)

@b1.add_function()
def all_distinct_set_sc(values):
    seen = set()
    seen_add = seen.add
    last_count = 0
    for item in values:
        seen_add(item)
        new_count = len(seen)
        if new_count == last_count:
            return False
        last_count = new_count
    return True

@b1.add_function()
def all_distinct_list(values):
    seen = []
    for item in values:
        if item in seen:
            return False
        seen.append(item)
    return True

b1.add_function(alias='all_distinct_iu')(all_distinct)

@b1.add_function()
def all_distinct_combinations(values):
    return all(x != y for x, y in combinations(values, 2))

@b1.add_arguments('number of hashable inputs')
def argument_provider():
    for exp in range(1, 12):
        size = 2**exp
        yield size, range(size)

r1 = b1.run()
r1.plot()

# Second benchmark

b2 = BenchmarkBuilder()
b2.add_function(alias='all_distinct_iu')(all_distinct)
b2.add_functions([all_distinct_combinations, all_distinct_list])

@b2.add_arguments('number of unhashable inputs')
def argument_provider():
    for exp in range(1, 12):
        size = 2**exp
        yield size, [[i] for i in range(size)]

r2 = b2.run()
r2.plot()

# Third benchmark
b3 = BenchmarkBuilder()
b3.add_function(alias='all_distinct_iu')(all_distinct)
b3.add_functions([all_distinct_set, all_distinct_set_sc, all_distinct_combinations, all_distinct_list])

@b3.add_arguments('number of hashable inputs')
def argument_provider():
    for exp in range(1, 12):
        size = 2**exp
        yield size, [0, *range(size)]

r3 = b3.run()
r3.plot()

# Fourth benchmark
b4 = BenchmarkBuilder()
b4.add_function(alias='all_distinct_iu')(all_distinct)
b4.add_functions([all_distinct_combinations, all_distinct_list])

@b4.add_arguments('number of hashable inputs')
def argument_provider():
    for exp in range(1, 12):
        size = 2**exp
        yield size, [[0], *[[i] for i in range(size)]]

r4 = b4.run()
r4.plot()
MSeifert
  • Is comparing the length of the set to check whether the `add`ed element was new really the best (fastest?) way? It seems pretty cumbersome. – Bergi Jun 24 '19 at 10:52
  • 2
    @Bergi Yes, it's cumbersome. But checking for `in` and then doing the `add` invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set, it can be significantly faster to just `add` it and check the length. Maybe I'm missing something obvious, but at least in my benchmarks the `set.add` + `len(set)` approach was faster than `in set` + `set.add`. – MSeifert Jun 24 '19 at 10:55
  • 1
    @Bergi yes, apparently there is no easy/fast way to know if an element was added; related questions: https://stackoverflow.com/questions/8298018/add-to-set-returns-a-boolean-in-java-what-about-python and https://stackoverflow.com/questions/27427067/python-how-to-check-if-an-item-was-added-to-a-set-without-2x-hash-lookup – Ralf Sep 18 '19 at 23:15
1

A slightly neater way is to stick all the variables in a list, then create a set from that list. If the list and the set aren't the same length, some of the variables were equal, since sets can't contain duplicates:

vars = [a, b, c]
no_dupes = set(vars)

if len(vars) != len(no_dupes):
    # Some of them had the same value

This assumes the values are hashable, which they are in your example.

Carcigenicate
0

You can use `all` with `list.count` as well. It may not be the most efficient approach (it's quadratic), but it's reasonable and worth mentioning:

>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
True
>>> a, b, c = 1, 2, 1
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
False
>>> 

Also this solution works with unhashable objects in the list.
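For instance (my own example), the `list.count` check works on unhashable elements like lists, where the set-based version would raise a `TypeError`:

```python
l = [[1, 2], [2, 3], [1, 2]]

# list.count only needs equality, not hashing.
print(all(l.count(i) < 2 for i in l))  # False: [1, 2] appears twice
```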

A way that only works with hashable objects in the list:

>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> len({*l}) == len(l)
True
>>> 

Incidentally:

>>> from timeit import timeit
>>> timeit(lambda: {*l}, number=1000000)
0.5163292075532642
>>> timeit(lambda: set(l), number=1000000)
0.7005311807841572
>>> 

{*l} is faster than set(l), more info here.

U13-Forward