
Is there an elegant or pythonic way to exclude entries containing duplicate values when using zip?

As an example:

>>> list1 = [0, 1]
>>> list2 = [0, 2]
>>> list(zip(list1, list2))
[(0, 0), (1, 2)]

I would like to have just the second element [(1, 2)]. Currently, I do

[x for x in zip(list1, list2) if len(set(x)) == len(x)]

but this feels a bit tedious. Is there a better way to do this?


EDIT: And how do I scale this to the general case, where there are more than two lists?

>>> list1 = [0, 1]
>>> list2 = [0, 2]
>>> list3 = [0, 3]
>>> ...
>>> zip(list1, list2, list3, ...)

If any entry contains any duplicate values, it should be discarded (not every value in the tuple has to be equal).
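A minimal sketch of the set-based test from the question, generalized to any number of lists. It assumes every value is hashable. `zip(*lists)` unpacks the input lists into separate arguments to zip; the function name drop_rows_with_duplicates and the sample lists are hypothetical.

```python
def drop_rows_with_duplicates(*lists):
    """Keep only the zipped tuples whose values are all distinct.

    Assumes every value is hashable, since set() is used for the check.
    """
    # A tuple contains a repeated value exactly when its set is smaller than the tuple
    return [row for row in zip(*lists) if len(set(row)) == len(row)]


list1 = [0, 1, 4]
list2 = [0, 2, 5]
list3 = [0, 3, 4]
print(drop_rows_with_duplicates(list1, list2, list3))  # [(1, 2, 3)]
```

Note that (4, 5, 4) is discarded even though not every value in it is equal, which matches the requirement stated above.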

  • Do they all have to be duplicates to be discarded, or just some duplicates among them? – wim Apr 26 '13 at 15:09
  • Just some duplicate values, not all of them have to be equal. Converting to set seems to be the way to go, according to Martijn. – Apr 26 '13 at 15:13

3 Answers


What about

[(x,y) for (x,y) in zip(list1, list2) if x != y]

General case:

[x for x in zip(list1, list2, ... listn) if not all(z == x[0] for z in x[1:])]

That finds duplicates where every element is equal. If only one pair needs to be equal to count as a duplicate, you can use the set-based method you already mentioned in your question, provided you have hashable types. If you have unhashable types, the (interesting) question of identifying duplicates has been answered previously here.
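To make the difference described above concrete, here is a small comparison on three hypothetical lists in which one row, (4, 5, 4), repeats a value without being all-equal. The set-based filter assumes hashable values.

```python
list1 = [0, 1, 4]
list2 = [0, 2, 5]
list3 = [0, 3, 4]
rows = list(zip(list1, list2, list3))  # [(0, 0, 0), (1, 2, 3), (4, 5, 4)]

# all() test: discards a tuple only when EVERY element equals the first one
all_equal_dropped = [x for x in rows if not all(z == x[0] for z in x[1:])]

# set() test: discards a tuple containing ANY repeated value (hashable values only)
any_dupe_dropped = [x for x in rows if len(set(x)) == len(x)]

print(all_equal_dropped)  # [(1, 2, 3), (4, 5, 4)]
print(any_dupe_dropped)   # [(1, 2, 3)]
```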

wim
  • If I change that all into an any, I have what I want! –  Apr 26 '13 at 15:25
  • Be careful: you can't simply change it into any; it's not as simple as that, because this only compares against the first element (for example, [0, 1, 1] would be missed). – wim Apr 26 '13 at 15:26
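As the comment above points out, swapping all for any still compares every element only against the first one. A minimal sketch of a check that compares every pair instead, using itertools.combinations; the helper name has_duplicate and the sample rows are hypothetical. Because it relies only on ==, it also works for unhashable values such as lists, at the cost of O(n²) comparisons per tuple.

```python
from itertools import combinations


def has_duplicate(row):
    """Return True if any two elements of row compare equal.

    Uses only ==, so unhashable values (e.g. lists) are fine too.
    """
    return any(a == b for a, b in combinations(row, 2))


rows = [(0, 1, 1), ([1], [2], [1]), (1, 2, 3)]
kept = [row for row in rows if not has_duplicate(row)]
print(kept)  # [(1, 2, 3)]
```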

You only have 2-value tuples, so you can compare the first value against the second. The list comprehension is the best option:

[x for x in zip(list1, list2) if x[0] != x[1]]

For the general case, provided your values are all hashable, you already have the best option.

If you have non-hashable types, you'd need to special-case the 'unique' handling anyway, which is outside the scope here.

Martijn Pieters
  • The general case may fail with unhashable types. – wim Apr 26 '13 at 15:08
  • @Martijn Yes, but there may be lists inside the tuples, for example. – wim Apr 26 '13 at 15:13
  • @wim: That is why I retracted my comment. :-) – Martijn Pieters Apr 26 '13 at 15:14
  • What do you mean by special-casing the 'unique' handling? Lists of integers, for example, normally compare by equality. – wim Apr 26 '13 at 15:17
  • @wim: You'd need to compare each element with every other element. For a large list, doing an `itertools.combinations(elem, 2)` loop to compare them all is rather expensive, but converting each list to a `tuple()` and then using `len(set())` on the result would be computationally cheaper. For `dict()` you can use `tuple(sorted(d.items()))`, etc. The best method depends on the objects; there is no good general solution. – Martijn Pieters Apr 26 '13 at 15:21
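A sketch of the "convert to something hashable first" idea from the comment above. It assumes the unhashable values are lists, tuples, or dicts whose keys can be sorted; freeze and unique_values are hypothetical names, not library functions. One caveat: a list and a tuple with the same items become indistinguishable after freezing.

```python
def freeze(value):
    """Hypothetical helper: turn lists/tuples/dicts into hashable tuples, recursively.

    Dicts become tuples of sorted (key, value) pairs, so their keys must be orderable.
    Anything else is returned unchanged and must already be hashable.
    Note: [1] and (1,) both freeze to (1,), so they would count as duplicates.
    """
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, dict):
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    return value


def unique_values(row):
    """Return True if no two frozen values in row are equal."""
    frozen = [freeze(v) for v in row]
    return len(set(frozen)) == len(frozen)


list1 = [[1], {"a": 1}]
list2 = [[1], {"a": 2}]
kept = [row for row in zip(list1, list2) if unique_values(row)]
print(kept)  # [({'a': 1}, {'a': 2})]
```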

Here's yet another way using all(), which, IMHO, expresses the intent of the code more clearly:

[x for x in zip(list1, list2) if not all(x[0] == rest for rest in x)]

This has the advantage that it works for tuples of arbitrary size (not just pairs; you could write zip(list1, list2, list3)), and it uses generator expressions, so it doesn't create additional lists, sets, etc.
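Note that this all() test, like the general case in the first answer, drops a tuple only when every element equals the first, so a tuple such as (4, 5, 4) would be kept. If you want the question's "any duplicate" behavior while still stopping early, one minimal sketch (hashable values assumed; all_distinct and the sample lists are hypothetical) tracks the values seen so far and returns at the first repeat:

```python
def all_distinct(row):
    """Return True if no value in row repeats (values must be hashable).

    Stops at the first repeated value instead of always building the full set.
    """
    seen = set()
    for value in row:
        if value in seen:
            return False
        seen.add(value)
    return True


list1 = [0, 1, 4]
list2 = [0, 2, 5]
list3 = [0, 3, 4]
kept = [x for x in zip(list1, list2, list3) if all_distinct(x)]
print(kept)  # [(1, 2, 3)]
```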

Óscar López