
While searching for a way to find the duplicate entries in a list in Python, I came across a posted solution that works, but I can't decipher its ternary logic.

The post was here: "How do I find the duplicates in a list and create another list with them?"

The code is:

seen = set()
dupes = [x for x in list if x in seen or seen.add(x)]

The original poster helpfully showed what it equates to:

seen = set()
dupes = []
for x in list:
    if x in seen:
        dupes.append(x)
    else:
        seen.add(x)

I can't find any reference, googling around, to a ternary of the form if x ... or ...

Can someone help me break down all the logic here?

I roughly understand how dupes = [x equates to dupes.append(x) when 'if x in seen' is True.

But I can't work out how the 'or seen.add(x)' equates to the else: seen.add(x) branch, which runs when x was not found in the seen set and therefore adds nothing to the dupes list.

Chris

3 Answers


This is not a "ternary form". Transliterated, this is simply:

seen = set()
dupes = []
for x in list:
    if x in seen or seen.add(x):
        dupes.append(x)

So, a list comprehension is of the form:

[<mapping expression> for <target> in <iterable expression> if <conditional expression>]

Here, x in seen or seen.add(x) is simply the condition of the comprehension (not a ternary a if c else b expression). Because in binds more tightly than or, it is equivalent to:

(x in seen) or seen.add(x)

This relies on the semantics of boolean operations. In this case, an expression of the form:

x or y

evaluates to x if x is truthy; otherwise it evaluates to y. In the construction above, if x in seen is true, seen.add(x) is never evaluated, the condition is true, and x is added to the list. Otherwise, seen.add(x) is evaluated, but that call returns None, so the whole condition is falsy and x is not added to the list.
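A minimal sketch of the short-circuit behaviour described above. The helper noisy and the list calls are made up for illustration and do not come from the original post; they only make it visible whether the right-hand operand was evaluated.

```python
def noisy(value, log):
    """Hypothetical helper: record that it was called, then return value."""
    log.append(value)
    return value

calls = []

# Left operand is truthy: `or` returns it and never evaluates the right side.
first = True or noisy("right", calls)

# Left operand is falsy: the right side is evaluated and its value is the result.
second = False or noisy(None, calls)

print(first)   # True
print(second)  # None
print(calls)   # [None]
```

Only the second expression reaches noisy, which mirrors how seen.add(x) runs only when x in seen is false.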

This is the sort of list comprehension that would never pass a sane code review, if only because it relies on a side effect, which makes it confusing. List comprehensions should express functional mapping/filtering operations; if you have side effects, just use a for-loop.
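As a sketch of the for-loop alternative, the logic can be wrapped in a small function. The name find_duplicates and the sample input are assumptions for illustration, not part of the original post.

```python
def find_duplicates(items):
    """Return each element that repeats an earlier one, in order of occurrence."""
    seen = set()
    dupes = []
    for x in items:
        if x in seen:
            dupes.append(x)   # x was seen before, so record it
        else:
            seen.add(x)       # first sighting: remember it
    return dupes

print(find_duplicates([1, 2, 1, 3, 2, 1]))  # [1, 2, 1]
```

Like the comprehension, this reports an element once for every repeat, so a value that appears three times shows up twice in the result.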

juanpa.arrivillaga
  • Thank you for the detailed breakdown. I am fairly new to Python, and "would not pass a code review" is reason enough for me not to spend time learning how this works just so I can reuse a form like this later. I'll stick with clearer, more straightforward ways. – Chris Apr 25 '22 at 01:11

This is a pretty awful way to find the duplicates in the list, as that original answer says, but let's break it down:

We have the following clause:

if x in seen or seen.add(x)

For each element, we check whether x is in seen. If so, it's a duplicate. We stop evaluating the expression and add it to the list.

If x is not in seen, it's not a duplicate. We add it to our seen set. .add() returns None, which is a falsy value. So the element isn't added to our duplicates list, since at this point the expression is equivalent to False or None, which evaluates to None and is therefore treated as false.
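To watch the two cases above play out, here is a small trace of the condition for each element. The sample list and the names was_seen, result, and rows are assumptions made for illustration.

```python
items = [1, 2, 1]   # assumed sample data
seen = set()
rows = []

for x in items:
    was_seen = x in seen
    # seen.add(x) runs only when was_seen is False, and it returns None.
    result = was_seen or seen.add(x)
    rows.append((x, was_seen, result))
    print(x, was_seen, result, bool(result))

# Output:
# 1 False None False
# 2 False None False
# 1 True True True
```

Only the last element makes the condition truthy, so it is the only one a comprehension with this filter would keep.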

BrokenBenchmark

Consider a set:

s = set()
print(s.add(1))
# None
print(bool(s.add(2)))
# False

In the condition if x in seen or seen.add(x) from your question, Python first evaluates x in seen. If that is true, x is added to the list. Otherwise, because of the or, it goes on to evaluate seen.add(x): the add method is called and returns None, whose truth value is False, so x is not added to the list.

Mechanic Pig