
I am asked to compare two lists and filter the values they have in common into a third list. I must ensure that the third list only contains unique values, no duplicates.

The following code works:

import random

def makerange(number):
    lijst = [random.randint(1, number)
             for item in range(1, random.randint(2, number))]
    return lijst

a = makerange(20)
b = makerange(20)
c = set()

for item in a:
    if item in b and item not in c:
        c.add(item)

I've tried to rewrite the for loop as a Python list comprehension:

c = [ item for item in a if (item in b) & (item not in c)]

However, this list comprehension does not work. Any suggestions why? And how should I write this as a list comprehension?

H Doucet

4 Answers


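Your code can't work because `c` doesn't refer to the list being built: the name is only bound to the new list once the comprehension has completed. Until then, `c` is either not defined at all (giving a `NameError`) or still refers to its old value, such as the empty set from your loop version, in which case `item not in c` is always true and duplicates get through.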
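A minimal sketch of both failure modes, using small hypothetical lists in place of `makerange()`. In the first case `c` still names an older empty set, so the duplicate is kept; in the second the name `d` (hypothetical, never assigned) is looked up before it exists.

```python
a = [3, 1, 3, 2]
b = [1, 2, 3]

# Case 1: `c` is still bound to an older value (the empty set, as in the
# loop version). The comprehension tests against that old set, not the
# list being built, so `item not in c` is always True.
c = set()
c = [item for item in a if item in b and item not in c]
print(c)  # [3, 1, 3, 2]: the duplicate 3 slips through

# Case 2: the name has never been bound, so looking it up fails.
try:
    d = [item for item in a if item in b and item not in d]
except NameError as exc:
    print("NameError:", exc)
```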

You also used the wrong operator; you need to use `and`, not `&`. The latter is a bitwise operation, not a boolean logic AND. It happens to give you the same results here, but that's just luck.
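A short illustration with made-up values: `and` short-circuits and works on truthiness, while `&` always evaluates both operands and performs a bitwise AND, so two truthy integers can still produce `0`. With two `bool` operands, as in the question, the results happen to coincide.

```python
x = None

# `and` stops at the first falsy operand, so `x > 0` is never evaluated.
safe = x is not None and x > 0
print(safe)  # False

# `&` evaluates both sides first; `None > 0` raises TypeError in Python 3.
try:
    unsafe = (x is not None) & (x > 0)
except TypeError as exc:
    print("TypeError:", exc)

# With non-bool operands the results differ even without an exception.
print(1 and 2)  # 2 (truthy)
print(1 & 2)    # 0 (falsy), because 0b01 & 0b10 == 0

# With two bools, as in the question, `&` happens to agree with `and`.
print((3 in [3]) & (4 not in [3]))  # True
```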

The following works, using a set comprehension to produce unique values:

c = {item for item in a if item in b}

or, if you must use a list comprehension, use a separate set to track what values you already processed; trick taken from How do you remove duplicates from a list in whilst preserving order?:

seen = set()
c = [item for item in a if item in b and not (item in seen or seen.add(item))]
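The `seen` trick relies on `set.add` always returning `None`: the first time an item appears, `item in seen or seen.add(item)` evaluates to `None` (falsy) and records the item as a side effect; on later occurrences `item in seen` is `True`, so `not` filters it out. The same logic as a plain function may be easier to read. The name `common_unique` is hypothetical, and converting `b` to a set once is an added assumption for speed (a membership test on a list scans it every time).

```python
def common_unique(a, b):
    """Return items of `a` that also occur in `b`, without duplicates,
    in the order they first appear in `a`."""
    b_set = set(b)  # average O(1) membership tests instead of scanning a list
    seen = set()
    result = []
    for item in a:
        if item in b_set and item not in seen:
            seen.add(item)
            result.append(item)
    return result


a = [17, 8, 19, 17, 17, 4, 8, 17, 6, 19, 18, 11, 15, 8]
b = [8, 9, 16, 7, 16, 14, 3, 19, 1, 17, 8, 11]
print(common_unique(a, b))  # [17, 8, 19, 11]
```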

but you may as well use a set operation, in your case intersection:

c = set(a).intersection(b)

or using the & operator, which is overloaded for sets to produce an intersection too:

c = set(a) & set(b)
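A quick check, on shortened sample lists, that the set-based versions agree. One practical difference: `set.intersection()` accepts any iterable, which is why `set(a).intersection(b)` works with `b` still a list, whereas the `&` operator needs sets on both sides.

```python
a = [17, 8, 19, 17, 4, 8, 11]
b = [8, 9, 19, 1, 17, 8, 11]

via_comprehension = {item for item in a if item in b}
via_method = set(a).intersection(b)  # b may be any iterable
via_operator = set(a) & set(b)       # both operands must be sets

print(via_comprehension == via_method == via_operator)  # True
print(sorted(via_operator))  # [8, 11, 17, 19]: sort for a stable list order

try:
    set(a) & b  # a set and a plain list
except TypeError as exc:
    print("TypeError:", exc)
```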

Demo (with non-random values to make it easier to reproduce):

>>> a = [17, 8, 19, 17, 17, 4, 8, 17, 6, 19, 18, 11, 15, 8]
>>> b = [8, 9, 16, 7, 16, 14, 3, 19, 1, 17, 8, 11]
>>> {item for item in a if item in b}
set([8, 17, 19, 11])
>>> set(a) & set(b)
set([8, 17, 11, 19])
>>> seen = set()
>>> [item for item in a if item in b and not (item in seen or seen.add(item))]
[17, 8, 19, 11]
Martijn Pieters
  • Technically, since both arguments to `&` are `bool`, and `bool` is just a subclass of `int` (`True` has numeric value 1 and `False` has 0), the code is accidentally correct even though it uses `&` when it should use `and`. It won't short-circuit properly, but it should work; it's not going to get different results as a result of the mistake. – ShadowRanger Feb 22 '16 at 22:05
  • @ShadowRanger: Yes, the truthy value happens to be the same *in this case*. But best to point out the correct way, lest they try to use it without boolean operators on both sides. – Martijn Pieters Feb 22 '16 at 22:07
  • Agreed. Pointing it out as bad code is a good thing. It's just not "a reason it doesn't work". It doesn't work because it's trying to use `c` before `c` is bound correctly. – ShadowRanger Feb 22 '16 at 22:11
  • Reworked the ordering there; you are right it can't rightfully be listed as a reason. – Martijn Pieters Feb 22 '16 at 22:14

You can use set intersection instead:

c = set(a).intersection(b)
olofom

This does one version of the job for you: turn `a` and `b` into sets and take their intersection. Sets, by definition, contain no duplicates. The original doesn't work for the reason several others pointed out: `&` is the wrong operator in Python.

a = makerange(20)
b = makerange(20)
c = set(a).intersection(b)

Output:

set([20])
Prune
  • The original working code produces an *intersection*, not a union. You also haven't addressed why the original code doesn't work. – Martijn Pieters Feb 22 '16 at 22:03
  • The `&` operator is indeed wrong, but it *happens to work* when using it on two boolean values (as they do here). The reason the code fails has far more to do with the fact that `c` is not defined until the list comp completes. – Martijn Pieters Feb 22 '16 at 22:09

It seems to me that you only want to add items to `c` if and only if they are in both `a` and `b`.

import random


def makerange(number):
    lijst = [random.randint(1, number) for _ in range(1, random.randint(2, number))]
    return lijst

a, b = makerange(20), makerange(20)
c = {item for item in a if item in b} # Set comprehension
print(c)

You can also use the intersection of the sets:

c = list(set(a).intersection(set(b)))
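Note that `list(set(...))` returns the common values in an arbitrary order. If the order in which values first appear in `a` matters, one alternative (a swapped-in technique, not taken from the answers above) is `dict.fromkeys`, since dicts preserve insertion order from Python 3.7 on. A sketch with sample lists:

```python
a = [17, 8, 19, 17, 4, 8, 11]
b = [8, 9, 19, 1, 17, 8, 11]

b_set = set(b)
# dict.fromkeys drops repeated keys but keeps first-seen order (Python 3.7+).
c = list(dict.fromkeys(item for item in a if item in b_set))
print(c)  # [17, 8, 19, 11]
```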
Goodies