I have this list:

list = [{1,2,3,4}, {3,4,5}, {2,6}] 

I want to have this as the output:

{1,6}

So I want only the unique numbers in the output. This is what I tried, but it does not work:

list = [{1,2,3,4}, {3,4,5}, {2,6}] 
s1 = []
for number in list:
   if number not in s1:
      s1.append(number)
def unique(s1):
   return set.difference(*s1)
print (unique(s1))

My output is:

{1, 2, 3, 4}

I have no clue how to fix this? I am a python beginner so can anyone explain what the answer is and why that should be the solution? Many thanks in advance!

Sachith Wickramaarachchi
  • Shouldn't 5 also be in the output? – chepner Mar 10 '19 at 17:04
  • `for number in list` does not iterate through numbers. After that loop, `s1` is the same as `list`. Also, don't call your variables `list` - that is a built-in name. – zvone Mar 10 '19 at 17:05
  • `set.difference` will never return items that are not in the first set. – zvone Mar 10 '19 at 17:07
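
The two points in the comments above can be checked with a small sketch. This is not the asker's exact session; the names `sets` and `copied` are stand-ins chosen here to avoid shadowing the built-in `list`. It shows that the loop only copies the sets, and that `set.difference` can only return items taken from the first set, so 5 and 6 can never appear in its result.

```python
sets = [{1, 2, 3, 4}, {3, 4, 5}, {2, 6}]

# The loop iterates over the sets themselves, not over the numbers inside them,
# so it simply rebuilds the original list.
copied = []
for item in sets:
    if item not in copied:
        copied.append(item)
print(copied == sets)  # True

# set.difference(a, b, c) computes a - b - c, so the result is always
# a subset of the first set.
result = set.difference(*copied)
print(result <= sets[0])         # True
print(5 in result, 6 in result)  # False False
```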

5 Answers

Use a Counter object.

>>> from collections import Counter
>>> from itertools import chain
>>> my_list = [{1,2,3,4}, {3,4,5}, {2,6}]
>>> set(k for k,v in Counter(chain.from_iterable(my_list)).items() if v == 1)
set([1, 5, 6])

chain.from_iterable "flattens" my_list. The Counter compiles how many times each element from the flattened list is seen, and the generator expression sends only the keys mapped to a value of 1 to set.


Some of the intermediate values involved:

>>> list(chain.from_iterable(my_list))
[1, 2, 3, 4, 3, 4, 5, 2, 6]
>>> Counter(chain.from_iterable(my_list))
Counter({2: 2, 3: 2, 4: 2, 1: 1, 5: 1, 6: 1})
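
If it helps, the same idea can be wrapped in a small reusable function. This is only a sketch of the approach above; the function name `unique_across` is made up for illustration, and the set comprehension assumes Python 2.7 or later.

```python
from collections import Counter
from itertools import chain


def unique_across(sets):
    """Return the elements that occur in exactly one of the given sets."""
    # Flatten all sets into one stream and count each element.
    counts = Counter(chain.from_iterable(sets))
    # Keep only the elements that were counted exactly once.
    return {value for value, count in counts.items() if count == 1}


print(unique_across([{1, 2, 3, 4}, {3, 4, 5}, {2, 6}]))
```

Because every element of a set is distinct, a count of 1 means the element appeared in exactly one of the input sets.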
chepner

I had an earlier answer posted that wasn't right. I don't like to give up, and I wanted to find an answer to this question that only used sets. I'm not saying that this is a better answer than the others, but it does achieve at least my own goal of not bringing any extra packages into the solution:

def unique(data):
    result = set()
    dups = set()
    for s1 in data:
        # accumulate everything we see more than once
        dups = dups | (result & s1)
        # accumulate everything
        result = (result | s1)
    # the result is everything we only saw once, or in other words,
    # everything we saw minus everything we saw more than once
    return result - dups

print(unique([{1,2,3,4}, {3,4,5}, {2,6}]))
print(unique([{1}, {1}, {1}]))

Output:

set([1, 5, 6])
set([])

There's probably a better solution, even one that uses just sets. I always try to use the simplest set of tools possible because I think that is the most understandable approach, and it will often turn out to be as efficient as anything else as well.
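
To see how the two accumulators evolve, here is a short trace of the same loop on the question's data. It is only an illustration of the function above, using `sorted` so the printed order is predictable.

```python
data = [{1, 2, 3, 4}, {3, 4, 5}, {2, 6}]

result, dups = set(), set()
for s1 in data:
    # Anything already seen that shows up again is a duplicate.
    dups = dups | (result & s1)
    # Everything seen so far.
    result = result | s1
    print(sorted(result), sorted(dups))
# [1, 2, 3, 4] []
# [1, 2, 3, 4, 5] [3, 4]
# [1, 2, 3, 4, 5, 6] [2, 3, 4]

print(sorted(result - dups))  # [1, 5, 6]
```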

CryptoFool
  • This doesn't work for cases such as `[{1}, {1}, {1}]`, since your method will wrongly give a result of `{1}` (instead of an empty set). – ekhumoro Mar 10 '19 at 17:30
  • Wow, you're right! Ok, so I'm an idiot. A persistent idiot though. Solution updated. – CryptoFool Mar 10 '19 at 18:19
  • I think your latest solution is over-complicated. All you need to do is iterate over all the elements and accumulate one set of shared values (`A`) and one set of duplicated values (`B`). This can be done by simply testing whether the current value is in `A`: if it is, add it to `B`; if not, add it to `A`. The final result will then be simply `A ^ B` (i.e. `A.symmetric_difference(B)`). I think that comes close to what the OP was originally trying to achieve. – ekhumoro Mar 10 '19 at 18:54
  • Actually, the result can just be `A - B` (i.e. `A.difference(B)`), since we don't need anything from `B`. – ekhumoro Mar 10 '19 at 19:24
  • @ekhumoro, I don't know about "complicated", but maybe "inefficient". I wanted to avoid any **if** blocks. I wanted all the logic to be with sets. Maybe you can figure out a better algorithm under those constraints. - I don't see how adding a **if** can be seen as a simplification – CryptoFool Mar 10 '19 at 19:28
  • Well, I just gave you a much simpler and more efficient algorithm. It uses one if-statement, one containment test, and one set operation. – ekhumoro Mar 10 '19 at 19:42
  • @ekhumoro, I don't agree that it's conceptually simpler. As I said, it might be more efficient, but I know that it's true that introducing a conditional into a loop can stall CPU parallelization. Often, doing a bit more work without an **if** (like more memory ops vs processor ops) will turn out to be faster. - but this is an interpreted language, so who knows. I still like my answer for being very simple to understand. – CryptoFool Mar 10 '19 at 19:48
  • I'm sorry, but none of what you said there makes any sense at all. Some basic testing using the [dis module](https://docs.python.org/3/library/dis.html#module-dis) and [timeit](https://docs.python.org/3/library/timeit.html#module-timeit) will easily prove that. – ekhumoro Mar 10 '19 at 19:56
  • I've never said my code was the most efficient. I keep saying that it may not be. Maybe pipeline stalls are a bygone thing from my C++ days. I don't claim to know anything about that and Python, as I indicated. I don't really care about subtle efficiency, and until we hear otherwise, maybe the OP doesn't either. - why are you fighting with me? If you have a better solution, post it. - my solution does what was asked for, as I assume do the others. (and I very much appreciate you pointing out that my first solution was crap) – CryptoFool Mar 10 '19 at 20:00
  • The purpose of comments is to help improve answers. I simply pointed out that your solution has some rather obvious redundancies (i.e. unnecessary set operations), and suggested a simpler algorithm. I don't understand why you are being so resistant to constructive criticism. – ekhumoro Mar 10 '19 at 20:06
  • The only thing I've said the whole time with any conviction is that my solution is simple and easy to understand. I simply disagreed with your conjecture that introducing an **if** makes the code simpler. That's my only "resistance". We can agree to disagree. Nothing else you've said do I disagree with. – CryptoFool Mar 10 '19 at 20:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/189776/discussion-between-steve-and-ekhumoro). – CryptoFool Mar 10 '19 at 20:14
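
For reference, the algorithm ekhumoro describes in the comments above could look roughly like this. It is a sketch based only on that description, not code posted by either commenter; the names `unique_elements`, `seen` (the `A` set) and `duplicated` (the `B` set) are chosen here for illustration.

```python
def unique_elements(sets):
    """Return the elements that appear in exactly one of the given sets."""
    seen = set()        # "A": every value encountered so far
    duplicated = set()  # "B": values encountered more than once
    for s in sets:
        for value in s:
            if value in seen:
                duplicated.add(value)
            else:
                seen.add(value)
    # Everything seen, minus everything seen more than once.
    return seen - duplicated


print(unique_elements([{1, 2, 3, 4}, {3, 4, 5}, {2, 6}]))
print(unique_elements([{1}, {1}, {1}]))
```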

Not a fancy Python solution, but what you probably want is to build a dict where each key is a number from the list and the value is the number of times it occurs.

This solution is by no means optimal, since we are performing too many iterations:

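sets = [{1,2,3,4}, {3,4,5}, {2,6}]  # renamed so the built-in `list` is not shadowed
count_hash = {}

# count how many times each number occurs across all sets
for number_set in sets:
    for number in number_set:
        if number in count_hash:
            count_hash[number] += 1
        else:
            count_hash[number] = 1

# print only the numbers that occurred exactly once
for number in count_hash:
    if count_hash[number] == 1:
        print(number)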
Miguel Machado

I agree with chepner on using a Counter. Using functools.reduce to flatten the list came from here.

import functools
import operator
from collections import Counter

my_list = [{1,2,3,4}, {3,4,5}, {2,6}]
flat_list = functools.reduce(operator.iconcat, my_list, [])

results = [k for (k, v) in Counter(flat_list).items() if v == 1]

print(results)

# OUTPUT
# [1, 5, 6]
jbiz

I didn't see any answer like my solution. I hope that taking the symmetric difference of the given sets is a correct approach to this question.

>>> L = [{1,2,3,4}, {3,4,5}, {2,6}]
>>> M = set(L[0])  # copy, so that ^= below does not modify L[0] in place
>>> for s in L[1:]:
    M ^= s

    
>>> M
{1, 5, 6}

Note: The question was about lists but the list he gave was a list of sets.
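
One caveat worth checking before relying on this: chained symmetric difference keeps every element that occurs in an odd number of sets, not only those that occur exactly once. The sketch below is an illustration under that observation; the helper name `xor_all` is hypothetical, and it uses `functools.reduce` with `operator.xor` instead of the loop above.

```python
from functools import reduce
from operator import xor


def xor_all(sets):
    """Fold the sets together with symmetric difference (^)."""
    # Starting from an empty set means none of the inputs is modified.
    return reduce(xor, sets, set())


# Works for the question's data, where no number occurs three times.
print(xor_all([{1, 2, 3, 4}, {3, 4, 5}, {2, 6}]))  # {1, 5, 6}

# But 1 occurs in three sets here and still survives.
print(xor_all([{1}, {1}, {1}]))  # {1}
```

So this gives the expected result only when no number appears in three or more of the sets.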

Chris Charley