
I've read loads of examples but haven't quite found what I'm looking for. I've tried several ways of doing this, but I'm looking for the best one.

So the idea is that given:

s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]

the results should be:

['c']
['potato','d']
['h']

because these items are unique across the whole list of lists.

Thank you for any suggestions :)

norok2
Zen42

5 Answers


As a general approach, you can count the occurrences of all items and then keep those that appear only once.

In [21]: from collections import Counter

In [23]: counts = Counter(s1 + s2 + s3)

In [24]: [i for i in s1 if counts[i] == 1]
Out[24]: ['c']

In [25]: [i for i in s2 if counts[i] == 1]
Out[25]: ['potato', 'd']

In [26]: [i for i in s3 if counts[i] == 1]
Out[26]: ['h']

And if you have a nested list you can do the following:

In [28]: s = [s1, s2, s3]

In [30]: from itertools import chain

In [31]: counts = Counter(chain.from_iterable(s))

In [32]: [[i for i in lst if counts[i] == 1] for lst in s]
Out[32]: [['c'], ['potato', 'd'], ['h']]
Mazdak

How about:

[i for i in s1 if i not in s2+s3] #gives ['c']
[j for j in s2 if j not in s1+s3] #gives ['potato', 'd']
[k for k in s3 if k not in s1+s2] #gives ['h']

If you want all of them in a list:

uniq = [[i for i in s1 if i not in s2+s3],
        [j for j in s2 if j not in s1+s3],
        [k for k in s3 if k not in s1+s2]]

#output
[['c'], ['potato', 'd'], ['h']]
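As a side note, every `i not in s2+s3` test builds a new concatenated list and scans it linearly; precomputing a set of the other lists' elements makes each membership check O(1) on average. A sketch of that variant, using the same sample data:

```python
s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

# one lookup set per list, instead of re-concatenating on every test
not_s1 = set(s2) | set(s3)
not_s2 = set(s1) | set(s3)
not_s3 = set(s1) | set(s2)

uniq = [[i for i in s1 if i not in not_s1],
        [j for j in s2 if j not in not_s2],
        [k for k in s3 if k not in not_s3]]
print(uniq)  # [['c'], ['potato', 'd'], ['h']]
```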
some_programmer

To find the elements that are unique across the 3 lists you can combine the set symmetric difference (^) and intersection (&) operations: the symmetric difference of three sets keeps elements that occur an odd number of times (once or three times), and subtracting the three-way intersection removes those that occur in all three.

>>> s1 = ['a','b','c']
>>> s2 = ['a','potato','d']
>>> s3 = ['a','b','h']

>>> (set(s1) ^ set(s2) ^ set(s3)) - (set(s1) & set(s2) & set(s3))
{'c', 'potato', 'd', 'h'}

Note that this produces a single flat set (element order is arbitrary), not one result per list.
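If the per-list grouping shown in the question is needed, the flat set of unique elements can be filtered back through each original list; a sketch (note that this exactly-once set identity only holds for three lists):

```python
s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

# odd-count elements minus the elements common to all three lists
flat = (set(s1) ^ set(s2) ^ set(s3)) - (set(s1) & set(s2) & set(s3))
# filter each original list to recover the requested grouping
grouped = [[x for x in lst if x in flat] for lst in (s1, s2, s3)]
print(grouped)  # [['c'], ['potato', 'd'], ['h']]
```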

Assuming that you want this to work for an arbitrary number of sequences, a direct (though likely not the most efficient, since the others set could be updated incrementally between iterations rather than rebuilt from scratch) way to solve this is:

def deep_unique_set(*seqs):
    for i, seq in enumerate(seqs):
        others = set(x for seq_ in (seqs[:i] + seqs[i + 1:]) for x in seq_)
        yield [x for x in seq if x not in others]

or the slightly faster but less memory efficient and otherwise identical:

def deep_unique_preset(*seqs):
    pile = list(x for seq in seqs for x in seq)
    k = 0
    for seq in seqs:
        num = len(seq)
        others = set(pile[:k] + pile[k + num:])
        yield [x for x in seq if x not in others]
        k += num

Testing it with the provided input:

s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

print(list(deep_unique_set(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]

Note that if the input contains duplicates within one of the lists, those are not removed, i.e.:

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

print(list(deep_unique_set(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]

If all duplicates should be removed, a better approach is to count the values. The tool of choice for this is collections.Counter, as proposed in @Kasramvd's answer:

import collections
import itertools

def deep_unique_counter(*seqs):
    counts = collections.Counter(itertools.chain.from_iterable(seqs))
    for seq in seqs:
        yield [x for x in seq if counts[x] == 1]

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

print(list(deep_unique_counter(s1, s2, s3)))
# [[], ['potato', 'd'], ['h']]

Alternatively, one could keep track of repeats, e.g.:

def deep_unique_repeat(*seqs):
    seen = set()
    # seen.add(x) returns None (falsy), so x is collected only when
    # it was already in seen, i.e. on its second and later occurrences
    repeated = set(x for seq in seqs for x in seq if x in seen or seen.add(x))
    for seq in seqs:
        yield [x for x in seq if x not in repeated]

which will have the same behavior as the collections.Counter-based approach:

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_repeat(s1, s2, s3)))
# [[], ['potato', 'd'], ['h']]

but is slightly faster, since it does not need to keep track of unused counts.

Another, highly inefficient, approach makes use of list.count() for counting instead of a global counter:

def deep_unique_count(*seqs):
    pile = list(x for seq in seqs for x in seq)
    for seq in seqs:
        yield [x for x in seq if pile.count(x) == 1]

These last two approaches are also proposed in @AlainT.'s answer.


Some timings for these are provided below:

import random

n = 100
m = 100
s = tuple([random.randint(0, 10 * n * m) for _ in range(n)] for _ in range(m))
funcs = (deep_unique_set, deep_unique_preset, deep_unique_count,
         deep_unique_repeat, deep_unique_counter)
for func in funcs:
    print(func.__name__)
    %timeit list(func(*s))
    print()

# deep_unique_set
# 10 loops, best of 3: 86.2 ms per loop

# deep_unique_preset
# 10 loops, best of 3: 57.3 ms per loop

# deep_unique_count
# 1 loop, best of 3: 1.76 s per loop

# deep_unique_repeat
# 1000 loops, best of 3: 1.87 ms per loop

# deep_unique_counter
# 100 loops, best of 3: 2.32 ms per loop
norok2

Counter (from collections) is the way to go for this:

from collections import Counter

s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]

counts  = Counter(s for sList in strings for s in sList)
uniques = [ [s for s in sList if counts[s]==1] for sList in strings ]

print(uniques) # [['c'], ['potato', 'd'], ['h']]

If you're not allowed to use an imported module, you could do it with the list's count() method, but it would be much less efficient:

allStrings = [ s for sList in strings for s in sList ]
unique     = [[ s for s in sList if allStrings.count(s)==1] for sList in strings]

This can be made more efficient using a set to identify repeated values:

allStrings = ( s for sList in strings for s in sList )
seen       = set()
repeated   = set( s for s in allStrings if s in seen or seen.add(s))
unique     = [ [ s for s in sList if s not in repeated] for sList in strings ]
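Putting that last snippet together with the question's sample data (the seen.add(s) call returns None, which is falsy, so s is collected into repeated only on its second and later occurrences):

```python
strings = [['a', 'b', 'c'], ['a', 'potato', 'd'], ['a', 'b', 'h']]

allStrings = (s for sList in strings for s in sList)
seen       = set()
# set.add() returns None (falsy), so s passes the filter
# only when it was already seen, i.e. it is a repeat
repeated   = set(s for s in allStrings if s in seen or seen.add(s))
unique     = [[s for s in sList if s not in repeated] for sList in strings]
print(unique)  # [['c'], ['potato', 'd'], ['h']]
```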
Alain T.