Finding repeated values in multiple lists

Question

I am trying to find out whether any of the sublists in list1 contain a repeated value, so I need to be told if a number in list1[0] is the same as a number in list1[1] (here 20 is repeated).

The numbers represent coords, and the coords of each item in list1 cannot overlap. If they do, I have a module that reruns and makes a new list1 until no coords are the same.

Please help.

    list1 = [[7, 20], [20, 31, 32], [66, 67, 68],[7, 8, 9, 2],
             [83, 84, 20, 86, 87], [144, 145, 146, 147, 148, 149]]

    x = 0
    while x != 169:
        if list1.count(x) > 0:
            print ("repeat found")
        else:
            print ("no repeat found")
        x += 1

By "repeated value" do you mean that a value in one sublist is in another sublist? Or do you mean that a value appears more than once in a single sublist? — Steven Rumbalski, Jun 08 '13 at 22:33
Can you add this remark to the question? This is totally different from what is written there. — Mike Müller, Jun 08 '13 at 22:44
Do you need to know where the overlap occurs or just detect it? — dansalmo, Jun 08 '13 at 22:51
Just add some example input with and without repeats to really make clear what "repeat" means. — Mike Müller, Jun 08 '13 at 23:01
You accepted an answer that checks for repeats within each sublist, not for repeats between all sublists. This contradicts your question. — Mike Müller, Jun 08 '13 at 23:07
Can there be duplicates in individual sublists? For example, what answer do you want for `list1 = [[1], [2,2]]` (sublists do not overlap, but there are duplicates)? — jfs, Jun 08 '13 at 23:13

score 3 · Accepted Answer · edited May 23 '17 at 12:28

How about something like:

is_dup = sum(1 for l in list1 if len(set(l)) < len(l))
if is_dup > 0:
  print ("repeat found")
else:
  print ("no repeat found")

Another example using any:

any(len(set(l)) < len(l) for l in list1)

To check whether any item is repeated across all of the sublists, I would chain them into one flat list and compare its length with the length of its set. Credit to this answer for flattening a list of lists.

flattened = sum(list1, [])
if len(flattened) > len(set(flattened)):
  print ("dups")
else:
  print ("no dups")

I guess the proper way to flatten lists is to use itertools.chain, which can be used like this:

import itertools
flattened = list(itertools.chain(*list1))

This can replace the sum call I used above if that seems like a hack.
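
As a minimal sketch of the flattened check, assuming Python 3 and the `list1` from the question: `itertools.chain.from_iterable` flattens the sublists without unpacking them as arguments, and comparing the flat list's length with the length of its set detects any value shared between sublists. The helper name `has_overlap` is hypothetical and only used for illustration.

```python
import itertools

list1 = [[7, 20], [20, 31, 32], [66, 67, 68], [7, 8, 9, 2],
         [83, 84, 20, 86, 87], [144, 145, 146, 147, 148, 149]]


def has_overlap(nested):
    """Return True if any value occurs more than once across all sublists."""
    # Build one flat list; this avoids the repeated list copies
    # that sum(nested, []) makes for every sublist it adds.
    flattened = list(itertools.chain.from_iterable(nested))
    # A set drops duplicates, so a shorter set means something repeated.
    return len(flattened) > len(set(flattened))


if has_overlap(list1):
    print("dups")
else:
    print("no dups")
```

Note that this also reports a duplicate inside a single sublist, such as `[[1], [2, 2]]`, because everything is compared in one flat list.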

answered Jun 08 '13 at 22:28

squiguy

@JohnMassee This will work if all you care about is checking for duplicates. If you need to know which ones do overlap you might consider using a dictionary type. – squiguy Jun 08 '13 at 22:53
`sum(1 for l in list1 if len(set(l)) < len(l))` gives `0` which is not the result the updated question asks for. The `20` is considered a repeat. – Mike Müller Jun 08 '13 at 23:08
@Mike: `len(flattened) > len(set(flattened))` returns the same answer as your `has_duplicates()` – jfs Jun 08 '13 at 23:18
The code using flattened checks between sublists. It output dups with list1 and no dups when I changed 20 to a different number – i love crysis Jun 08 '13 at 23:19
Ok. Did not read so far, just concentrated on the first version. Sorry for the stir-up. All is fine. :) – Mike Müller Jun 08 '13 at 23:24
@MikeMüller Yea, the question changed once so I added that at the end :). No big deal it's hard to gauge what's going on sometimes. – squiguy Jun 08 '13 at 23:30
@squiguy Fine. You might want to consider rearranging, putting the right answer in front along with a few words of explanation. Otherwise, people who read this later might get confused. – Mike Müller Jun 08 '13 at 23:51
`itertools.chain(*...)` shouldn't be used - it's less efficient than `itertools.chain.from_iterable(...)`, which is designed for that job. – Gareth Latty Jun 08 '13 at 23:58
`flattened = sum(list1, [])` is horribly inefficient. A new list is created for each new item added. – Steven Rumbalski Jun 09 '13 at 14:03

Mike Müller · Answer 2 · 2013-06-08T23:54:34.367

Solution for the updated question

def has_duplicates(iterable):
    """Searching for duplicates in sub iterables.

    This approach can be faster than whole-container solutions
    with flattening if duplicates in large iterables are found 
    early.
    """
    seen = set()
    for sub_list in iterable:
        for item in sub_list:
            if item in seen:
                return True
            seen.add(item)
    return False


>>> has_duplicates(list1)
True
>>> has_duplicates([[1, 2], [4, 5]])
False
>>> has_duplicates([[1, 2], [4, 5, 1]])
True

Lookup in a set is fast: a membership test takes constant time on average, whereas a list has to be scanned item by item. Don't use a list for seen if you want it to be fast.
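
When it matters which values overlap, and not only whether any do, a dictionary keyed by value can be built along the same lines. The following is a minimal sketch under that assumption; the function name `find_overlaps` is hypothetical, and `list1` is the list from the question.

```python
from collections import defaultdict

list1 = [[7, 20], [20, 31, 32], [66, 67, 68], [7, 8, 9, 2],
         [83, 84, 20, 86, 87], [144, 145, 146, 147, 148, 149]]


def find_overlaps(nested):
    """Map each repeated value to the indices of the sublists holding it."""
    positions = defaultdict(list)
    for index, sub_list in enumerate(nested):
        for item in sub_list:
            # Record the sublist index every time the value is seen.
            positions[item].append(index)
    # Keep only values that were recorded more than once.
    return {item: where for item, where in positions.items() if len(where) > 1}


print(find_overlaps(list1))  # {7: [0, 3], 20: [0, 1, 4]}
```

Unlike `has_duplicates`, this always walks the whole input, so it trades the early exit for the extra detail. A value repeated inside one sublist shows up with the same index listed twice.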

Solution for the original version of the question

If the length of a list is larger than the length of the set made from that list, there must be repeated items, because a set can only hold unique elements:

>>> L = [[1, 1, 2], [1, 2, 3], [4, 4, 4]]
>>> [len(item) - len(set(item)) for item in L]
[1, 0, 2]

This is the key here:

>>> {1, 2, 3, 1, 2, 1}
set([1, 2, 3])

EDIT

If you are not interested in the number of repeats for each sublist, this is more efficient because it stops at the first sublist whose difference is greater than 0:

>>> any(len(item) - len(set(item)) for item in L)
True

Thanks to @mata for pointing this out.

`any(len(item) - len(set(item)) for item in L)` would do if you're just interested in whether there's a match. It has the advantage that any only tries until a match has been found and then returns. — mata, Jun 08 '13 at 22:34
All I need this thing to do is identify whether or not a number is repeated in the whole of list1 — i love crysis, Jun 08 '13 at 22:38

perreal · Answer 3 · 2013-06-08T22:41:08.420

from collections import Counter
list1=[[7, 20], [20, 31, 32], [66, 67, 68],
        [7, 8, 9, 2], [83, 84, 20, 86, 87],
        [144,144, 145, 146, 147, 148, 149]]
for i, l in enumerate(list1):
    # Counter maps each value in the sublist to its number of occurrences.
    for r in [x for x, y in Counter(l).items() if y > 1]:
        print('at list', i, 'item', r, 'repeats')

and this one gives globally repeated values:

# After sorting, equal values sit next to each other.
expl = sorted([x for l in list1 for x in l])
print([x for x, y in zip(expl, expl[1:]) if x == y])

score 0 · Answer 4 · answered Jun 08 '13 at 22:31

For Python 2.7+, you should try a Counter:

import collections

# Named values rather than list, to avoid shadowing the built-in list type.
values = [1, 2, 3, 2, 1]
count = collections.Counter(values)

Then count would look like:

Counter({1: 2, 2: 2, 3: 1})
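
To apply the same idea to the nested `list1` from the question, the sublists have to be flattened before counting. A minimal sketch, assuming Python 3; the variable names `counts` and `repeated` are chosen here for illustration.

```python
from collections import Counter
from itertools import chain

list1 = [[7, 20], [20, 31, 32], [66, 67, 68], [7, 8, 9, 2],
         [83, 84, 20, 86, 87], [144, 145, 146, 147, 148, 149]]

# Count every value across all sublists in one pass.
counts = Counter(chain.from_iterable(list1))

# Values seen more than once are the repeats.
repeated = sorted(value for value, n in counts.items() if n > 1)

if repeated:
    print("repeat found:", repeated)
else:
    print("no repeat found")
```

With this input the repeated values are 7 and 20, and `counts[20]` is 3 because 20 appears in three sublists.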
