2

I've seen many variations of this question, from simply removing duplicates to finding and listing them. Even piecing together bits of those examples does not get me the result I want.

My question is: how can I check whether my list has a duplicate entry? Even better, how can I check whether it has a non-zero duplicate?

I've had a few ideas -

#empty list
myList = [None] * 9 

#all the elements in this list are None

#fill part of the list with some values
myList[0] = 1
myList[3] = 2
myList[4] = 2
myList[5] = 4
myList[7] = 3

#coming from C, I attempt to use a nested for loop
j = 0
k = 0
for j in range(len(myList)):
    for k in range(len(myList)):
        if myList[j] == myList[k]:
            print "found a duplicate!"
            return

If this worked, it would find the duplicate (None) in the list. Is there a way to ignore the None or 0 case? I do not care if two elements are 0.

Another solution I thought of was to turn the list into a set and compare the lengths of the set and the list to determine whether there is a duplicate. But running set(myList) not only removes duplicates, it also reorders the elements. I could keep separate copies, but that seems redundant.

jtor
  • You're on the right track! I would definitely recommend the `set` operation, as it's a single function call that gets exactly what you need; you can then pop out the `None`s and `0`s from your final set. – ericmjl Jan 28 '15 at 21:27
  • Regular sets in Python lack a defined order, but there is always [OrderedDict](http://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set) – Paulo Scardine Jan 28 '15 at 21:27
  • `return` outside of function = syntax error – Paul Rooney Jan 28 '15 at 21:30
  • You'd also want to check that you aren't comparing an index against itself, and that you don't report the same pair twice (e.g. don't count 2 duplicates because 3 is a duplicate of 4 but 4 is also a duplicate of 3) – Paul Rooney Jan 28 '15 at 21:32
  • Jtor, see my code below. Once the function finds an element that occurs more than once, it returns as a duplicate. – Malik Brahimi Jan 28 '15 at 21:36
  • "Is there a way to ignore the None or 0 case?" Sure: `if myList[i] is None or myList[i] == 0: continue` – Jasper Jan 28 '15 at 21:37
  • Thanks guys. I actually also figured out instead of exampleList = [None] * 9 I can do [0]*9 which is even better for my situation! – jtor Jan 29 '15 at 14:18

7 Answers

3

If you simply want to check whether the list contains duplicates, use the function below. As soon as it finds an element that occurs more than once, it returns 'Duplicate'; if there is no such element, it returns None.

my_list = [1, 2, 2, 3, 4]

def check_list(arg):
    for i in arg:
        if arg.count(i) > 1:
            return 'Duplicate'

print check_list(my_list) == 'Duplicate' # prints True
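The function above flags any repeated value, including `None` and `0`. Here is a minimal sketch of a variant that skips those placeholders and returns a boolean. The name `has_nonzero_duplicate` is hypothetical, not part of the original answer, and because `list.count` rescans the list each time, this is only reasonable for short lists.

```python
def has_nonzero_duplicate(items):
    """Return True if any value other than None or 0 occurs more than once."""
    for item in items:
        # Skip the placeholder values the question wants to ignore.
        if item is None or item == 0:
            continue
        if items.count(item) > 1:
            return True
    return False


print(has_nonzero_duplicate([1, None, None, 2, 2, 4, None, 3, None]))  # True
print(has_nonzero_duplicate([1, None, None, 2, 0, 0, None, 3, None]))  # False
```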
Malik Brahimi
2

Try changing the actual comparison line to this:

if j != k and myList[j] == myList[k] and myList[j] not in [None, 0]:
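For context, here is a sketch of how that comparison could sit inside a complete nested loop. It assumes the loop is wrapped in a function so that `return` is legal; the name `find_first_duplicate` is hypothetical. Starting the inner index at `j + 1` avoids comparing an element with itself and avoids visiting the same pair twice.

```python
def find_first_duplicate(my_list):
    """Return the first value (other than None or 0) that appears twice, else None."""
    n = len(my_list)
    for j in range(n):
        # Ignore the placeholder values.
        if my_list[j] is None or my_list[j] == 0:
            continue
        # Start k after j: never compare an index with itself,
        # and never visit the same pair of indexes twice.
        for k in range(j + 1, n):
            if my_list[j] == my_list[k]:
                return my_list[j]
    return None


my_list = [1, None, None, 2, 2, 4, None, 3, None]
print(find_first_duplicate(my_list))  # 2
```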
paolo
2

I'm not certain whether you are trying to ascertain that a duplicate exists, or to identify the items that are duplicated (if any). Here is a Counter-based solution for the latter:

# Python 2.7
from collections import Counter

#
# Rest of your code
#

counter = Counter(myList)
dupes = [key for (key, value) in counter.iteritems() if value > 1 and key]
print dupes

The Counter object will automatically count occurrences of each item in your list. The list comprehension that builds dupes filters out all items that appear only once, and also items whose boolean evaluation is False (this filters out both 0 and None).

If your purpose is only to identify that duplication has taken place (without enumerating which items were duplicated), you could use the same method and test dupes:

if dupes:  print "Something in the list is duplicated"
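The snippet above relies on `iteritems()`, which exists only in Python 2, and on truthiness, which also drops other falsy values such as an empty string. Below is a sketch of the same idea using `items()` (available in both Python 2.7 and 3) and an explicit check for `None` and `0`. The function name `duplicated_values` is hypothetical.

```python
from collections import Counter


def duplicated_values(items):
    """List the values, other than None and 0, that occur more than once."""
    counts = Counter(items)
    return [value for value, count in counts.items()
            if count > 1 and value is not None and value != 0]


print(duplicated_values([1, None, None, 2, 2, 4, None, 3, None]))  # [2]
print(duplicated_values(["", "", 0, 0, 5]))  # ['']
```

Note that `False == 0` is true in Python, so a repeated `False` would be ignored by this check as well.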
rchang
1

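To remove duplicates and keep order while leaving 0 and None untouched, use the comprehension below. The not ele test passes every falsy value through as-is; if you have other falsy values (such as an empty string) that should be treated as ordinary values, you will need to test explicitly for is not None and != 0 instead:

print([ele for ind, ele in enumerate(lst) if ele not in lst[:ind] or not ele])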
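As an illustration of the explicit test mentioned above, here is a sketch that deduplicates while preserving order, keeps every `None` and `0` in place, and treats other falsy values such as `""` like any other value. The name `dedupe_keep_placeholders` is hypothetical, and the sketch assumes all elements are hashable.

```python
def dedupe_keep_placeholders(lst):
    """Drop repeated values in order, but keep every None and 0 as-is."""
    seen = set()
    result = []
    for ele in lst:
        if ele is None or ele == 0:
            # Placeholders are never treated as duplicates.
            result.append(ele)
        elif ele not in seen:
            seen.add(ele)
            result.append(ele)
    return result


print(dedupe_keep_placeholders([1, 0, "", 2, 2, "", 0, None, 1]))
# [1, 0, '', 2, 0, None]
```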

If you just want the first dup:

for ind, ele in enumerate(lst[:-1]):
    if ele in lst[ind+1:] and ele:
        print(ele)
        break

Or store seen in a set:

seen = set()
for ele in lst:
    if ele in seen:
        print(ele)
        break
    if ele:
        seen.add(ele) 
Padraic Cunningham
  • Thank you, but what if I wanted the second duplicate? – Mike Issa Aug 03 '17 at 23:19
  • @MikeIssa, so the second occurrence? It would only really make sense if you wanted the index, or wanted to reorder based on when the second occurrence appeared. If you had a concrete example, it would be easy to implement with a `collections.Counter` – Padraic Cunningham Aug 04 '17 at 18:52
0

You can use collections.defaultdict and specify a condition, such as non-zero / Truthy, and specify a threshold. If the count for a particular value exceeds the threshold, the function will return that value. If no such value exists, the function returns False.

from collections import defaultdict

def check_duplicates(it, condition, thresh):
    dd = defaultdict(int)
    for value in it:
        dd[value] += 1
        if condition(value) and dd[value] > thresh:
            return value
    return False

L = [1, None, None, 2, 2, 4, None, 3, None]

res = check_duplicates(L, condition=bool, thresh=1)  # 2

Note that in the example above, passing bool as the condition means 0 and None are never reported as threshold breaches. You could also use, for example, lambda x: x != 1 to exclude values equal to 1.
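To show how the condition changes the result, here is a sketch that repeats `check_duplicates` from above so it runs on its own, and compares `bool` with an explicit predicate. The helper `not_placeholder` and the sample list are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict


def check_duplicates(it, condition, thresh):
    # Same function as above, repeated so this example is self-contained.
    dd = defaultdict(int)
    for value in it:
        dd[value] += 1
        if condition(value) and dd[value] > thresh:
            return value
    return False


def not_placeholder(value):
    """Hypothetical predicate: reject only None and 0, not every falsy value."""
    return value is not None and value != 0


L = ["", None, None, "", 2, 2, 0, 0]

print(repr(check_duplicates(L, condition=bool, thresh=1)))             # 2
print(repr(check_duplicates(L, condition=not_placeholder, thresh=1)))  # ''
```

Because the second call returns a falsy value, a caller would need to compare the result against `False` with `is not False` rather than rely on truthiness.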

jpp
-2

Here's a bit of code that shows how to remove None and 0 from the set.

l1 = [0, 1, 1, 2, 4, 7, None, None]

l2 = set(l1)
l2.discard(None)  # discard() does not raise if the value is absent
l2.discard(0)
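Building on this, the length comparison described in the question could be sketched as follows. The function name `has_duplicate` is hypothetical. `set()` builds a new object, so the original list and its order are left untouched, and filtering out `None` and `0` first means they cannot trigger a false positive.

```python
def has_duplicate(items):
    """Return True if any value other than None or 0 appears more than once."""
    kept = [x for x in items if x is not None and x != 0]
    # A set drops repeats, so a shorter set means something was duplicated.
    return len(kept) != len(set(kept))


l1 = [0, 1, 1, 2, 4, 7, None, None]
print(has_duplicate(l1))                         # True
print(has_duplicate([0, 0, 1, 2, None, None]))   # False
print(l1)  # the original list is unchanged
```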
ericmjl
-2

This is the simplest solution I could come up with, and it should work with any list. The only downside is that it does not count the number of duplicates; it just returns True or False.

from itertools import combinations

def has_duplicate(mylist):
    return any(k == j for k, j in combinations(mylist, 2))
CodedCuber