Identify duplicate values in a list in Python

Question

Is it possible to get which values are duplicates in a list using python?

I have a list of items:

    mylist = [20, 30, 25, 20]

I know the best way of removing the duplicates is set(mylist), but is it possible to know what values are being duplicated? As you can see, in this list the duplicates are the first and last values. [0, 3].

Is it possible to get this result or something similar in Python? I'm trying to avoid writing a ridiculously big if/elif conditional statement.

possible duplicate of [How to find duplicate elements in array using for loop in python like c/c++?](http://stackoverflow.com/questions/1920145/how-to-find-duplicate-elements-in-array-using-for-loop-in-python-like-c-c) — Yatharth Agarwal, Jul 17 '13 at 14:51
Possible duplicate of [Find and list duplicates in Python list](http://stackoverflow.com/questions/9835762/find-and-list-duplicates-in-python-list) — Anderson Green, Sep 10 '16 at 03:57

John La Rooy · Accepted Answer · 2012-06-27T23:17:43.987

76

These answers are O(n): they take a little more code than using mylist.count(), but they are much more efficient as mylist gets longer.

If you just want to know the duplicates, use collections.Counter:

from collections import Counter
mylist = [20, 30, 25, 20]
[k for k,v in Counter(mylist).items() if v>1]

If you need to know the indices, group them by value with a defaultdict:

from collections import defaultdict
D = defaultdict(list)
for i,item in enumerate(mylist):
    D[item].append(i)
D = {k:v for k,v in D.items() if len(v)>1}
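
As a self-contained illustration, the defaultdict approach can be wrapped in a small function. This is only a sketch: the name `duplicate_indices` is made up for this example, and it assumes the list items are hashable.

```python
from collections import defaultdict


def duplicate_indices(items):
    """Map each value that occurs more than once to the list of its indices."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        # Record every position at which this value appears.
        positions[item].append(index)
    # Keep only the values seen at two or more positions.
    return {item: idxs for item, idxs in positions.items() if len(idxs) > 1}


mylist = [20, 30, 25, 20]
print(duplicate_indices(mylist))  # {20: [0, 3]}
```

The list is scanned once, so the work grows linearly with its length.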

edited Jun 27 '12 at 23:17

answered Jun 27 '12 at 23:11

John La Rooy

You could do this with the more compact `[i for key in (key for key, count in Counter(mylist).items() if count > 1) for i, x in enumerate(mylist) if x == key]` - although it's a bit of a monster, you might want to separate out the generator expression. – Gareth Latty Jun 27 '12 at 23:16
You could make `def indices(seq, values):`, `return (i for value in values for i, x in enumerate(seq) if x == value)`, then do `indices(mylist, (key for key, count in Counter(mylist).items() if count > 1))`. That's pretty neat (when not crammed into a comment). – Gareth Latty Jun 27 '12 at 23:23
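
Laid out over several lines, the helper described in the comment above might look like the following sketch. The variable name `duplicated` is added here for readability and is not from the comment.

```python
from collections import Counter


def indices(seq, values):
    """Yield every index in seq whose element equals one of the given values."""
    return (i for value in values for i, x in enumerate(seq) if x == value)


mylist = [20, 30, 25, 20]
# Values that occur more than once in the list.
duplicated = (key for key, count in Counter(mylist).items() if count > 1)
print(list(indices(mylist, duplicated)))  # [0, 3]
```

Note that the indices come out grouped by value rather than sorted by position, and that the list is rescanned once per duplicated value.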

score 20 · Answer 2 · answered Jun 27 '12 at 23:11

Here's a list comprehension that does what you want. As @Codemonkey says, the list starts at index 0, so the indices of the duplicates are 0 and 3.

>>> [i for i, x in enumerate(mylist) if mylist.count(x) > 1]
[0, 3]

answered Jun 27 '12 at 23:11

Junuxx

That's O(n^2)... You can do better. – JBernardo Jun 27 '12 at 23:13
@Levon, it does search the whole list – John La Rooy Jun 27 '12 at 23:18
For those that don't understand what O(N^2) means: it means that for a 10 element list, you'll be executing 100 steps, for 1000 elements 1 million steps, for 1 million elements a million million steps, etc. Quadratic performance will kill your performance very rapidly. – Martijn Pieters Feb 23 '15 at 17:23
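
To make the step counts in the comment above concrete, here is a rough sketch that simply counts loop iterations. The function names `quadratic_steps` and `linear_steps` are invented for this illustration; the nested loop is a simplified model of calling `count()` once per element, not a measurement of the real `list.count()`.

```python
def quadratic_steps(n):
    """Count iterations when every element is checked against every element."""
    items = list(range(n))
    steps = 0
    for _ in items:        # one pass per element ...
        for _ in items:    # ... each of which walks the whole list again
            steps += 1
    return steps


def linear_steps(n):
    """Count iterations of a single tallying pass over the list."""
    items = list(range(n))
    tally = {}
    steps = 0
    for x in items:
        tally[x] = tally.get(x, 0) + 1
        steps += 1
    return steps


for n in (10, 1000):
    print(n, quadratic_steps(n), linear_steps(n))
# 10 100 10
# 1000 1000000 1000
```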

score 9 · Answer 3 · answered Nov 07 '15 at 17:52

You can use a list comprehension with set to reduce the work: each distinct value is counted only once.

my_list = [3, 5, 2, 1, 4, 4, 1]
opt = [item for item in set(my_list) if my_list.count(item) > 1]

answered Nov 07 '15 at 17:52

ramchauhan

score 7 · Answer 4 · edited Jul 15 '12 at 00:22

The following list comprehension will yield the duplicate values:

[x for x in mylist if mylist.count(x) >= 2]

edited Jul 15 '12 at 00:22

octopusgrabbus

answered Jun 27 '12 at 23:13

Swiss

This gives the duplicate values, not their indices – Junuxx Jun 27 '12 at 23:14
@Junuxx: Although he does mention the indices, he asks for the values, not the indices. – Swiss Jun 27 '12 at 23:15
"As you can see, in this list the duplicates are the first and last values. [0, 3]" seems to indicate the desired output. – Junuxx Jun 27 '12 at 23:15
I'm not quite sure why this has brackets around it either. This is also far less efficient than using a `Counter`. – Gareth Latty Jun 27 '12 at 23:18
1) "Is it possible to get which values are duplicates in a list using python?" 2) "is it possible to know what values are being duplicated?" If what he wants are the indices, he is really bad about asking for them. – Swiss Jun 27 '12 at 23:19
@Lattyware: It is the syntax for a set comprehension. – Swiss Jun 27 '12 at 23:20
@Swiss No, it isn't. A set comprehension only requires the curly braces, the brackets here are totally useless. – Gareth Latty Jun 27 '12 at 23:22
@Lattyware: I was confused because `{` is a bracket, but `(` is a parenthesis. You are correct about them being unnecessary then. – Swiss Jun 27 '12 at 23:23
Sorry, that's because I'm in the UK - `()` are usually called *brackets* here, with `[]` and `{}` being *square brackets* and *curly brackets* respectively. – Gareth Latty Jun 27 '12 at 23:28
@Swiss I'm not a native speaker, I learned over time `[` -> (square) bracket, `(` -> parenthesis, `{` -> (curly) braces in the US .. :) – Levon Jun 27 '12 at 23:29
I tried what was written. It did not work. `>>> {x for x in mylist if mylist.count(x) >= 2} File "", line 1 {x for x in mylist if mylist.count(x) >= 2} SyntaxError: invalid syntax` I changed the braces to square brackets to make a list comprehension. That printed the duplicate values. I looked up set comprehension syntax, and what was written might have worked. ^ – octopusgrabbus Jul 15 '12 at 00:23
@octopusgrabbus Set comprehensions were added in Python 2.7, so it won't work if you are using a version older than that. – Swiss Jul 15 '12 at 17:19
Note that this has a *terrible* performance profile. `list.count()` is an O(N) job (all elements in the list are compared to count) and you are doing this in a loop over N elements, giving you quadratic performance, O(N^2). So for a 10-element list 100 steps are executed, for a 1000-element list 1 million, etc. – Martijn Pieters Feb 23 '15 at 17:20

score 5 · Answer 5 · answered Jul 21 '16 at 13:01

The simplest way, without any intermediate list, is to use list.index(). Keeping only the first occurrence of each item:

>>> z = ['a', 'b', 'a', 'c', 'b', 'a']
>>> [z[i] for i in range(len(z)) if i == z.index(z[i])]
['a', 'b', 'c']

You can also list the duplicates themselves (the result may itself contain duplicates, as in this example):

>>> [z[i] for i in range(len(z)) if not i == z.index(z[i])]
['a', 'b', 'a']

or their indices:

>>> [i for i in range(len(z)) if not i == z.index(z[i])]
[2, 4, 5]

or the duplicates as a list of 2-tuples pairing each index with the index of the item's first occurrence, which is the answer to the original question:

>>> [(i, z.index(z[i])) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0), (4, 1), (5, 0)]

or this together with the item itself:

>>> [(i, z.index(z[i]), z[i]) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0, 'a'), (4, 1, 'b'), (5, 0, 'a')]

or any other combination of elements and indices.
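
Each `z.index()` call scans the list from the start, so the comprehensions above do quadratic work on long lists. A possible single-pass alternative is sketched below. It swaps `list.index()` for a set of already-seen items, so it assumes the items are hashable; the name `repeat_indices` is hypothetical.

```python
def repeat_indices(items):
    """Return the indices of elements that already appeared earlier in items."""
    seen = set()
    repeats = []
    for index, item in enumerate(items):
        if item in seen:
            # Not the first occurrence, so this position is a repeat.
            repeats.append(index)
        else:
            seen.add(item)
    return repeats


z = ['a', 'b', 'a', 'c', 'b', 'a']
print(repeat_indices(z))  # [2, 4, 5]
```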

Rohan Khude · Answer 6 · 2017-01-03T11:32:59.557

I tried the code below to find the duplicate values in a list:

1) Create a set from the list that contains duplicates.

2) Iterate through the set, counting each value's occurrences in the original list.

glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:
    if(glist.count(c)>1):
        dup.append(c)
print(dup)

OUTPUT

[1, 'one']

Now get all the indices of each duplicate element:

glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:
    if(glist.count(c)>1):
        indices = [i for i, x in enumerate(glist) if x == c]
        dup.append((c,indices))
print(dup)

OUTPUT

[(1, [0, 6]), ('one', [3, 7])]

Hope this helps someone

score 2 · Answer 7 · answered Dec 13 '13 at 15:37

That's the simplest way I can think of for finding duplicates in a list:

my_list = [3, 5, 2, 1, 4, 4, 1]

my_list.sort()  # sorts in place, so equal values end up next to each other
for i in range(len(my_list) - 1):
    if my_list[i] == my_list[i + 1]:
        print(str(my_list[i]) + ' is a duplicate')

answered Dec 13 '13 at 15:37

Andreampa

If items appear more than twice you'll print those multiple times. – Martijn Pieters Feb 23 '15 at 17:20
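
One possible way to report each duplicated value only once, sketched here with `itertools.groupby` instead of the adjacent-pair comparison used in the answer. The function name `duplicates_sorted` is made up for this example, and it works on a sorted copy so the original list is left untouched.

```python
from itertools import groupby


def duplicates_sorted(items):
    """Return each value that occurs more than once, in sorted order."""
    result = []
    # In sorted input, groupby yields exactly one run per distinct value.
    for value, run in groupby(sorted(items)):
        if sum(1 for _ in run) > 1:
            result.append(value)
    return result


my_list = [3, 5, 2, 1, 4, 4, 1, 4]
print(duplicates_sorted(my_list))  # [1, 4]
```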

score 1 · Answer 8 · answered Mar 29 '17 at 22:05

The following code prints each duplicated item together with the index of its first occurrence (list.index() returns only the first match).

for i in set(mylist):
    if mylist.count(i) > 1:
        print(i, mylist.index(i))

answered Mar 29 '17 at 22:05

Ashish Srivastava

score 0 · Answer 9 · answered Jun 27 '12 at 23:13

You should sort the list:

mylist.sort()

After this, iterate through it like this:

doubles = []
for i, elem in enumerate(mylist):
    if i != 0:
        if elem == old:
            doubles.append(elem)
            old = None
            continue
    old = elem

answered Jun 27 '12 at 23:13

Sven Hager

This doesn't get the indices of the items, which the asker appears to want. Also, creating an empty list and looping through items to append some is an anti-pattern in Python, use a list comprehension. – Gareth Latty Jun 27 '12 at 23:17
This too will print items that appear more than twice multiple times. – Martijn Pieters Feb 23 '15 at 17:21

Aashutosh Kumar · Answer 10 · 2019-08-06T22:31:01.880

0

You can print the duplicate and unique values using the logic below, which uses plain lists.

def dup(x):
    duplicate = []
    unique = []
    for i in x:
        if i in unique:
            duplicate.append(i)
        else:
            unique.append(i)
    print("Duplicate values: ",duplicate)
    print("Unique Values: ",unique)

list1 = [1, 2, 1, 3, 2, 5]
dup(list1)
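
The `i in unique` test above scans a list, which gets slow as the list of unique values grows. Below is a hedged variant that keeps a set alongside for membership checks and returns the two lists instead of printing them. It assumes hashable items, and `split_unique_and_duplicates` is an illustrative name, not part of the answer.

```python
def split_unique_and_duplicates(items):
    """Return (unique, duplicates), both in order of appearance."""
    seen = set()       # fast membership test
    unique = []        # first occurrence of each value
    duplicates = []    # every later occurrence
    for item in items:
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
            unique.append(item)
    return unique, duplicates


list1 = [1, 2, 1, 3, 2, 5]
print(split_unique_and_duplicates(list1))  # ([1, 2, 3, 5], [1, 2])
```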

edited Aug 06 '19 at 22:31

answered Aug 06 '19 at 22:25

Aashutosh Kumar

score 0 · Answer 11 · edited Jan 11 '21 at 06:30

mylist = [20, 30, 25, 20]

kl = {i: mylist.count(i) for i in mylist if mylist.count(i) > 1 }

print(kl)

edited Jan 11 '21 at 06:30

Joe Ferndz

answered Sep 21 '20 at 06:27

Piyush

score 0 · Answer 12 · answered Feb 25 '21 at 06:53

It looks like you want the indices of the duplicates. Here is some short code that will find those in O(n) time, without using any packages:

dups = {}
[dups.setdefault(v, []).append(i) for i, v in enumerate(mylist)]
dups = {k: v for k, v in dups.items() if len(v) > 1}
# dups now has keys for all the duplicate values
# and a list of matching indices for each

# The second line produces an unused list. 
# It could be replaced with this:
for i, v in enumerate(mylist):
    dups.setdefault(v, []).append(i)

score -2 · Answer 13 · edited Apr 09 '14 at 08:12

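m = len(mylist)
for index, value in enumerate(mylist):
    # compare this position against every other position from 1 onward
    for i in range(1, m):
        if index != i:
            if mylist[i] == mylist[index]:
                print("Location %d and location %d have the same list entry: %r" % (index, i, value))

This has some redundancy that could be removed, however: any pair that does not involve position 0 is reported twice.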

edited Apr 09 '14 at 08:12

Amicable

answered Apr 09 '14 at 07:53

Anon

score -2 · Answer 14 · edited Feb 25 '21 at 07:17

def checkduplicate(lists): 
 a = []
 for i in lists:
    if  i in a:
        pass   
    else:
        a.append(i)
 return i          
            
print(checkduplicate([1,9,78,989,2,2,3,6,8]))

edited Feb 25 '21 at 07:17

Zephyr

answered Feb 25 '21 at 06:36

Ramyashree S

This prints out the last value in the list. Even if you correct it to `return a`, that removes the duplicates, but the question was "*is it possible to know what values are being duplicated*" – Gino Mempin Feb 25 '21 at 11:03
