Index of duplicate items in a Python list

Question

Does anyone know how I can get the index positions of duplicate items in a Python list? I have tried doing this, but it keeps giving me only the index of the first occurrence of the item in the list.

List = ['A', 'B', 'A', 'C', 'E']

I want it to give me:

index 0: A   
index 2: A

Note that the [Python Style Guide](http://www.python.org/dev/peps/pep-0008/) says you should not use capitalized names for variables, and also avoid using names of builtin classes, like list. — Lauritz V. Thaulow, Mar 24 '11 at 12:55
@martineau: I know, but I wanted to make sure he did not fix the issue with capitalization simply by lower-casing his variable. — Lauritz V. Thaulow, Mar 24 '11 at 14:12
This is asking both "how to get the index of repeated occurrences of an item in a sequence", which is answered [here](https://stackoverflow.com/questions/22267241/how-to-find-the-index-of-the-nth-time-an-item-appears-in-a-list), and "how to find duplicate items in a sequence", which is answered [here](https://stackoverflow.com/questions/11236006/identify-duplicate-values-in-a-list-in-python). — mkrieger1, Jan 23 '22 at 20:09

PaulMcG · Answer 1 · 2018-10-18T10:57:11.843

You want to pass in the optional second parameter to index, the location where you want index to start looking. After you find each match, reset this parameter to the location just after the match that was found.

def list_duplicates_of(seq,item):
    start_at = -1
    locs = []
    while True:
        try:
            loc = seq.index(item,start_at+1)
        except ValueError:
            break
        else:
            locs.append(loc)
            start_at = loc
    return locs

source = "ABABDBAAEDSBQEWBAFLSAFB"
print(list_duplicates_of(source, 'B'))

Prints:

[1, 3, 5, 11, 15, 22]

You can find all the duplicates at once in a single pass through source, by using a defaultdict to keep a list of all seen locations for any item, and returning those items that were seen more than once.

from collections import defaultdict

def list_duplicates(seq):
    tally = defaultdict(list)
    for i,item in enumerate(seq):
        tally[item].append(i)
    return ((key,locs) for key,locs in tally.items() 
                            if len(locs)>1)

for dup in sorted(list_duplicates(source)):
    print(dup)

Prints:

('A', [0, 2, 6, 7, 16, 20])
('B', [1, 3, 5, 11, 15, 22])
('D', [4, 9])
('E', [8, 13])
('F', [17, 21])
('S', [10, 19])

If you want to do repeated testing for various keys against the same source, you can use functools.partial to create a new function variable, using a "partially complete" argument list, that is, specifying the seq, but omitting the item to search for:

from functools import partial
dups_in_source = partial(list_duplicates_of, source)

for c in "ABDEFS":
    print(c, dups_in_source(c))

Prints:

A [0, 2, 6, 7, 16, 20]
B [1, 3, 5, 11, 15, 22]
D [4, 9]
E [8, 13]
F [17, 21]
S [10, 19]

Just wanted to tell you that your solution was the fastest of all suggested here — Ruslan Bes, Apr 25 '14 at 13:20

score 41 · Answer 2 · edited Jan 23 '22 at 20:12


>>> def indices(lst, item):
...   return [i for i, x in enumerate(lst) if x == item]
... 
>>> indices(List, "A")
[0, 2]

To get all duplicates, you can use the method below, but it is not very efficient because it rescans the list once per distinct value. If efficiency is important, consider Ignacio's solution instead.

>>> dict((x, indices(List, x)) for x in set(List) if List.count(x) > 1)
{'A': [0, 2]}

As for solving it using the index method of list instead, that method takes a second optional argument indicating where to start, so you could just repeatedly call it with the previous index plus 1.

>>> List.index("A")
0
>>> List.index("A", 1)
2

edited Jan 23 '22 at 20:12 by mkrieger1 · answered Mar 24 '11 at 12:40 by Lauritz V. Thaulow

can someone please explain how this is human readable? `i for i`??? `[i for i, x in enumerate(lst) if x == item]` – uberrebu Sep 09 '22 at 06:57
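For readers puzzled by the comprehension, here is a sketch of the same logic written as an explicit loop. The name `indices_loop` is made up for this illustration; it is only meant to behave like the `indices` function above.

```python
def indices_loop(lst, item):
    """Return every index at which item occurs in lst."""
    result = []
    # enumerate() yields (index, value) pairs: (0, lst[0]), (1, lst[1]), ...
    for i, x in enumerate(lst):
        if x == item:         # the "if x == item" filter of the comprehension
            result.append(i)  # the leading "i" is the value being collected
    return result


print(indices_loop(['A', 'B', 'A', 'C', 'E'], 'A'))
# [0, 2]
```

Read the comprehension right to left: loop over the pairs, keep those whose value matches, and collect the index of each.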

score 17 · Answer 3 · edited May 23 '17 at 12:02

I made a benchmark of all solutions suggested here and also added another solution to this problem (described at the end of the answer).

Benchmarks

First, the benchmarks. I initialize a list of n random ints within the range [1, n/2] and then call timeit over all algorithms.

The solutions of @Paul McGuire and @Ignacio Vazquez-Abrams work about twice as fast as the rest on a list of 100 ints:

Testing algorithm on the list of 100 items using 10000 loops
Algorithm: dupl_eat
Timing: 1.46247477189
####################
Algorithm: dupl_utdemir
Timing: 2.93324529055
####################
Algorithm: dupl_lthaulow
Timing: 3.89198786645
####################
Algorithm: dupl_pmcguire
Timing: 0.583058259784
####################
Algorithm: dupl_ivazques_abrams
Timing: 0.645062989076
####################
Algorithm: dupl_rbespal
Timing: 1.06523873786
####################

If you change the number of items to 1000, the difference becomes much bigger (BTW, I'd be happy if someone could explain why):

Testing algorithm on the list of 1000 items using 1000 loops
Algorithm: dupl_eat
Timing: 5.46171654555
####################
Algorithm: dupl_utdemir
Timing: 25.5582547323
####################
Algorithm: dupl_lthaulow
Timing: 39.284285326
####################
Algorithm: dupl_pmcguire
Timing: 0.56558489513
####################
Algorithm: dupl_ivazques_abrams
Timing: 0.615980005148
####################
Algorithm: dupl_rbespal
Timing: 1.21610942322
####################
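One plausible explanation for the widening gap: the slower functions rescan the whole list once per distinct value, so their work grows roughly with the square of the list length, while the dictionary-based ones visit each element once. With 10 times more items and 10 times fewer loops, a linear algorithm's total time should stay about flat and a quadratic one's should grow about tenfold, which is roughly what the timings above show. The sketch below only counts element visits. Its function names are made up for this illustration and it is not the benchmark code.

```python
import random
from collections import defaultdict


def rescan_per_value(seq):
    """Rescan seq once per distinct value; return (duplicates, element visits)."""
    visits = 0
    result = {}
    for value in set(seq):
        locs = []
        for i, x in enumerate(seq):  # a full pass for every distinct value
            visits += 1
            if x == value:
                locs.append(i)
        if len(locs) > 1:
            result[value] = locs
    return result, visits


def single_pass(seq):
    """Visit each element once; return (duplicates, element visits)."""
    visits = 0
    tally = defaultdict(list)
    for i, x in enumerate(seq):
        visits += 1
        tally[x].append(i)
    return {k: v for k, v in tally.items() if len(v) > 1}, visits


random.seed(0)  # arbitrary seed, only to make the run repeatable
for n in (100, 1000):
    data = [random.randint(1, n // 2) for _ in range(n)]
    slow, slow_visits = rescan_per_value(data)
    fast, fast_visits = single_pass(data)
    assert slow == fast  # both approaches find the same duplicates
    print(n, slow_visits, fast_visits)
```

The visit count of the first function is the list length times the number of distinct values, and the number of distinct values itself grows with n in this benchmark setup.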

On the bigger lists, the solution of @Paul McGuire continues to be the most efficient and my algorithm begins having problems.

Testing algorithm on the list of 1000000 items using 1 loops
Algorithm: dupl_pmcguire
Timing: 1.5019953958
####################
Algorithm: dupl_ivazques_abrams
Timing: 1.70856155898
####################
Algorithm: dupl_rbespal
Timing: 3.95820421595
####################

The full code of the benchmark is here

Another algorithm

Here is my solution to the same problem:

def dupl_rbespal(c):
    alreadyAdded = False
    dupl_c = dict()
    sorted_ind_c = sorted(range(len(c)), key=lambda x: c[x]) # sort incoming list but save the indexes of sorted items

    for i in range(len(c) - 1): # loop over indexes of sorted items (range works in Python 2 and 3)
        if c[sorted_ind_c[i]] == c[sorted_ind_c[i+1]]: # if two consecutive indexes point to the same value, add it to the duplicates
            if not alreadyAdded:
                dupl_c[c[sorted_ind_c[i]]] = [sorted_ind_c[i], sorted_ind_c[i+1]]
                alreadyAdded = True
            else:
                dupl_c[c[sorted_ind_c[i]]].append( sorted_ind_c[i+1] )
        else:
            alreadyAdded = False
    return dupl_c

Although it's not the fastest, it allowed me to generate a slightly different structure that I needed for my problem (something like a linked list of the indexes that share a value).
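The same sort-then-compare-neighbours idea can be sketched more compactly with `itertools.groupby`. This is an illustrative variant, not the function that was benchmarked; the name `dupl_sorted_groupby` is an assumption, and like the original it requires the values to be sortable.

```python
from itertools import groupby


def dupl_sorted_groupby(c):
    """Map each repeated value in c to the list of indexes where it occurs."""
    # Sort the indexes by the value they point at. sorted() is stable, so
    # indexes of equal values stay in ascending order.
    sorted_ind_c = sorted(range(len(c)), key=lambda i: c[i])
    dupl_c = {}
    # groupby() collects runs of consecutive indexes that point at equal values.
    for value, group in groupby(sorted_ind_c, key=lambda i: c[i]):
        indexes = list(group)
        if len(indexes) > 1:
            dupl_c[value] = indexes
    return dupl_c


print(dupl_sorted_groupby([3, 1, 3, 2, 1, 3]))
# {1: [1, 4], 3: [0, 2, 5]}
```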

Note, the benchmark used Paul McGuire's `list_duplicates(seq)` function, not the `list_duplicates_of(seq,item)` function. — nmz787, Sep 18 '14 at 22:00

score 15 · Answer 4 · answered Mar 24 '11 at 12:52


import collections

dups = collections.defaultdict(list)
for i, e in enumerate(L):  # L is the list to search
  dups[e].append(i)
for k, v in sorted(dups.items()):
  if len(v) >= 2:
    print('%s: %r' % (k, v))

And extrapolate from there.

answered Mar 24 '11 at 12:52 by Ignacio Vazquez-Abrams

score 11 · Answer 5 · answered Dec 14 '13 at 17:06

I think I found a simple solution after a lot of irritation:

if elem in string_list:
    counter = 0
    elem_pos = []
    for i in string_list:
        if i == elem:
            elem_pos.append(counter)
        counter = counter + 1
    print(elem_pos)

This prints a list giving you the indexes of a specific element ("elem").

answered Dec 14 '13 at 17:06 by Shonu93

Indeed, this is a reliable way to work with lists that have duplicates. Congrats and thanks :). – ivanleoncz Sep 30 '17 at 21:17
I'd add comments to make it more readable to beginners. Very good solution! – Toma Nov 12 '21 at 19:23
Thanks, this solution is about as simple as it gets. – SreehariGaddam Dec 30 '21 at 06:26

utdemir · Answer 6 · 2011-03-24T13:16:17.720

Using the new Counter class in the collections module, based on lazyr's answer:

>>> import collections
>>> def duplicates(n): #n="123123123"
...     counter=collections.Counter(n) #{'1': 3, '3': 3, '2': 3}
...     dups=[i for i in counter if counter[i]!=1] #['1','3','2']
...     result={}
...     for item in dups:
...             result[item]=[i for i,j in enumerate(n) if j==item] 
...     return result
... 
>>> duplicates("123123123")
{'1': [0, 3, 6], '3': [2, 5, 8], '2': [1, 4, 7]}

eat · Answer 7 · 2011-03-24T15:56:46.600

from collections import Counter, defaultdict

def duplicates(lst):
    cnt= Counter(lst)
    return [key for key in cnt.keys() if cnt[key]> 1]

def duplicates_indices(lst):
    dup, ind= duplicates(lst), defaultdict(list)
    for i, v in enumerate(lst):
        if v in dup: ind[v].append(i)
    return ind

lst= ['a', 'b', 'a', 'c', 'b', 'a', 'e']
print(duplicates(lst)) # ['a', 'b']
print(duplicates_indices(lst)) # ..., {'a': [0, 2, 5], 'b': [1, 4]})

A slightly more orthogonal (and thus more useful) implementation would be:

from collections import Counter, defaultdict

def duplicates(lst):
    cnt= Counter(lst)
    return [key for key in cnt.keys() if cnt[key]> 1]

def indices(lst, items= None):
    items, ind= set(lst) if items is None else items, defaultdict(list)
    for i, v in enumerate(lst):
        if v in items: ind[v].append(i)
    return ind

lst= ['a', 'b', 'a', 'c', 'b', 'a', 'e']
print(indices(lst, duplicates(lst))) # ..., {'a': [0, 2, 5], 'b': [1, 4]})

mobiuscreek · Answer 8 · 2021-02-12T15:49:29.287


In a single line with pandas 1.2.2 and numpy:

 import numpy as np
 import pandas as pd
 
 idx = np.where(pd.DataFrame(List).duplicated(keep=False))

The argument keep=False marks every occurrence of a duplicated value as True, and np.where() returns a tuple holding an array of the positions where the value was True (use idx[0] to get the array itself).

edited Feb 12 '21 at 15:49 · answered Feb 12 '21 at 15:42 by mobiuscreek

wordsforthewise · Answer 9 · 2018-06-01T21:45:32.727

Wow, everyone's answer is so long. I simply used a pandas dataframe, masking, and the duplicated function (keep=False marks all duplicates as True, not just the first or last):

import pandas as pd
import numpy as np
np.random.seed(42)  # make results reproducible

int_df = pd.DataFrame({'int_list': np.random.randint(1, 20, size=10)})
dupes = int_df['int_list'].duplicated(keep=False)
print(int_df['int_list'][dupes].index)

This should print Int64Index([0, 2, 3, 4, 6, 7, 9], dtype='int64') (pandas 2.0 and later show Index instead of Int64Index).

score 3 · Answer 10 · answered Jul 24 '19 at 16:40


def index(arr, num):
    for i, x in enumerate(arr):
        if x == num:
            print(x, i)

#index(List, 'A')

answered Jul 24 '19 at 16:40 by fuwiak

score 1 · Answer 11 · edited Oct 24 '16 at 06:37


string_list = ['A', 'B', 'C', 'B', 'D', 'B']
pos_list = []
for i in range(len(string_list)):
    if string_list[i] == 'B':
        pos_list.append(i)
print(pos_list)

edited Oct 24 '16 at 06:37 by Julien · answered Oct 24 '16 at 05:27 by Agnisha Singh

Add some explanation of how this answer helps the OP fix the current issue – ρяσѕρєя K Oct 24 '16 at 05:34

score 1 · Answer 12 · answered Mar 05 '22 at 02:49

This is a good question and there are a lot of ways to do it.

The code below is one of them:

letters = ["a", "b", "c", "d", "e", "a", "a", "b"] 

lettersIndexes = [i for i in range(len(letters))] # a list that contains the indexes of the previous list
counter = 0 
for item in letters: 
    if item == "a": 
        print(item, lettersIndexes[counter]) 
    counter += 1 # for each item it increases the counter which means the index

Another way to get the indexes, this time stored in a list:

letters = ["a", "b", "c", "d", "e", "a", "a", "b"] 
lettersIndexes = [i for i in range(len(letters)) if letters[i] == "a" ] 
print(lettersIndexes) # as you can see we get a list of the indexes that we want.

Good day

score 1 · Answer 13 · answered Mar 17 '23 at 18:21

There are a lot of responses already, but I really like this solution, and it is really fast (it uses a pandas.Series since they are faster to create than pd.DataFrames).

A benefit of this one is that it skips the first occurrence of each repeated value and reports only the later ones.

import numpy as np
import pandas as pd

lst = [0, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 9, 9]
#index 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
#duplicates  |     |  |              |           |

indices = np.where(pd.Series(lst).duplicated())[0]

print(indices)
# [ 2  4  5 10 14]
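If pandas is not available, the same "skip the first occurrence" behaviour can be sketched with a plain set. This is an illustration only: the name `repeat_indices` is an assumption, and it requires the elements to be hashable.

```python
def repeat_indices(lst):
    """Indices of every element that has already appeared earlier in lst."""
    seen = set()
    indices = []
    for i, x in enumerate(lst):
        if x in seen:
            indices.append(i)  # a repeat: some earlier position holds x
        else:
            seen.add(x)        # first occurrence: remember it, do not report it
    return indices


lst = [0, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 9, 9]
print(repeat_indices(lst))
# [2, 4, 5, 10, 14]
```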

score 0 · Answer 14 · edited Nov 24 '19 at 04:29

def find_duplicate(list_):
    # values already reported; the initial "" means empty strings are skipped
    duplicate_list=[""]

    for k in range(len(list_)):
        if list_[k] in duplicate_list:
            continue
        for j in range(len(list_)):
            if k == j:
                continue
            if list_[k] == list_[j]:
                duplicate_list.append(list_[j])
                # print the two positions; list_.index() would return the
                # first occurrence both times
                print("duplicate "+str(k)+" "+str(j))

Yi Xiang Chong · Answer 15 · 2020-07-10T14:46:21.577

Here is one that works for multiple duplicates and you don't need to specify any values:

List = ['A', 'B', 'A', 'C', 'E', 'B'] # duplicate two 'A's two 'B's

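ix_set = set()  # a set, so an index reached from two pairs is stored once
for i in range(len(List)):
    try:
        dup_ix = List.index(List[i], i + 1)  # next occurrence after position i
        ix_set.update([i, dup_ix])  # a later match exists, so record i as well
    except ValueError:  # no later occurrence of List[i]
        pass

ix_list = sorted(ix_set)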

print(ix_list)
[0, 1, 2, 5]

score 0 · Answer 16 · edited Mar 11 '21 at 09:02

def dup_list(my_list, value):
    '''
    dup_list(list,value)
        This function finds the indices of values in a list including duplicated values.

        list: the list you are working on

        value: the item of the list you want to find the index of

            NB: if a value is duplicated, its indices are returned in a list.
            If there is only one occurrence of the value, the index is returned as an integer.

            Therefore use isinstance() to decide how to handle the returned value
    '''
    value_list = []
    index_list = []
    index_of_duped = []

    if my_list.count(value) == 1:
        return my_list.index(value)  
        
    elif my_list.count(value) < 1:
        return 'Your argument is not in the list'

    else:
        for item in my_list:
            value_list.append(item)
            length = len(value_list)
            index = length - 1
            index_list.append(index)

            if item == value:
                index_of_duped.append(max(index_list))

        return index_of_duped

# function call eg dup_list(my_list, 'john')
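The docstring asks callers to check the type of the returned value. Here is a sketch of what that looks like in practice. `dup_list_compact` is a hypothetical stand-in written for this example, assumed to follow the same return contract as `dup_list` (an int, a list, or the message string), and the sample names are invented.

```python
def dup_list_compact(my_list, value):
    """Stand-in with the same return contract as dup_list above."""
    found = [i for i, item in enumerate(my_list) if item == value]
    if not found:
        return 'Your argument is not in the list'
    return found[0] if len(found) == 1 else found


names = ['john', 'mary', 'john', 'paul']
for name in ('john', 'mary', 'anna'):
    result = dup_list_compact(names, name)
    if isinstance(result, list):    # value occurs more than once
        print(name, 'occurs at indices', result)
    elif isinstance(result, int):   # value occurs exactly once
        print(name, 'occurs once, at index', result)
    else:                           # the "not in the list" message
        print(name, '->', result)
```

Returning a list in every case (possibly empty or of length one) would spare callers this branching, which is the design most of the other answers follow.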

Prabhas · Answer 17 · 2021-11-16T10:09:17.067


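def duplicates(lst, dup):  # lst instead of list, to avoid shadowing the builtin
  a=[lst.index(dup)]  # first occurrence; raises ValueError if dup is absent
  for _ in lst:
     try: 
        a.append(lst.index(dup,a[-1]+1))  # search after the last hit
     except ValueError:  # no further occurrence
        for i in a:
           print(f'index {i}: '+dup)
        break
duplicates(['A', 'B', 'A', 'C', 'E'],'A')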

  Output:
          index 0: A
          index 2: A

edited Nov 16 '21 at 10:09 · answered Sep 23 '21 at 06:23 by Prabhas

score 0 · Answer 18 · answered Sep 27 '21 at 19:23

If you want the indices of every duplicated element, whatever its value, you can try this solution:

# note: below list has more than one kind of duplicates
List = ['A', 'B', 'A', 'C', 'E', 'E', 'A', 'B', 'A', 'A', 'C']
d1 = {item:List.count(item) for item in List}  # item and their counts
elems = list(filter(lambda x: d1[x] > 1, d1))  # get duplicate elements
d2 = dict(zip(range(0, len(List)), List))  # each item and their indices

# item and their list of duplicate indices
res = {item: list(filter(lambda x: d2[x] == item, d2)) for item in elems}

Now, if you print(res) you'll get to see this:

{'A': [0, 2, 6, 8, 9], 'B': [1, 7], 'C': [3, 10], 'E': [4, 5]}

score 0 · Answer 19 · answered Jul 19 '22 at 12:49

A dictionary approach based on the setdefault method:

List = ['A', 'B', 'A', 'C', 'B', 'E', 'B']

# keep track of all indices of every term
duplicates = {}
for i, key in enumerate(List):
    duplicates.setdefault(key, []).append(i)

# print only those terms with more than one index
template = 'index {}: {}'
for k, v in duplicates.items():
    if len(v) > 1:
        print(template.format(k, str(v).strip('][')))

Remark: Counter, defaultdict and OrderedDict from collections are subclasses of dict and hence share the setdefault method as well.
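A quick check of that remark, as a sketch: it tests three classes from collections that are known to subclass dict, then reuses the setdefault loop on one of them. The choice of OrderedDict here is arbitrary.

```python
from collections import Counter, OrderedDict, defaultdict

# Each of these inherits from dict, so setdefault() is available on all of them.
for cls in (Counter, OrderedDict, defaultdict):
    assert issubclass(cls, dict)

duplicates = OrderedDict()
for i, key in enumerate(['A', 'B', 'A', 'C', 'B', 'E', 'B']):
    duplicates.setdefault(key, []).append(i)

print({k: v for k, v in duplicates.items() if len(v) > 1})
# {'A': [0, 2], 'B': [1, 4, 6]}
```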

score -1 · Answer 20 · answered Mar 02 '14 at 18:26

I'll mention the more obvious way of dealing with duplicates in lists. In terms of complexity, dictionaries are the way to go because each lookup is O(1). You can be more clever if you're only interested in duplicates...

my_list = [1,1,2,3,4,5,5]
my_dict = {}
for (ind,elem) in enumerate(my_list):
    if elem in my_dict:
        my_dict[elem].append(ind)
    else:
        my_dict.update({elem:[ind]})

for key,value in my_dict.items():
    if len(value) > 1:
        print("key(%s) has indices (%s)" %(key,value))

which prints the following:

key(1) has indices ([0, 1])
key(5) has indices ([5, 6])

score -1 · Answer 21 · answered Oct 07 '18 at 17:14


a= [2,3,4,5,6,2,3,2,4,2]
search=2
pos=0
positions=[]

while (search in a):
    pos+=a.index(search)
    positions.append(pos)
    a=a[a.index(search)+1:]
    pos+=1

print("search found at:", positions)

answered Oct 07 '18 at 17:14 by Umesh Singatiya

score -3 · Answer 22 · edited Oct 27 '20 at 20:52

I just kept it simple:

i = [1,2,1,3]
k = 0
for ii in i:
    if ii == 1:
        print("index of 1 = ", k)
    k = k+1

output:

 index of 1 =  0

 index of 1 =  2

edited Oct 27 '20 at 20:52 by Umutambyi Gad · answered Feb 12 '15 at 08:04 by Sathish Chinnasamy
