How to get duplicate strings of list with indices in Python

Question

I realize this has already been addressed elsewhere (e.g., Removing duplicates in the lists, Accessing the index in 'for' loops?, Append indices to duplicate strings in Python efficiently, and many more). Nevertheless, I hope this question is different.

I need to write a program that checks whether a list has any duplicates and, if it does, returns each duplicated element along with its indices.

The sample list, sample_list:

sample = """An article is any member of a class of dedicated words that are used with noun phrases to
mark the identifiability of the referents of the noun phrases. The category of articles constitutes a
part of speech. In English, both "the" and "a" are articles, which combine with a noun to form a noun
phrase."""

sample_list = sample.split()

my_list = [x.lower() for x in sample_list]

len(my_list)

output: 55

The common approach to getting a unique collection of items is to use a set, which removes the duplicates:

unique_list = list(set(my_list))
len(unique_list)

output: 38
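The set shows that some words repeat (55 tokens, 38 distinct) but not which ones. As a minimal sketch, assuming Python 3.7+ so that collections.Counter (a dict subclass) iterates in first-seen order, a Counter can list the repeated words. It still gives no indices, so it only narrows the problem:

```python
from collections import Counter

sample = """An article is any member of a class of dedicated words that are used with noun phrases to
mark the identifiability of the referents of the noun phrases. The category of articles constitutes a
part of speech. In English, both "the" and "a" are articles, which combine with a noun to form a noun
phrase."""
my_list = [x.lower() for x in sample.split()]

# Counter maps each word to how many times it occurs
counts = Counter(my_list)

# Keep words seen more than once; iteration follows first-seen order (Python 3.7+)
repeated = [word for word, n in counts.items() if n > 1]
print(repeated)
# ['of', 'a', 'are', 'with', 'noun', 'to', 'the']
```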

This is what I have tried, but honestly I don't know what to do next:

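from functools import partial

def list_duplicates_of(seq, item):
    # collect every index at which item appears in seq
    start_at = -1
    locs = []
    while True:
        try:
            # resume the search just past the previous match
            loc = seq.index(item, start_at + 1)
        except ValueError:
            break  # no more occurrences
        else:
            locs.append(loc)
            start_at = loc
    return locs

# bind my_list as the first argument, so only the item is passed later
dups_in_source = partial(list_duplicates_of, my_list)

for i in my_list:
    print(i, dups_in_source(i))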

This prints every element with all of its indices, including words that occur only once, and repeats the line for each occurrence of a duplicated word:

an [0]
article [1]
.
.
.
form [51]
a [6, 33, 48, 52]
noun [15, 26, 49, 53]
phrase. [54]

Instead, I want to return only the duplicated elements along with their indices, like below:

of [5, 8, 21, 24, 30, 35]
a [6, 33, 48, 52]
are [12, 43]
with [14, 47]
.
.
.
noun [15, 26, 49, 53]
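One possible way to finish the attempt above, sketched here rather than taken from the question: visit each distinct word once (dict.fromkeys keeps first-seen order) and keep only the list_duplicates_of results that contain more than one index. The duplicates dict is a name introduced for illustration. Each call rescans the list, so the cost grows with the list length times the number of distinct words; that is fine for short texts but slower than a single-pass grouping.

```python
from functools import partial

def list_duplicates_of(seq, item):
    """Return every index at which item occurs in seq."""
    start_at = -1
    locs = []
    while True:
        try:
            # search only past the previous match
            loc = seq.index(item, start_at + 1)
        except ValueError:
            break  # no further occurrences
        locs.append(loc)
        start_at = loc
    return locs

sample = """An article is any member of a class of dedicated words that are used with noun phrases to
mark the identifiability of the referents of the noun phrases. The category of articles constitutes a
part of speech. In English, both "the" and "a" are articles, which combine with a noun to form a noun
phrase."""
my_list = [x.lower() for x in sample.split()]

dups_in_source = partial(list_duplicates_of, my_list)

# dict.fromkeys yields each distinct word once, in first-seen order
duplicates = {}
for word in dict.fromkeys(my_list):
    locs = dups_in_source(word)
    if len(locs) > 1:  # skip words that occur only once
        duplicates[word] = locs

for word, locs in duplicates.items():
    print(word, locs)
```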

You might want to first remove from `sample` anything that isn't an alpha or space character. — Booboo, Feb 10 '21 at 12:15
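A sketch of the cleanup suggested in this comment, assuming only ASCII letters should survive (accented letters, apostrophes and hyphens would be stripped too, so "don't" would become "dont"). The regex r"[^A-Za-z\s]" deletes every character that is neither an ASCII letter nor whitespace, so "phrases." and "phrases" (and the quoted "the" and "a") fall into the same group. The grouping reuses the defaultdict approach from the accepted answer below:

```python
import re
from collections import defaultdict

sample = """An article is any member of a class of dedicated words that are used with noun phrases to
mark the identifiability of the referents of the noun phrases. The category of articles constitutes a
part of speech. In English, both "the" and "a" are articles, which combine with a noun to form a noun
phrase."""

# ASSUMPTION: keep only ASCII letters and whitespace; everything else is dropped
cleaned = re.sub(r"[^A-Za-z\s]", "", sample)
my_list = [x.lower() for x in cleaned.split()]

# group the positions of each cleaned word
indices = defaultdict(list)
for i, w in enumerate(my_list):
    indices[w].append(i)

duplicates = {w: locs for w, locs in indices.items() if len(locs) > 1}
for w, locs in duplicates.items():
    print(w, locs)
```

With the punctuation gone, "the" picks up index 40, "a" picks up index 42, and "phrases" and "articles" now show up as duplicates as well.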

user2390182 · Accepted Answer · 2021-02-10T12:09:05.353


You could do something along these lines:

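from collections import defaultdict

# map each word to the list of positions where it occurs
indices = defaultdict(list)

for i, w in enumerate(my_list):
    indices[w].append(i)

# print only the words that occur more than once
for k, v in indices.items():
    if len(v) > 1:
        print(k, v)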

of [5, 8, 21, 24, 30, 35]
a [6, 33, 48, 52]
are [12, 43]
with [14, 47]
noun [15, 26, 49, 53]
to [17, 50]
the [19, 22, 25, 28]

This uses collections.defaultdict and enumerate to collect the indices of each word in a single pass over the list. Filtering out the words that occur only once is then a simple conditional comprehension, or a loop with an if statement as above.
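Since the question asks for something that returns the duplicates rather than printing them, the same idea could be wrapped in a function. find_duplicates is a hypothetical name, and the conditional dict comprehension is one of the two filtering options mentioned above; it works for any list of hashable items:

```python
from collections import defaultdict

def find_duplicates(items):
    """Return {item: [indices]} for every item occurring more than once.

    find_duplicates is a hypothetical helper name; items must be hashable.
    """
    positions = defaultdict(list)
    for i, item in enumerate(items):
        positions[item].append(i)
    # conditional dict comprehension drops items seen only once
    return {item: locs for item, locs in positions.items() if len(locs) > 1}

sample = """An article is any member of a class of dedicated words that are used with noun phrases to
mark the identifiability of the referents of the noun phrases. The category of articles constitutes a
part of speech. In English, both "the" and "a" are articles, which combine with a noun to form a noun
phrase."""
my_list = [x.lower() for x in sample.split()]

dups = find_duplicates(my_list)
print(dups["noun"])  # [15, 26, 49, 53]
```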

edited Feb 10 '21 at 12:09

answered Feb 10 '21 at 12:07

user2390182



Nah, `sample_list` and `my_list` only differ in case sensitivity. – user2390182 Feb 10 '21 at 12:09

@SayandipDutta My bad, I've updated the desired output now. – Ailurophile Feb 10 '21 at 12:16
