
I realize this has already been addressed elsewhere (e.g., Removing duplicates in the lists, Accessing the index in 'for' loops?, Append indices to duplicate strings in Python efficiently, and many more). Nevertheless, I hope this question is different.

Basically, I need to write a program that checks whether a list has any duplicates and, if it does, returns each duplicate element along with its indices.

The sample list:

sample = """An article is any member of a class of dedicated words that are used with noun phrases to
mark the identifiability of the referents of the noun phrases. The category of articles constitutes a
part of speech. In English, both "the" and "a" are articles, which combine with a noun to form a noun
phrase."""

sample_list = sample.split()
my_list = [x.lower() for x in sample_list]

len(my_list)
output: 55

The common approach to get a unique collection of items is to use a set, which removes the duplicates:

unique_list = list(set(my_list))
len(unique_list)
output: 38

This is what I have tried, but honestly, I don't know what to do next:

from functools import partial

def list_duplicates_of(seq,item):
    start_at = -1
    locs = []
    while True:
        try:
            loc = seq.index(item,start_at+1)
        except ValueError:
            break
        else:
            locs.append(loc)
            start_at = loc
    return locs

dups_in_source = partial(list_duplicates_of, my_list)

for i in my_list:
    print(i, dups_in_source(i))

This prints every element with all of its indices, including elements that occur only once:

an [0]
article [1]
.
.
.
form [51]
a [6, 33, 48, 52]
noun [15, 26, 49, 53]
phrase. [54]

Instead, I want to return only the duplicate elements along with their indices, like this:

of [5, 8, 21, 24, 30, 35]
a [6, 33, 48, 52]
are [12, 43]
with [14, 47]
.
.
.
noun [15, 26, 49, 53]
Ailurophile
  • You might want to first remove from `sample` anything that isn't an alpha or space character. – Booboo Feb 10 '21 at 12:15
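A minimal sketch of the cleanup this comment suggests, assuming only ASCII letters and whitespace should survive. The helper name `clean_words` is hypothetical and not part of the original question. With this normalization, tokens such as `"the"` and `phrases.` would be counted together with `the` and `phrases`, so the duplicate counts would differ from those shown in the question.

```python
import re

def clean_words(text):
    """Hypothetical helper: lowercase the text, drop everything that is
    not an ASCII letter or whitespace, then split into words."""
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    return letters_only.split()

words = clean_words('The noun phrases. In English, both "the" and "a" are articles,')
print(words)
```

Note that a token made up entirely of punctuation would disappear, which would shift the indices of the words after it.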

1 Answer


You could do something along these lines:

from collections import defaultdict

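# Map each word to the list of positions where it occurs.
indices = defaultdict(list)

for i, w in enumerate(my_list):
    indices[w].append(i)

# Print only the words that occur more than once.
for k, v in indices.items():
    if len(v) > 1:
        print(k, v)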

of [5, 8, 21, 24, 30, 35]
a [6, 33, 48, 52]
are [12, 43]
with [14, 47]
noun [15, 26, 49, 53]
to [17, 50]
the [19, 22, 25, 28]

This uses collections.defaultdict and enumerate to collect the indices of each word in a single pass over the list. Keeping only the duplicates is then a simple conditional comprehension or a loop with an if statement.
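The comprehension variant mentioned above could look something like the sketch below. The function name `duplicate_indices` is made up for illustration, and the short input list is only a stand-in for `my_list`.

```python
from collections import defaultdict

def duplicate_indices(items):
    """Hypothetical helper: map each repeated item to the list of its positions."""
    positions = defaultdict(list)
    for i, item in enumerate(items):
        positions[item].append(i)
    # Conditional dict comprehension: keep only items seen more than once.
    return {item: locs for item, locs in positions.items() if len(locs) > 1}

print(duplicate_indices(["a", "noun", "is", "a", "noun", "phrase"]))
# {'a': [0, 3], 'noun': [1, 4]}
```

Returning a plain dict rather than printing makes the result easy to reuse, and the keys come out in order of first occurrence because dicts preserve insertion order in Python 3.7+.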

user2390182