I have a list that contains duplicate elements. For all duplicate elements, I would like to obtain a list of their indices. The final output should be a list of lists of duplicate indices.

I already have a working solution, but I suspect there is a more computationally efficient and/or more concise way (using less code) to solve this problem:

# set up a list that contains duplicate elements
a = ['bar','foo','bar','foo','foobar','barfoo']

# get list of elements that appear more than one time in the list
seen = {}
dupes = []

for x in a:
    if x not in seen:
        seen[x] = 1
    else:
        if seen[x] == 1:
            dupes.append(x)
        seen[x] += 1

# for each of those elements, return list of indices of matching elements
# in original list
dupes_indices = []

for dupe in dupes:
    indices = [i for i, x in enumerate(a) if x == dupe]
    dupes_indices.append(indices)

where dupes_indices is [[0, 2], [1, 3]] ('bar' appears at indices 0 and 2, and 'foo' appears at indices 1 and 3).

I based the code on this and this answer on Stack Overflow.

Johannes Wiesner

2 Answers

You could try this nested list comprehension one-liner:

a = ['bar','foo','bar','foo','foobar','barfoo']
print([y for y in [[i for i, v in enumerate(a) if v == x] for x in set(a)] if len(y) > 1])

Output (the order of the inner lists may vary, because set iteration order is arbitrary):

[[0, 2], [1, 3]]
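
Note that the one-liner rescans the whole list once per distinct element. If that matters for longer lists, a single pass that collects positions in a dictionary is one alternative. A minimal sketch, assuming the elements are hashable (the name `positions` is only illustrative and does not come from the question):

```python
from collections import defaultdict

a = ['bar', 'foo', 'bar', 'foo', 'foobar', 'barfoo']

# Map each element to the list of positions where it occurs (one pass over a).
positions = defaultdict(list)
for i, x in enumerate(a):
    positions[x].append(i)

# Keep only elements that occur more than once. Dicts preserve insertion
# order (Python 3.7+), so groups appear in order of first occurrence.
dupes_indices = [idx for idx in positions.values() if len(idx) > 1]
print(dupes_indices)  # [[0, 2], [1, 3]]
```

Because the dictionary keeps insertion order, the result here is deterministic, unlike the version that iterates over `set(a)`.
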
U13-Forward

pandas boolean selection also works. Note that this prints the indices of every distinct element, not only the duplicates:

import pandas as pd

df = pd.DataFrame(['bar','foo','bar','foo','foobar','barfoo'])
df.columns = ["elements"]
elements = set(df.elements.tolist())

for e in elements:
    x = df.loc[df.elements == e]
    print(e, x.index.tolist())

Output (the order of the lines may vary, because set iteration order is arbitrary):

bar [0, 2]
foobar [4]
foo [1, 3]
barfoo [5]
billz