Returning duplicate tuples that match at a number of different indices

Question

I have tuples nested in a list, e.g. [(0,1,2,3,4,5,6,7), (etc)].

I'm trying to return the tuples that are matching at index 0 & 1, and 3 to 5 (and to keep the ordering of the data inside the tuple)

With the code below I first try to remove the duplicates, then compare the result with the original list to identify which tuples were removed:

seen = set()
seen_add = seen.add
newL = []
for a in myList:
    if a[:2] not in seen and not seen_add(a[:2]):
        if a[3:6] not in seen and not seen_add(a[3:6]):
            newL.append(a)

result = list(set(myList) - set(newL))
for i in result:
    print i

But the first part removes a tuple that doesn't even have a duplicate.

N.B. The code for removing the first two elements came from here (by Martijn Pieters): Removing Duplicates from Nested List Based on First 2 Elements; but removing additional elements resulted in the aforementioned 'error'.

To return the duplicates (this code follows the answer by @unutbu)

for i in result:
    for e in newL:
        if i[:2] == e[:2]:
            if i[3:6] == e[3:6]:
                print e, i

I feel stupid but ... does that code really match with your explanation: _"I'm trying to return the tuples that are matching at index 0 & 1, and 3 to 5 (and to keep the ordering of the data inside the tuple)"_ ? — Sylvain Leroux, Aug 21 '14 at 14:02
Well, I'm the one who must have asked the stupid question then :P You're right though, I've just been trying to get my head around it. I'll add the last part (to my post), which returns the duplicate tuples using @unutbu's code below. — Soap, Aug 21 '14 at 14:42

unutbu · Accepted Answer · 2014-08-21T14:05:41.370

If you want to regard elements as matching when a[:2]+a[3:6] is the same, then you need to add a[:2]+a[3:6] to seen, rather than a[:2] and a[3:6] separately:

seen = set()
seen_add = seen.add
newL = [a for a in myList
        if a[:2]+a[3:6] not in seen
        and not seen_add(a[:2]+a[3:6])]
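
A small worked example may make the difference clearer. The sample data below is made up for illustration, and `separate_keys` and `combined_key` are hypothetical helper names that do not appear in the original code; the first mirrors the loop from the question, the second uses the combined key shown above (written for Python 3).

```python
# Hypothetical sample data: only the first and last tuples agree at
# indices 0-1 AND 3-5. The middle tuple shares indices 3-5 only.
sample = [
    (1, 2, 'a', 7, 8, 9, 'x', 'y'),
    (5, 6, 'b', 7, 8, 9, 'x', 'y'),
    (1, 2, 'c', 7, 8, 9, 'p', 'q'),
]


def separate_keys(items):
    """Mirror of the question's loop: the two slices are tracked separately."""
    seen = set()
    kept = []
    for a in items:
        if a[:2] not in seen:
            seen.add(a[:2])
            if a[3:6] not in seen:
                seen.add(a[3:6])
                kept.append(a)
    return kept


def combined_key(items):
    """One key per tuple, built from both slices together."""
    seen = set()
    kept = []
    for a in items:
        key = a[:2] + a[3:6]
        if key not in seen:
            seen.add(key)
            kept.append(a)
    return kept


# The middle tuple has no real duplicate, yet separate_keys drops it
# because its a[3:6] slice was already recorded by the first tuple.
print(separate_keys(sample))
print(combined_key(sample))
```

With separate keys, a tuple is discarded as soon as either slice has been seen before, which is why a tuple without a true duplicate can disappear.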

newL will then contain "unique" elements from myList, with the order preserved. Note that calling set(myList) will destroy the order of the items in myList, so

result = list(set(myList) - set(newL))

will contain the duplicate elements, but the order will not be preserved.
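
If the order of the duplicates matters, one option is to collect them in the same pass instead of taking a set difference. The following is a minimal sketch and not part of the original answer; `split_duplicates` and the sample data are hypothetical names chosen for illustration (Python 3).

```python
def split_duplicates(items):
    """Return (unique, duplicates), both in their original order.

    Two tuples count as matching when indices 0-1 and 3-5 agree.
    """
    seen = set()
    unique = []
    duplicates = []
    for a in items:
        key = a[:2] + a[3:6]
        if key in seen:
            # A tuple with this key appeared earlier in the list.
            duplicates.append(a)
        else:
            seen.add(key)
            unique.append(a)
    return unique, duplicates


# Hypothetical sample data for illustration only.
my_list = [
    (1, 2, 'a', 7, 8, 9, 'x', 'y'),
    (5, 6, 'b', 7, 8, 9, 'x', 'y'),
    (1, 2, 'c', 7, 8, 9, 'p', 'q'),
]
unique, duplicates = split_duplicates(my_list)
print(duplicates)
```

One difference worth noting: if the list contains an exact repeat of an earlier tuple, the set difference would omit it (the tuple is also in newL), whereas this sketch reports it as a duplicate.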

edited Aug 21 '14 at 14:05

answered Aug 21 '14 at 14:00

That's perfect, thank you :) I'll add the last part, for returning the duplicate tuples, to my original post. But thank you, I've been scratching my head over this for longer than I'd like to say :P – Soap Aug 21 '14 at 14:39
