0

How do I find duplicate values in the following list of tuples?

[(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]

I want to get a list like:

[4081, 4082, 4086, 4090]

I tried using itemgetter followed by groupby, but it didn't work.

How can one do this?

jeffknupp
Rohan Nayani
  • Can you post your attempt? – 0xtvarun Jul 27 '16 at 17:54
  • These links may help: [link1](http://stackoverflow.com/questions/32464290/python-find-tuples-from-a-list-of-tuples-having-duplicate-data-in-the-0th-elem) [link2](http://stackoverflow.com/questions/17482944/find-duplicate-items-within-a-list-of-list-of-tuples-python) – Shubham Baranwal Jul 27 '16 at 17:54

4 Answers

3

Use an ordered dictionary whose keys are the first items and whose values are lists of the corresponding second items (built with dict.setdefault()), then pick the lists that contain more than one element:

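>>> from itertools import chain
>>> from collections import OrderedDict
>>> lst = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
>>> d = OrderedDict()
>>> for i, j in lst:
...     d.setdefault(i, []).append(j)
... 
>>> # keep only the groups with more than one value, then flatten them
>>> list(chain.from_iterable(j for j in d.values() if len(j) > 1))
[4081, 4082, 4086, 4090]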
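
The same grouping idea can be sketched as a reusable function. This is only an illustration: `find_duplicate_values` is a hypothetical name, and it swaps `OrderedDict` for `collections.defaultdict`, assuming Python 3.7+ where dictionaries keep insertion order.

```python
from collections import defaultdict


def find_duplicate_values(pairs):
    """Return the second items of all pairs whose first item occurs more than once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Flatten only the groups that hold more than one value.
    return [value
            for values in groups.values() if len(values) > 1
            for value in values]


lst = [(1622, 4081), (1622, 4082), (1624, 4083),
       (1626, 4085), (1650, 4086), (1650, 4090)]
print(find_duplicate_values(lst))
```

For the list in the question this prints `[4081, 4082, 4086, 4090]`.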
Mazdak
1

As an alternative, if you want to use groupby, here is a way to do it:

In [1]: from itertools import groupby

In [2]: ts = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]

In [3]: dups = []

In [4]: for _, g in groupby(ts, lambda x: x[0]):
   ...:     grouped = list(g)
   ...:     if len(grouped) > 1:
   ...:         dups.extend([dup[1] for dup in grouped])
   ...:         

In [5]: print(dups)
[4081, 4082, 4086, 4090]

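groupby groups the tuples by their first element; for every group with more than one member, the second elements are added to the result list. Note that groupby only groups consecutive items, so the list must already be ordered by the first element, as it is here.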
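
If the tuples may arrive in arbitrary order, one option is to sort by the first element before grouping. The sketch below does that with `operator.itemgetter`; the function name `dups_with_groupby` and the shuffled sample list are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def dups_with_groupby(pairs):
    """Collect second items of pairs that share a first item, in sorted-key order."""
    first = itemgetter(0)
    dups = []
    # sorted() is stable, so values under the same key keep their input order.
    for _, group in groupby(sorted(pairs, key=first), key=first):
        grouped = list(group)
        if len(grouped) > 1:
            dups.extend(second for _, second in grouped)
    return dups


shuffled = [(1650, 4086), (1622, 4081), (1624, 4083),
            (1650, 4090), (1626, 4085), (1622, 4082)]
print(dups_with_groupby(shuffled))
```

With this shuffled input the result is `[4081, 4082, 4086, 4090]`, at the cost of an O(n log n) sort.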

Anzel
1

Yet another approach (without any imports):

In [896]: lot = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]

In [897]: d = dict()

In [898]: for key, value in lot:
     ...:     d[key] = d.get(key, []) + [value]
     ...: 
     ...: 

In [899]: d
Out[899]: {1622: [4081, 4082], 1624: [4083], 1626: [4085], 1650: [4086, 4090]}

In [900]: [d[key] for key in d if len(d[key]) > 1]
Out[900]: [[4086, 4090], [4081, 4082]]

In [901]: sorted([num for lst in [d[key] for key in d if len(d[key]) > 1] for num in lst])
Out[901]: [4081, 4082, 4086, 4090]
Tonechas
0

A brute-force approach that compares every pair of tuples (edit: tested, and it works on this list):

l = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]

dup = []

for i, t1 in enumerate(l):
    for t2 in l[i+1:]:
        if t1[0]==t2[0]:
            dup.extend([t1[1], t2[1]])
print(dup)
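
One caveat with the pairwise loop above: it is quadratic, and when a first element appears three or more times each value is appended once per matching pair, so values repeat. A possible alternative, sketched here with `collections.Counter` (the function name `values_with_repeated_key` is hypothetical), counts the first elements once and then filters:

```python
from collections import Counter


def values_with_repeated_key(pairs):
    """Return second items whose first item appears more than once, in input order."""
    # Count how often each first element occurs.
    counts = Counter(first for first, _ in pairs)
    return [second for first, second in pairs if counts[first] > 1]


l = [(1622, 4081), (1622, 4082), (1624, 4083),
     (1626, 4085), (1650, 4086), (1650, 4090)]
print(values_with_repeated_key(l))
```

This makes two linear passes over the list and lists each value once, even when a key occurs three times.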
Aaron