Reporting duplicates from a list of tuples in Python (by index)

Question

I want to report the tuples in a list whose item at index 1 (data[1], which holds the patient counter) is repeated. With the samples below, I don't need to consider duplicates in data[0] or data[2].

import itertools
def getDuplicateinTuple(dataInput):
    seen={}
    return [seen.setdefault(t[0], t) for t in dataInput if t[0] not in seen]

data=[('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

data1=[('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')]

s=getDuplicateinTuple(data)
print s
s1=getDuplicateinTuple(data1)
print s1

The expected output is:

 [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]

and the actual output is:

[('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

Likewise, if I pass input with no duplicates, as in data1, the expected output is:

[]

but the current output is:

[('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 ')]

This can be achieved just by comparing the list items. What is a better, recommended way to do it?

I saw a helpful Stack Overflow post on this topic: Find and list duplicates in a list?

@falsetru: with reference to http://ideone.com/DWv7uq, the output of print getDuplicateinTuple(data) is required to be [('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1')], and print getDuplicateinTuple(data1) matches what is expected. – Ragav, Feb 10 '14 at 15:32

falsetru · Accepted Answer · 2014-02-10T15:32:32.043


Using collections.defaultdict:

from collections import defaultdict

def getDuplicateinTuple(dataInput):
    d = defaultdict(list)
    for t in dataInput:
        item1 = t[1]
        d[item1].append(t)
    return [t for ts in d.itervalues() if len(ts) > 1 for t in ts]

data = [
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
    ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')
]

data1 = [
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')
]

print getDuplicateinTuple(data)
# => [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
#     ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]
print getDuplicateinTuple(data1)
# => []
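
For readers on Python 3, here is a minimal sketch of the same grouping approach. The function name get_duplicates_by_index and its index parameter are hypothetical names introduced for illustration (they are not part of the answer above); the sketch swaps itervalues() for values() and uses print() as a function.

```python
from collections import defaultdict

def get_duplicates_by_index(rows, index=1):
    """Return every tuple whose value at `index` occurs more than once.

    Hypothetical helper: `index` picks the tuple position to compare.
    """
    groups = defaultdict(list)
    for row in rows:
        # Group the whole tuple under the value found at the chosen index.
        groups[row[index]].append(row)
    # Keep only the groups that hold more than one tuple, then flatten them.
    return [row for group in groups.values() if len(group) > 1 for row in group]

data = [
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
    ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5'),
]

data1 = [
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '),
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '),
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '),
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 '),
]

print(get_duplicates_by_index(data))
print(get_duplicates_by_index(data1))
```

Because the key is taken by position rather than by unpacking, the same function should work for both the 2-item tuples in data and the 3-item tuples in data1.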

edited Feb 10 '14 at 15:32

answered Feb 10 '14 at 14:00

falsetru


What if I have to ignore the timestamp, i.e. consider only data[1] and not data[0]? – Ragav Feb 10 '14 at 14:33
Each tuple contains 2 items: data[0] is a timestamp and data[1] is the content. If I want to find duplicates of the content (data[1]) only, what is the approach? – Ragav Feb 10 '14 at 14:37
In the tuple list data, if data[1] is duplicated, I want to extract and show those tuples. – Ragav Feb 10 '14 at 15:21
@Ragav, I updated the answer according to question modification. – falsetru Feb 10 '14 at 15:34

score 0 · Answer 2 · answered Feb 10 '14 at 14:20


You can build a defaultdict that groups the timestamps by counter, and then keep only the counters that occur more than once:

from collections import defaultdict
d = defaultdict(list)
for timestamp, counter in data:
    d[counter].append(timestamp)

for counter, timestamps in d.items():
    if len(timestamps) > 1:
        print([(t, counter) for t in timestamps])
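
If only the duplicated counter names are needed, without the timestamps, a rough sketch using collections.Counter might look like the following. The name duplicated_keys and the index parameter are hypothetical, chosen for illustration. Indexing by position instead of unpacking into (timestamp, counter) also lets it accept tuples of different lengths, such as the 3-item tuples in data1.

```python
from collections import Counter

def duplicated_keys(rows, index=1):
    """Return the values at `index` that appear in more than one tuple.

    Hypothetical helper, shown only as an illustration.
    """
    # Count how often each value at the chosen position occurs.
    counts = Counter(row[index] for row in rows)
    # Keep the values seen more than once.
    return [key for key, n in counts.items() if n > 1]

rows = [
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '),
]

print(duplicated_keys(rows))
```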

answered Feb 10 '14 at 14:20

arocks


What if I have to ignore the timestamp, i.e. consider only data[1] and not data[0]? – Ragav Feb 10 '14 at 14:34
Omit the timestamp in the output? Change last line to `print(counter)`. – arocks Feb 10 '14 at 14:36
