0

Getting the repeated count data from tuple list item 1 which holds the patient counter data...data[1]. With below down samples i dont need to consider the duplicates on data[0] or data[2]

import itertools
def getDuplicateinTuple(dataInput):
    seen={}
    return [seen.setdefault(t[0], t) for t in dataInput if t[0] not in seen]

data=[('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

data1=[('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')]

s=getDuplicateinTuple(data)
print s
s1=getDuplicateinTuple(data1)
print s1

and the expected output is :

 [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]

and actual output is

[('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

on same if I give a non duplicate output as in data1

expected output :

 []

but current output:

[('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 ')]

Just by comparing the list this can be achieved. What is the better and suggested way to make achieve this?

I saw some nice stack post on this regards: Find and list duplicates in a list?

Community
  • 1
  • 1
Ragav
  • 942
  • 4
  • 19
  • 37
  • @falsetru : with ref http://ideone.com/DWv7uq data1 print getDuplicateinTuple(data) out is requied as [('2013 Jul 5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul 5 06:57:11:', 'PATIENT:COUNTER1')] and print getDuplicateinTuple(data1) matches as expected – Ragav Feb 10 '14 at 15:32

2 Answers2

2

Using collections.defaultdict:

from collections import defaultdict

def getDuplicateinTuple(dataInput):
    d = defaultdict(list)
    for t in dataInput:
        item1 = t[1]
        d[item1].append(t)
    return [t for ts in d.itervalues() if len(ts) > 1 for t in ts]

data = [
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
    ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')
]

data1 = [
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')
]

print getDuplicateinTuple(data)
# => [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
#     ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]
print getDuplicateinTuple(data1)
# => []
falsetru
  • 357,413
  • 63
  • 732
  • 636
  • if i have to omit timestamp means considering on data[1] not data[0] – Ragav Feb 10 '14 at 14:33
  • data field contains 2 item data[0] is some time stamp and data[1] is content. if i just want to extract only the content(data[1]) duplication what is the approach. – Ragav Feb 10 '14 at 14:37
  • In tuple list data if data[1] has been duplicated i want to extract and show the data. – Ragav Feb 10 '14 at 15:21
  • @Ragav, I updated the answer according to question modification. – falsetru Feb 10 '14 at 15:34
0

You can create a (default) dictionary to count the occurrences and then filter out the occurrences which are less than one:

from collections import defaultdict
d = defaultdict(list)
for timestamp, counter in data:
    d[counter].append(timestamp)

for counter, timestamps in d.items():
    if len(timestamps) > 1:
        print([(t, counter) for t in timestamps])
arocks
  • 2,862
  • 1
  • 12
  • 20