1

I have a list of tuples similar to A:

 A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)], 
[(160, 2, 5), (1000, 2, 5), (111, 1, 2)], 
[(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)], 
[(128, 3, 4)], 
[(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]

In each row of this list, there might be tuples which their second and third elements are the same. For example in A[0]:

A[0] = [(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)]

(90, 1, 5), (1000, 1, 5) and (176, 1, 5) have the same second and third elements. Among these, I need to keep the one which has the max value for the first element and remove the other two. So, I should be able to keep (1000, 1, 5) and remove (90, 1, 5) and (176, 1, 5) from A[0].

It would be better to keep the ordering of the list.

Is there any way to do that iteratively for all the rows in A? Any help would be appreciated!

user9439906
  • 433
  • 2
  • 7
  • 17
  • 3
    Is preserving the original order in the lists important? – slider Dec 04 '18 at 19:16
  • @slider It would be better to keep the order in the list but if there is no way, then I should ignore preserving the ordering in the list. – user9439906 Dec 04 '18 at 19:19
  • if your original data can be modified, I suggest using a dictionary, with a key of a tuple using the second and third element, and the value being your first element. That way, you can easily update max values, especially if you have a large dataset – Abhishek Patel Dec 04 '18 at 19:19
  • Useful answer here: [removing duplicates while maintaining order](https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order). – blacksite Dec 04 '18 at 19:20
  • 1
    @I'm new to Python, so may I ask to elaborate more? or write the part of the code that you think would work? – user9439906 Dec 04 '18 at 19:21
  • @timgeb I think you are talking about A[2]. That should become: [(134, 3, 5), (128, 3, 4), (139, 1, 3)]. Therefore, (126, 1, 3) should be removed from A[2]. – user9439906 Dec 04 '18 at 19:26

5 Answers5

3

If I understand correctly, here's an itertools.groupby solution. I'm assuming order in the final result does not matter.

from itertools import groupby

def keep_max(lst, groupkey, maxkey):
    'groups lst w.r.t. to groupkey, keeps maximum of each group w.r.t. maxkey'
    sor = sorted(lst, key=groupkey)
    groups = (tuple(g) for _, g in groupby(sor, key=groupkey))
    return [max(g, key=maxkey) for g in groups]

In action:

>>> from operator import itemgetter
>>> groupkey = itemgetter(1, 2)
>>> maxkey = itemgetter(0)
>>> A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)], [(160, 2, 5), (1000, 2, 5), (111, 1, 2)], [(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)], [(128, 3, 4)], [(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]
>>>
>>> [keep_max(sub, groupkey, maxkey) for sub in A]
[[(111, 1, 2), (139, 1, 3), (1000, 1, 5)],
 [(111, 1, 2), (1000, 2, 5)],
 [(139, 1, 3), (128, 3, 4), (134, 3, 5)],
 [(128, 3, 4)],
 [(1000, 1, 5), (1000, 2, 5), (134, 3, 5)]]
timgeb
  • 76,762
  • 20
  • 123
  • 145
2

This solution keeps the original ordering of the tuples assuming each tuple (as a whole) is unique; in the case there are duplicates tuples this will return the last appearance of each tuple:

from operator import itemgetter

A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)],
     [(160, 2, 5), (1000, 2, 5), (111, 1, 2)],
     [(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)],
     [(128, 3, 4)],
     [(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]


def uniques(lst):
    groups = {}

    for t in lst:
        groups.setdefault(t[1:], []).append(t)

    lookup = {t: i for i, t in enumerate(lst)}
    index = lookup.get

    first = itemgetter(0)
    return sorted(map(lambda x: max(x, key=first), groups.values()), key=index)


result = [uniques(a) for a in A]
print(result)    

Output

[[(139, 1, 3), (1000, 1, 5), (111, 1, 2)], [(1000, 2, 5), (111, 1, 2)], [(134, 3, 5), (128, 3, 4), (139, 1, 3)], [(128, 3, 4)], [(134, 3, 5), (1000, 2, 5), (1000, 1, 5)]]
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
1

Using dictionaries:

fin = []
for row in A:
    dict = {}
    for tup in row:
        dict[tup[1:2]] = tup[0]
    fin.append(dict)
A = [[value, t1, t1] for (t1, t2), value in dict.iteritems()]

Using this, your dict will transform A[0] from

A[0] = [(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)]

to

{ (1,5): 1000, (1,3): 139, (1,2): 111 } (as a dict)

and can then be converted back to an array using iteritems

This way, the order will also be preserved.

Abhishek Patel
  • 587
  • 2
  • 8
  • 25
1

If you can afford to ignore ordering, you can use itertools.groupby to group elements by the 2nd and 3rd elements on a list sorted by ascending order of 2nd and 3rd elements and descending order of the first element. Then the first element of each group is your desired choice:

from itertools import groupby

A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)], 
     [(160, 2, 5), (1000, 2, 5), (111, 1, 2)], 
     [(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)], 
     [(128, 3, 4)], 
     [(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]

def max_duplicate(lst):
    res = []
    for k, g in groupby(sorted(lst, key=lambda x: (x[1], x[2], -x[0])), key=lambda x: (x[1], x[2])):
        res.append(next(g))
    return res

result = [max_duplicate(l) for l in A]
for r in result:
    print(r)

Output

[(111, 1, 2), (139, 1, 3), (1000, 1, 5)]
[(111, 1, 2), (1000, 2, 5)]
[(139, 1, 3), (128, 3, 4), (134, 3, 5)]
[(128, 3, 4)]
[(1000, 1, 5), (1000, 2, 5), (134, 3, 5)]
slider
  • 12,810
  • 1
  • 26
  • 42
1

You can do this by using a hashmap as follows:

d = {}
for a in A:
    for aa in a:
        v, k1, k2 = aa
        if (k1, k2) in d:
            d[(k1, k2)] = max(v, d[(k1, k2)])
        else:
            d[(k1, k2)] = v

l = [[v, k1, k2] for (k1, k2), v in d.iteritems()]
Autonomous
  • 8,935
  • 1
  • 38
  • 77