5

Let's say that I have some list:

lst = [[2,6],[1,4],[0,1],[1,1],[2,3],[0,2]]

I want to sort lst by the first element and for each sublist keep the one with the maximal second element when grouped by the first element.

So the results will be:

results
>>> [[0,2],[1,4],[2,6]]

Can someone kindly help me?

mathfux
  • 5,759
  • 1
  • 14
  • 34
Guy
  • 161
  • 1
  • 1
  • 8

5 Answers5

4

You can do it using np.maximum.reduceat:

import numpy as np
lst = np.array([[2,6],[1,4],[0,1],[1,1],[2,3],[0,2]])
lst = lst[np.argsort(lst[:,0])] #sorting lst by first row
u, idx = np.unique(lst[:,0], return_index = True) 
print(np.c_[u, np.maximum.reduceat(lst[:,1], idx)])

At first array should be sorted. Then you need to get indices that splits array into groups: idx = [0, 2, 4] and corresponding values of first column u = [0, 1, 2]. Finally, use np.maximum.reduceat in order to get maximum values of groups that starts at indices idx specified and display it concatenated rightwise to u.

Remark: I used numpy here, a widely used library that allows to push looping into C level which is much faster. Purely pythonic solutions are worth attention too.

Bonus: This is actually a one liner using a numpy_indexed library (not so widely used) dedicated for groupby operations of arrays:

import numpy_indexed as npi
import numpy as np
np.transpose(npi.group_by(lst[:, 0]).max(lst[:, 1]))
mathfux
  • 5,759
  • 1
  • 14
  • 34
4

Assuming you just have 'pairs' like this (e.g. always 2 ints per sublist with the same 1st value and a 2nd value), it's very simple:

>>> lst = [[2,6],[1,4],[0,1],[1,1],[2,3],[0,2]]
>>> sorted(lst)[1::2]
[[0, 2], [1, 4], [2, 6]]

Sorting the list by default sorts on the 1st and then 2nd value of each sublist, then just slice the resulting list to take every other item

Chris_Rands
  • 38,994
  • 14
  • 83
  • 119
  • 1
    When slicing, you are also assuming there are exactly two sublists per different first value, but you did not state that assumption. – Stef Sep 05 '20 at 23:13
  • @Stef Yes exactly, that's also what i meant by 'pairs' but i didn't explain it well – Chris_Rands Sep 06 '20 at 08:49
  • @Chris_Rands It's phrased a bit oddly, but it was clear to me what you meant. And I found it a reasonable assumption (that's why I gave the first upvote), since the example data would be bad if it weren't the case, and since there are applications that do that, for example algorithms that store `(object_id, x_coordinate)` pairs, one pair for "entering" the object (with its left coordinate) and one for "leaving" the object with its right coordinate. All that said, applying a `dict` as in my answer isn't that much more work and makes it more general :-) – superb rain Sep 06 '20 at 14:44
3

Sort the list, group the elements by first item and then keep the max by second item in each group

import itertools as it
from operator import itemgetter

lst = [[2,6],[1,4],[0,1],[1,1],[2,3],[0,2]]

slst = sorted(lst, key=itemgetter(0))
gs = it.groupby(slst, key=itemgetter(0))
res = [max(v, key=itemgetter(1)) for k,v in gs]
print(res)

produces

[[0, 2], [1, 4], [2, 6]]
Pynchia
  • 10,996
  • 5
  • 34
  • 43
1

Try something like the code segment below, which doesn't require any imports.

lst = [[2,6],[1,4],[0,1],[1,1],[2,3],[0,2]]

lst = sorted(lst) # Sort the list in increasing order.
lst = [lst[i] for i in range(len(lst)) if i+1 == len(lst) or lst[i+1][0] != lst[i][0]]
# Remove the elements with minimum 2nd element.

print(lst)

Output:

[[0, 2], [1, 4], [2, 6]]
solid.py
  • 2,782
  • 5
  • 23
  • 30
1

Another way, using a dict.

>>> [*dict(sorted(lst)).items()]
[(0, 2), (1, 4), (2, 6)]

It produces the pairs as tuples instead of lists, but you even accepted an answer that produces a numpy array. To get lists:

>>> [*map(list, dict(sorted(lst)).items())]
[[0, 2], [1, 4], [2, 6]]

These solutions work because the dict keeps the last value for each key, so if we sort first, then the last is the largest.

superb rain
  • 5,300
  • 2
  • 11
  • 25
  • Nice approach! assuming CPython 3.6+ where dicts retain insertion order – Chris_Rands Sep 07 '20 at 08:52
  • @Chris_Rands Hmm. I switched from assuming CPython 3.6+ to assuming Python 3.7+ a while ago. It's been over two years :-). But I just checked, of the [five alternative implementations](https://www.python.org/download/alternatives/), only one appears to be at 3.7+. So I guess assuming Python 3.7+ doesn't *broaden* the user set but *narrows* it. Anyway, `sorted(dict(sorted(lst)).items())` for non-ordered dict. – superb rain Sep 07 '20 at 09:24