Group list by values

Question

Let's say I have a list like this:

mylist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]

How can I most elegantly group this to get this list output in Python:

[["A", "C"], ["B"], ["D", "E"]]

So the values are grouped by the secound value but the order is preserved...

`list` is a data type in Python, it is not recommended to use it as a variable name — narendranathjoshi, Jun 07 '16 at 07:53
I edited the question so it doesn't shadow the built-in `list` keyword — Nicolas Gervais, Jun 29 '20 at 13:59

score 116 · Accepted Answer · edited Jun 29 '20 at 13:59

116

values = set(map(lambda x:x[1], mylist))
newlist = [[y[0] for y in mylist if y[1]==x] for x in values]

edited Jun 29 '20 at 13:59

Nicolas Gervais

33,817
13
115
143

answered Apr 17 '11 at 17:51

Howard

38,639
9
64
83

2

`set()` isn't necessarily sorted (though it is for small integer values), if you have a long range use `values = sorted(set(...` – sverre Apr 17 '11 at 18:02
2

@sverre after all it was not required to be sorted – Howard Apr 17 '11 at 18:03
2

Except that `set` does not have an order. It just so happens that for low integers the hash function is identity. I'm also uncertain whether OP intended both orders (order of groups and order in groups) or not; this and sverre's examples sort the groups by key (his also assumes 0..N continuous range). – Yann Vernier Apr 17 '11 at 18:08
2

`lambda x:x[1]` could be replaced with [`operator.itemgetter(1)`](https://docs.python.org/library/operator.html#operator.itemgetter). – Cristian Ciupitu Jun 01 '16 at 13:15
1

Thanks for this. Just FYI to those coming across this, instead of the `map`, it's more Pythonic to use a comprehension: `values = set(x for x in list if x[1])` – mVChr Aug 22 '17 at 17:01
I edited this answer, like the question, so it doesn't shadow the built-in `list` keyword – Nicolas Gervais Jun 29 '20 at 14:00
5

The group could be simply done in `O(n)`, but in this block of code, if elements in `mylist` are all distinct, it would cost `O(n ^ 2)`. – yuanzhen dai May 30 '21 at 10:10
agree with @yuanzhendai – westandskif Nov 12 '21 at 12:52

score 45 · Answer 2 · edited Jan 24 '20 at 22:37

45

from operator import itemgetter
from itertools import groupby

lki = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
lki.sort(key=itemgetter(1))

glo = [[x for x,y in g]
       for k,g in  groupby(lki,key=itemgetter(1))]

print glo

.

EDIT

Another solution that needs no import , is more readable, keeps the orders, and is 22 % shorter than the preceding one:

oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]

newlist, dicpos = [],{}
for val,k in oldlist:
    if k in dicpos:
        newlist[dicpos[k]].extend(val)
    else:
        newlist.append([val])
        dicpos[k] = len(dicpos)

print newlist

edited Jan 24 '20 at 22:37

Community

1
1

answered Apr 17 '11 at 18:02

eyquem

26,771
7
38
46

2

+1 for using `itemgetter`. But note that since you're iterating over the iterators returned by `groupby`, you don't need `list(g)`. – Robert Rossney Apr 17 '11 at 18:09
2

@Robert Rossney Eagle's eye. +1 . By the way, in your code, I find the word 'data' too common to give an idea of what sort of data it is, that's a pity. – eyquem Apr 17 '11 at 18:18

Robert Rossney · Answer 3 · 2011-04-17T18:11:55.533

31

Howard's answer is concise and elegant, but it's also O(n^2) in the worst case. For large lists with large numbers of grouping key values, you'll want to sort the list first and then use itertools.groupby:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> seq = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> seq.sort(key = itemgetter(1))
>>> groups = groupby(seq, itemgetter(1))
>>> [[item[0] for item in data] for (key, data) in groups]
[['A', 'C'], ['B'], ['D', 'E']]

Edit:

I changed this after seeing eyequem's answer: itemgetter(1) is nicer than lambda x: x[1].

edited Apr 17 '11 at 18:11

answered Apr 17 '11 at 18:03

Robert Rossney

94,622
24
146
218

2

But it needs an import. Is it really better than using a lambda ? I wonder. Anyway, for readibility, itemgetter is better, I think – eyquem Apr 17 '11 at 18:20
5

I think so too. Also, it's always good to be reminded of the existence of the `operator` module. – Robert Rossney Apr 17 '11 at 18:42
5

I like lambda better. – lucid_dreamer Jun 12 '18 at 00:51
I think lambda is much better. Its always good not needed to be reminded of a rare module! – polvoazul Apr 01 '20 at 22:50
1

I also like the lambda better, but I think number of imports is not an important consideration, because operator module is part of the standard library. Dependencies are bad, imports are not. – Ali Rasim Kocal Apr 20 '21 at 09:30

score 16 · Answer 4 · answered Apr 18 '11 at 00:28

16

>>> import collections
>>> D1 = collections.defaultdict(list)
>>> for element in L1:
...     D1[element[1]].append(element[0])
... 
>>> L2 = D1.values()
>>> print L2
[['A', 'C'], ['B'], ['D', 'E']]
>>>

answered Apr 18 '11 at 00:28

dting

38,604
10
95
114

score 3 · Answer 5 · answered Apr 17 '11 at 17:50

I don't know about elegant, but it's certainly doable:

oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
# change into: list = [["A", "C"], ["B"], ["D", "E"]]

order=[]
dic=dict()
for value,key in oldlist:
  try:
    dic[key].append(value)
  except KeyError:
    order.append(key)
    dic[key]=[value]
newlist=map(dic.get, order)

print newlist

This preserves the order of the first occurence of each key, as well as the order of items for each key. It requires the key to be hashable, but does not otherwise assign meaning to it.

score 2 · Answer 6 · answered Apr 17 '11 at 17:47

len = max(key for (item, key) in list)
newlist = [[] for i in range(len+1)]
for item,key in list:
  newlist[key].append(item)

You can do it in a single list comprehension, perhaps more elegant but O(n**2):

[[item for (item,key) in list if key==i] for i in range(max(key for (item,key) in list)+1)]

score 1 · Answer 7 · answered May 07 '20 at 11:37

>>> xs = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
>>> xs.sort(key=lambda x: x[1])
>>> reduce(lambda l, x: (l.append([x]) if l[-1][0][1] != x[1] else l[-1].append(x)) or l, xs[1:], [[xs[0]]]) if xs else []
[[['A', 0], ['C', 0]], [['B', 1]], [['D', 2], ['E', 2]]]

Basically, if the list is sorted, it is possible to reduce by looking at the last group constructed by the previous steps - you can tell if you need to start a new group, or modify an existing group. The ... or l bit is a trick that enables us to use lambda in Python. (append returns None. It is always better to return something more useful than None, but, alas, such is Python.)

score 1 · Answer 8 · answered Nov 12 '21 at 12:49

if using convtools library, which provides a lot of data processing primitives and generates ad hoc code under the hood, then:

from convtools import conversion as c

my_list = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]

# store the converter somewhere because this is where code generation
# takes place
converter = (
    c.group_by(c.item(1))
    .aggregate(c.ReduceFuncs.Array(c.item(0)))
    .gen_converter()
)
assert converter(my_list) == [["A", "C"], ["B"], ["D", "E"]]

MattSt · Answer 9 · 2022-08-19T11:40:10.533

An answer inspired by @Howard's answer.

from operator import itemgetter

def group_by(nested_iterables: Iterable[Iterable], key_index: int) \
    -> List[Tuple[Any, Iterable[Any]]]:
    """ Groups elements nested in <nested_iterables> based on their <key_index>_th element.

    Behaves similarly to itertools.groupby when the input to the itertools function is sorted.

    E.g. If <nested_iterables> = [(1, 2), (2, 3), (5, 2), (9, 3)] and 
    <key_index> = 1, we will return [(2, [(1, 2), (5, 2)]), (3, [(2, 3), (9,3)])].

    Returns:
        A list of (group_key, values) tuples where <values> is an iterator of the iterables in
            <nested_iterables> that all have their <key_index>_th element equal to <group_key>.
    """
    group_keys = set(map(itemgetter(key_index), nested_iterables))
    return [(key, list(filter(lambda x: x[key_index] == key, nested_iterables)))
             for key in group_keys]

Group list by values

9 Answers9

Linked