python list comprehension: gathering duplicate columns

Question

I have a sorted list of lists that contain duplicate first elements. Currently I'm iterating over it to get the solution.

[['5th ave', 111, -30.00, 38.00],
['5th ave', 222, -30.00, 33.00],
['6th ave', 2224, -32.00, 34.90]]

I'd like an elegant list comprehension to convert this to a list of lists based on the first element:

['5th ave', [[111, -30.00, 38.00], [222, -30.00, 33.00]]]

Thanks

[A very close SO question](http://stackoverflow.com/questions/15751979/grouping-python-dictionary-keys-as-a-list-and-create-a-new-dictionary-with-this). — fjarri, Sep 02 '13 at 05:50
Why, exactly, do you want a solution in the form of a list comprehension? — tom10, Sep 02 '13 at 05:53
thanks for all the solutions. to answer @tom10, i was looking to see if a one liner was possible. no real reason other than my head was about to explode trying to figure it out. — sucasa, Sep 02 '13 at 15:38

score 8 · Answer 1 · answered Sep 02 '13 at 05:52

Looks like a job for collections.defaultdict:

>>> from collections import defaultdict
>>> L = [['5th ave', 111, -30.00, 38.00],
... ['5th ave', 222, -30.00, 33.00],
... ['6th ave', 2224, -32.00, 34.90]]
>>> d = defaultdict(list)
>>> for sublist in L:
...     d[sublist[0]].append(sublist[1:])
... 
>>> print(list(d.items()))
[('5th ave', [[111, -30.0, 38.0], [222, -30.0, 33.0]]), ('6th ave', [[2224, -32.0, 34.9]])]

There's no real reason to force this into a list comprehension. Fewer lines doesn't automatically make code more Pythonic.

TerryA

You beat me to this , but the war is not over yet! :P Great answer though. – Games Brainiac Sep 02 '13 at 05:54
just a joke. i think this solution breaks the sort. i was hoping to keep it in a list and still sorted. – sucasa Sep 02 '13 at 06:04
Why `defaultdict` and why not `setdefault` in ordinary `dict`? – thefourtheye Sep 02 '13 at 06:04
@thefourtheye I've never used a setdefault before. Looking at http://stackoverflow.com/questions/3483520/use-cases-for-the-setdefault-dict-method , it seems defaultdict replaces it for the majority of cases – TerryA Sep 02 '13 at 06:09
@user1550052 http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python May help you :) – TerryA Sep 02 '13 at 06:18
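For reference, the `setdefault` variant asked about in the comments might look roughly like the sketch below. It is written for Python 3 and is not taken from either commenter; the function name `group_with_setdefault` and the variable `rows` are made up for illustration. On Python 3.7 and later a plain `dict` keeps insertion order, so with sorted input the result should stay sorted and can be turned back into a list of lists.

```python
# Sketch only: group rows by their first element using a plain dict.
rows = [['5th ave', 111, -30.00, 38.00],
        ['5th ave', 222, -30.00, 33.00],
        ['6th ave', 2224, -32.00, 34.90]]


def group_with_setdefault(rows):
    """Return [[key, [rest, ...]], ...] in first-seen key order."""
    groups = {}
    for row in rows:
        # setdefault returns the list already stored for this key, or
        # stores and returns a new empty list the first time it is seen.
        groups.setdefault(row[0], []).append(row[1:])
    # Plain dicts preserve insertion order on Python 3.7+, so sorted
    # input gives a sorted result.
    return [[key, rest] for key, rest in groups.items()]


grouped = group_with_setdefault(rows)
print(grouped)
```

The practical difference from `defaultdict` is small: `setdefault` builds a throwaway empty list on every call, while `defaultdict(list)` only creates one when a key is missing.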

thefourtheye · Answer 2 · 2013-09-02T06:26:36.497

1

data = [['5th ave', 111, -30.00, 38.00],
['5th ave', 222, -30.00, 33.00],
['6th ave', 2224, -32.00, 34.90]]

previous   = ""
listOfData = []
result     = []
for currentItem in data:
    if currentItem[0] != previous:
        if listOfData:
            result.append([previous, listOfData])
            listOfData = []
        previous = currentItem[0]
    listOfData.append(currentItem[1:])

if listOfData:
    result.append([previous, listOfData])

print(result)

Output

[['5th ave', [[111, -30.0, 38.0], [222, -30.0, 33.0]]], ['6th ave', [[2224, -32.0, 34.9]]]]

This maintains the order as well.
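The loop above merges consecutive rows that share a first element. Since the question says the input is already sorted, the same grouping can probably be written as the list comprehension the question asked for, using `itertools.groupby` from the standard library. This is a Python 3 sketch and not part of the original answer; it reuses the names `data` and `result` from above.

```python
from itertools import groupby
from operator import itemgetter

data = [['5th ave', 111, -30.00, 38.00],
        ['5th ave', 222, -30.00, 33.00],
        ['6th ave', 2224, -32.00, 34.90]]

# groupby only merges *adjacent* rows with equal keys, so this relies on
# the input being sorted (or at least clustered) by the first element.
result = [[key, [row[1:] for row in group]]
          for key, group in groupby(data, key=itemgetter(0))]

print(result)
```

If the input were not sorted, the same key would show up in several separate groups, so it would need a `sorted(data, key=itemgetter(0))` first.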

Edit:

With defaultdict I could save a few lines:

from collections import defaultdict

data = [['5th ave', 111, -30.00, 38.00],
['5th ave', 222, -30.00, 33.00],
['6th ave', 2224, -32.00, 34.90]]

unique, Map = [], defaultdict(list)
for item in data:
    if item[0] not in unique:
        unique.append(item[0])
    Map[item[0]].append(item[1:])
print([(item, Map[item]) for item in unique])

This still maintains order.

edited Sep 02 '13 at 06:26

answered Sep 02 '13 at 06:12

You should really only use upper case letters at the beginning of variable names for classes – TerryA Sep 02 '13 at 06:12
@Haidro I updated the solution. I prefer naming the variables this way. Is there any specific reason why we should use uppercase letters only at the beginning? – thefourtheye Sep 02 '13 at 06:15
thanks this is similar to what i was doing. just wanted to see if it was possible with a list comprehension. – sucasa Sep 02 '13 at 06:16
have a look at [the PEP 8 style guide](http://www.python.org/dev/peps/pep-0008/#naming-conventions) ;) – TerryA Sep 02 '13 at 06:16

score 1 · Accepted Answer · answered Sep 02 '13 at 06:16

collections.defaultdict really is the way to go, but I suspect it might be slower, which is why I came up with this:

def RemDup(L):
    # Map each first element to the list of its remaining columns.
    ListComp = {}
    for sublist in L:
        try:
            ListComp[sublist[0]].append(sublist[1:])
        except KeyError:
            # First time this key is seen: start its list.
            ListComp[sublist[0]] = [sublist[1:]]
    # The built-in map stands in for itertools.imap, which is gone in Python 3.
    return map(list, ListComp.items())

DupList = [['5th ave', 111, -30.00, 38.00],
           ['5th ave', 222, -30.00, 33.00],
           ['6th ave', 2224, -32.00, 34.90]]

print([uniq for uniq in RemDup(DupList)])
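Whether `defaultdict` is actually slower is easy to measure rather than guess. The sketch below (Python 3) times both approaches with `timeit`; the function names and the generated test rows are made up for illustration, and the numbers will vary by machine and Python version, so no result is claimed here.

```python
import timeit
from collections import defaultdict


def group_defaultdict(rows):
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[1:])
    return [list(item) for item in groups.items()]


def group_try_except(rows):
    groups = {}
    for row in rows:
        try:
            groups[row[0]].append(row[1:])
        except KeyError:
            # First occurrence of this key.
            groups[row[0]] = [row[1:]]
    return [list(item) for item in groups.items()]


# Hypothetical test data: 1000 rows spread over 50 street names.
rows = [['%dth ave' % (i % 50), i, -30.0, 38.0] for i in range(1000)]

# Both functions should agree before their timings are compared.
same_output = group_defaultdict(rows) == group_try_except(rows)

for func in (group_defaultdict, group_try_except):
    seconds = timeit.timeit(lambda: func(rows), number=200)
    print('%s: %.4f s' % (func.__name__, seconds))
```

The try/except version pays for an exception once per distinct key, so the comparison is likely to depend on how many duplicates the data has.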
