Comparing first element of the consecutive lists of tuples in Python

Question

I have a list of tuples, each containing two elements. The first element of few sublists is common. I want to compare the first element of these sublists and append the second element in one lists. Here is my list:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

I would like to make a list of lists out of it which looks something like this:`

NewList=[(2,3,4,5),(6,7,8),(9,10)]

I hope if there is any efficient way.

What if the first element of a sub-*tuple* is not common? You want a tuple of single element? — Anand S Kumar, Sep 19 '15 at 10:53
That's not a list of lists, but a lists of tuples; that doesn't make a difference to your question, but still, you should be aware of the difference — Marcus Müller, Sep 19 '15 at 10:53

Padraic Cunningham · Answer 1 · 2015-09-19T11:48:57.410

You can use an OrderedDict to group the elements by the first subelement of each tuple:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

from collections import OrderedDict

od  = OrderedDict()

for a,b in myList:
    od.setdefault(a,[]).append(b)

print(list(od.values()))
[[2, 3, 4, 5], [6, 7, 8], [9, 10]]

If you really want tuples:

print(list(map(tuple,od.values())))
[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

If you did not care about the order the elements appeared and just wanted the most efficient way to group you could use a collections.defaultdict:

from collections import defaultdict

od  = defaultdict(list)

for a,b in myList:
    od[a].append(b)

print(list(od.values()))

Lastly, if your data is in order as per your input example i.e sorted you could simply use itertools.groupby to group by the first subelement from each tuple and extract the second element from the grouped tuples:

from itertools import groupby
from operator import itemgetter
print([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

Again the groupby will only work if your data is sorted by at least the first element.

Some timings on a reasonable sized list:

In [33]: myList = [(randint(1,10000),randint(1,10000)) for _ in range(100000)]

In [34]: myList.sort()

In [35]: timeit ([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])
10 loops, best of 3: 44.5 ms per loop

In [36]: %%timeit                                                               od = defaultdict(list)
for a,b in myList:
    od[a].append(b)
   ....: 
10 loops, best of 3: 33.8 ms per loop

In [37]: %%timeit
dictionary = OrderedDict()
for x, y in myList:
     if x not in dictionary:
        dictionary[x] = [] # new empty list
    dictionary[x].append(y)
   ....: 
10 loops, best of 3: 63.3 ms per loop

In [38]: %%timeit   
od = OrderedDict()
for a,b in myList:
    od.setdefault(a,[]).append(b)
   ....: 
10 loops, best of 3: 80.3 ms per loop

If order matters and the data is sorted, go with the groupby, it will get even closer to the defaultdict approach if it is necessary to map all the elements to tuple in the defaultdict.

If the data is not sorted or you don't care about any order, you won't find a faster way to group than using the defaultdict approach.

Thanks Padraic, for the code. This helped me to fix my naive problem. — PythonNoob, Sep 19 '15 at 11:00
@PadraicCunningham I retract my opinion that this is probably the most efficient way; see [my second answer](http://stackoverflow.com/a/32667469/4433386). — Marcus Müller, Sep 19 '15 at 11:09
@MarcusMüller, none of your approaches keep the order, if order is irrelevant my defaultdict approach will be more efficient than any of your two answers. Not sure on the groupby as I did not time it — Padraic Cunningham, Sep 19 '15 at 11:11
@PadraicCunningham good point, but using `OrderedDict()` instead of `{}` would fix that, so I reffered to your answer in my updates. Assuming, of course, that "ordered" doesn't mean "in order of first occurence of first tuple member", but "sorted numerically". — Marcus Müller, Sep 19 '15 at 11:12

Marcus Müller · Answer 2 · 2015-09-19T11:12:52.020

4

This feels like a task for a dictionary (if you don't know dictionaries yet, look them up on python.org). This is a very verbose example, so it's not what I'd write in everyday coding, but it's better to be verbose than unclear:

dictionary = collections.OrderedDict()
for x, y in myList:
    if not dictionary.has_key(x):
        dictionary[x] = [] # new empty list
    # append y to that list
    dictionary[x].append(y)

edited Sep 19 '15 at 11:12

answered Sep 19 '15 at 11:00

Marcus Müller

34,677
4
53
94

Thanks Marcus for the suggestion. I would like to know about different modules of Python, which will make me fluent with Python. Let me know if you have any such better suggestions. – PythonNoob Sep 19 '15 at 11:20

score 2 · Answer 3 · edited May 23 '17 at 10:27

2

Having thought about this, the most efficient approach is probably this one-liner (assuming dictionary is an empty dict, i.e. dictionary = {} or dictionary = OrderedDict() like in Padraic' excellent answer):

for x,y in myList: dictionary.setdefault(x,[]).append(y)

I'm not saying this is the easiest to read approach, but I like it :)

EDIT Ha! Benchmarking proved me wrong; the setdefault approach is slower than the if not dictionary.has_key(x): dictionary[x]=[] approach:

>>> timeit.timeit("for x,y in myList:\n    if not dictionary.has_key(x):\n        dictionary[x]=[]\n    dictionary[x].append(y)", "from collections import OrderedDict\nmyList=[(1,2),(1,3),(
1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]\ndictionary=OrderedDict()")
2.2573769092559814
>>> timeit.timeit("for x,y in myList: dictionary.setdefault(x,[]).append(y)", "from collections import OrderedDict\nmyList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]\ndictiona
ry=OrderedDict()")
3.3534231185913086

Of course, Padraic was still right: his defaultdict approach uses but 0.82s on my machine, so it's faster by a factor of 3.

Also, as Padraic pointed out: dict.has_key(x) has been deprecated, and one should use x in dict instead; however, I couldn't measure a speed difference.

edited May 23 '17 at 10:27

Community

1
1

answered Sep 19 '15 at 11:08

Marcus Müller

34,677
4
53
94

`Having thought about this` did you also *measure*? BTW: why do you "copy" an answer? – Wolf Sep 19 '15 at 11:13
@Wolf: I'm referring to, not copying :) – Marcus Müller Sep 19 '15 at 11:13
Yes, but for saying this that is *probably* the *most efficient approach* a comment will be enough. – Wolf Sep 19 '15 at 11:16
It's a self-sufficient answer on it's own. I don't agree with your criticism on that. Also, benchmarks! – Marcus Müller Sep 19 '15 at 11:19
`has_key` has been removed in python3, `if key in dict` would make more sense – Padraic Cunningham Sep 19 '15 at 11:21
@PadraicCunningham: true, but Q says [tag:python-2.7]. – Marcus Müller Sep 19 '15 at 11:24
After benchmarking? What's the value/point of your answer now? – Wolf Sep 19 '15 at 11:24
@PadraicCunningham: congrats, 3 times faster than my faster approach! – Marcus Müller Sep 19 '15 at 11:25
1

It is deprecated in python2, `in` is also faster than has_key – Padraic Cunningham Sep 19 '15 at 11:26
@Wolf: after benchmarking, my answer is still valuable because it shows a) another short, precise, pythonic approach, b) it compares that approach to other approaches, and c) it comes to the conclusion which approach is faster. Stop bickering; downvote please, and accept that I find my answer useful, because otherwise I wouldn't be giving it :) – Marcus Müller Sep 19 '15 at 11:27
@PadraicCunningham: even with laaaarge sample sizes, `timeit` can't spot a consistent difference between `in` and `has_key`. Updated answer :) – Marcus Müller Sep 19 '15 at 11:30
Sorry, I only tried to understand, so I asked. [Your first answer](http://stackoverflow.com/a/32667397/2932052) was (and still is) my favorit to this question. – Wolf Sep 19 '15 at 11:35

score 1 · Answer 4 · answered Sep 19 '15 at 11:13

1

The following should work:

import itertools

myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
print [tuple(x[1] for x in g) for k, g in itertools.groupby(myList, key=lambda x: x[0])]

Which displays:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

answered Sep 19 '15 at 11:13

Martin Evans

45,791
17
81
97

Sorry, total coincidence. – Martin Evans Sep 19 '15 at 11:15
might want to mention the data must be in sorted order, at least as far as the first elements go – Padraic Cunningham Sep 19 '15 at 11:29

Comparing first element of the consecutive lists of tuples in Python

4 Answers4

Linked