1

I have this list of lists in Python:

[[100,XHS,0],
[100,34B,3],
[100,42F,1],
[101,XHS,2],
[101,34B,5],
[101,42F,2],
[102,XHS,1],
[102,34B,2],
[102,42F,0],
[103,XHS,0],
[103,34B,4],
[103,42F,2]]

and I would like to find the most efficient way (I'm dealing with a lot of data) to create a new list of lists that collects the last element of each sublist, grouped by id (the first element). So for the sample list above, my result would be:

[[0,3,1],
[2,5,2],
[1,2,0],
[0,4,2]]

How can I implement this in Python? Thanks

user2578185
  • 2
    How do you know the size of each sublist should be 3? – kojiro Aug 02 '13 at 13:25
  • 3
    FYI that list isn't valid Python - the item in the middle should be quoted. – thegrinner Aug 02 '13 at 13:26
  • Each sublist contains only 3 elements: the ID, a code, and the occurrence count of that code for that ID. I want to take the count of each code for each ID and create n count vectors, where n is the number of unique IDs (e.g. 100, 101, etc.) – user2578185 Aug 02 '13 at 13:28
  • 1
    The second item of each sublist has to be a string, as it contains alphanumeric characters; otherwise Python throws an error. – DevLounge Aug 02 '13 at 13:31
  • @thegrinner It could be valid Python. How do you know XHS isn't a name? – kojiro Aug 02 '13 at 13:39
  • @kojiro I don't, but either way it's invalid Python as it's posted - either the variable isn't defined or the string isn't quoted. My point is the example isn't the start of an [SSCCE](http://sscce.org/). – thegrinner Aug 02 '13 at 13:39
  • @thegrinner Technically true, but StackOverflow wouldn't be very useful if we required askers to define unimportant terms of data. Letting `XHS` just be a symbol doesn't affect the outcome of this question one way or the other. – kojiro Aug 02 '13 at 13:41

5 Answers

8

An itertools approach with the building blocks broken out - get last elements, group into threes, convert groups of 3 into a list...

from operator import itemgetter
from itertools import imap, izip

last_element = imap(itemgetter(-1), a)
in_threes = izip(*[iter(last_element)] * 3)
res = map(list, in_threes)
# [[0, 3, 1], [2, 5, 2], [1, 2, 0], [0, 4, 2]]
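
On Python 3 the same idea can be sketched without imap and izip, which were removed from itertools there; the built-in map and zip are already lazy. The sample data `a` (the question's list with the codes quoted) and the helper name `last_in_threes` are assumptions for illustration, not part of the original answer.

```python
from operator import itemgetter

# Sample data assumed from the question, with the codes quoted as strings.
a = [
    [100, 'XHS', 0], [100, '34B', 3], [100, '42F', 1],
    [101, 'XHS', 2], [101, '34B', 5], [101, '42F', 2],
    [102, 'XHS', 1], [102, '34B', 2], [102, '42F', 0],
    [103, 'XHS', 0], [103, '34B', 4], [103, '42F', 2],
]

def last_in_threes(rows, size=3):
    """Hypothetical helper: last element of each row, in blocks of `size`."""
    last_element = map(itemgetter(-1), rows)
    # Repeating one iterator `size` times makes zip pull consecutive items.
    blocks = zip(*[iter(last_element)] * size)
    return [list(block) for block in blocks]

res = last_in_threes(a)
print(res)  # [[0, 3, 1], [2, 5, 2], [1, 2, 0], [0, 4, 2]]
```

Note that zip stops at the shortest input, so a trailing block with fewer than `size` items is silently dropped.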

However, it looks like you want to "group" on the first element (instead of purely blocks of 3 consecutive items), so you can use defaultdict for this:

from collections import defaultdict
dd = defaultdict(list)
for el in a:
    dd[el[0]].append(el[-1])

# defaultdict(<type 'list'>, {100: [0, 3, 1], 101: [2, 5, 2], 102: [1, 2, 0], 103: [0, 4, 2]})
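
To get the list of lists the question asks for, the dictionary's values can be collected afterwards. A minimal sketch, assuming Python 3.7+, where dicts keep insertion order so the ids come out in first-seen order; the function name `group_last_by_id` and the shortened sample rows are made up here.

```python
from collections import defaultdict

def group_last_by_id(rows):
    """Hypothetical helper: group each row's last element by its first."""
    dd = defaultdict(list)
    for el in rows:
        dd[el[0]].append(el[-1])
    # Relies on insertion-ordered dicts (Python 3.7+): ids appear in first-seen order.
    return list(dd.values())

# Rows for the same id do not need to be adjacent.
rows = [[100, 'XHS', 0], [101, 'XHS', 2], [100, '34B', 3], [101, '34B', 5]]
result = group_last_by_id(rows)
print(result)  # [[0, 3], [2, 5]]
```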
Jon Clements
2
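new_list = []
temp_list = []
counter = 1

# `data` is the original list of lists (avoids shadowing the built-in `list`)
for x in data:
  # x[-1] is a single value, so append it (extend expects an iterable)
  temp_list.append(x[-1])
  if counter % 3 == 0:
    new_list.append(temp_list)
    temp_list = []
  counter += 1
print(new_list)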
kojiro
Tall Paul
2

You are trying to do two things here:

  • Get the last element of each nested list.
  • Group those elements by the first element of each nested list.

You can use list comprehension to get the last element of each nested list:

last_elems = [sublist[-1] for sublist in outerlist]

If the whole list is sorted by the first element (the id) then you can use itertools.groupby to do the second part:

from itertools import groupby
from operator import itemgetter

[[g[-1] for g in group] for id_, group in groupby(outerlist, key=itemgetter(0))]

Demo:

>>> outerlist = [
...     [100,'XHS',0],
...     [100,'34B',3],
...     [100,'42F',1],
...     [101,'XHS',2],
...     [101,'34B',5],
...     [101,'42F',2],
...     [102,'XHS',1],
...     [102,'34B',2],
...     [102,'42F',0],
...     [103,'XHS',0],
...     [103,'34B',4],
...     [103,'42F',2]
... ]
>>> from itertools import groupby
>>> from operator import itemgetter
>>> [[g[-1] for g in group] for id_, group in groupby(outerlist, key=itemgetter(0))]
[[0, 3, 1], [2, 5, 2], [1, 2, 0], [0, 4, 2]]

If it wasn't sorted, you'd either have to sort it first (using outerlist.sort(key=itemgetter(0))), or, if you don't need a sorted version anywhere else, use a collections.defaultdict approach to grouping:

from collections import defaultdict

grouped = defaultdict(list)
for sublist in outerlist:
    grouped[sublist[0]].append(sublist[-1])

output = list(grouped.values())
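
To see why the ordering matters for the groupby variant: groupby only merges adjacent rows with equal keys, so an unsorted list yields split groups. A small sketch with made-up shuffled data; the helper name `last_by_group` is hypothetical and simply wraps the groupby expression above.

```python
from itertools import groupby
from operator import itemgetter

def last_by_group(rows):
    """Hypothetical helper wrapping the groupby expression from this answer."""
    return [[g[-1] for g in group] for id_, group in groupby(rows, key=itemgetter(0))]

# Made-up input where the rows for each id are not adjacent.
shuffled = [[100, 'XHS', 0], [101, 'XHS', 2], [100, '34B', 3], [101, '34B', 5]]

unsorted_result = last_by_group(shuffled)
# sorted() is stable, so rows keep their relative order within each id.
sorted_result = last_by_group(sorted(shuffled, key=itemgetter(0)))

print(unsorted_result)  # [[0], [2], [3], [5]]
print(sorted_result)    # [[0, 3], [2, 5]]
```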
Martijn Pieters
  • I like this answer, but is this the most efficient way to do this? It is definitely a very concise way to do it. – But I'm Not A Wrapper Class Aug 02 '13 at 13:32
  • @MohammadS.: it is more efficient than using `zip(*outerlist)[0]` in that it doesn't build new tuples for the discarded columns. – Martijn Pieters Aug 02 '13 at 13:32
  • It's amazing how many people *want* to re-answer the [how do you split a list](http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python) question. – kojiro Aug 02 '13 at 13:35
1

If you don't know how many items there are for each key, but the items for each key appear consecutively in the original list, you can use groupby:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> [map(itemgetter(-1),it) for key,it in groupby(L,itemgetter(0))]
[[0, 3, 1], [2, 5, 2], [1, 2, 0], [0, 4, 2]]

Explanation

Each it is an iterator over items with the same key:

>>> [list(it) for key,it in groupby(L,itemgetter(0))]
[[[100, 'XHS', 0], [100, '34B', 3], [100, '42F', 1]], [[101, 'XHS', 2], [101, '34B', 5], [101, '42F', 2]], [[102, 'XHS', 1], [102, '34B', 2], [102, '42F', 0]], [[103, 'XHS', 0], [103, '34B', 4], [103, '42F', 2]]]

map just takes the last element from each sublist:

>>> [map(itemgetter(-1),it) for key,it in groupby(L,itemgetter(0))]
[[0, 3, 1], [2, 5, 2], [1, 2, 0], [0, 4, 2]]
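
On Python 3, map returns a lazy iterator rather than a list, so each group has to be materialized explicitly. A sketch under that assumption, with `L` taken to be a shortened version of the question's list with quoted codes; the name `grouped_last` is made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Assumed sample: a shortened version of the question's list.
L = [[100, 'XHS', 0], [100, '34B', 3], [101, 'XHS', 2], [101, '34B', 5]]

# list(...) consumes each map object while its group iterator is still valid.
grouped_last = [list(map(itemgetter(-1), it)) for key, it in groupby(L, itemgetter(0))]
print(grouped_last)  # [[0, 3], [2, 5]]
```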
ovgolovin
0
l=[[100,'XHS',0],
[100,'34B',3],
[100,'42F',1],
[100,'XHS',0],
[100,'34B',30],
[100,'42F',10],
[100,'XHS',0],
[100,'34B',300],
[100,'42F',100]]

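def chunks(l, n):
    # Yield successive n-sized slices of l
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

# Take the last element of each row within every chunk of 3
print([[row[-1] for row in chunk] for chunk in chunks(l, 3)])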

will print:

[[0, 3, 1], [0, 30, 10], [0, 300, 100]]
piokuc