2

Consider the following simplified case:

lol = [['John','Polak',5,3,7,9],
       ['John','Polak',7,9,2,3],
       ['Mark','Eden' ,0,3,3,1],
       ['Mark','Eden' ,5,1,2,9]]

What would be a pythonic and memory+speed efficient way to transform this list-of-lists to a list-of-lists-of-lists based on the first two parameters:

lolol = [[['John','Polak',5,3,7,9],
          ['John','Polak',7,9,2,3]],
         [['Mark','Eden' ,0,3,3,1],
          ['Mark','Eden' ,5,1,2,9]]]

Actually - any other data structure would also be ok, as long as I have the correct hierarchy. For example the following dictionary structure comes to mind, but creating it doesn't seem efficient speed-efficient enough, and the memory would probably be higher than the lolol solution.

dolol = {('John','Polak'):[[5,3,7,9],[7,9,2,3]],
         ('Mark','Eden') :[[0,3,3,1],[5,1,2,9]]}
Jonathan Livni
  • 101,334
  • 104
  • 266
  • 359
  • is the input list always going to be sorted by key, or might it be in mixed order? – Jesse Cohen Feb 12 '11 at 20:28
  • How would a dict be inefficient? I.e. what do you want to do with the data (that would make dicts inefficient)? –  Feb 12 '11 at 20:28
  • @Jesse - Yes, the keys are a priori sorted and that is definitely my need. Although for curiosity I wouldn't mind knowing the other case as well. – Jonathan Livni Feb 12 '11 at 20:31
  • @delnan - as dict is hashed, creating it would take more time, also the hash would consume more memory. I'm not sure how much, but I am dealing with hundreds of mega of data so it could get messy... – Jonathan Livni Feb 12 '11 at 20:32
  • 1
    @Jonathan: A dict does take more memory, but not significantly more time if you're ever going to search for something in it. –  Feb 12 '11 at 20:37
  • 1
    You might think about using a class, considering your data will probably be more meaningful in that context. To keep it memory-efficient, use the `__slots__` builtin. – Seth Johnson Feb 12 '11 at 22:09
  • @delnan - after creating the new data structure, I only need to traverse it once linearly – Jonathan Livni Feb 13 '11 at 07:25
  • @Johnathan: In that case, go with @tokland's answer. –  Feb 13 '11 at 09:09

3 Answers3

6

List:

from itertools import groupby
lolol = [list(grp) for (match, grp) in groupby(lol, lambda lst: lst[:2])]
# [[['John', 'Polak', 5, 3, 7, 9], ['John', 'Polak', 7, 9, 2, 3]],
#  [['Mark', 'Eden', 0, 3, 3, 1], ['Mark', 'Eden', 5, 1, 2, 9]]]

Dictionary:

dolol = dict((tuple(match), [x[2:] for x in grp]) for (match, grp) in 
             groupby(lol, lambda lst: lst[:2]))
# {('John', 'Polak'): [[5, 3, 7, 9], [7, 9, 2, 3]],
#  ('Mark', 'Eden'): [[0, 3, 3, 1], [5, 1, 2, 9]]}

Since itertools.groupby works on consecutive matches, it assumes sorted input (lol).

tokland
  • 66,169
  • 13
  • 144
  • 170
5

If a dictionary is acceptable, this code will create one:

import collections
d = collections.defaultdict(list)
for name, surname, *stuff in lol:
    d[name, surname].append(nums)

Note that this requires Python 3 (extended iterable unpacking). For Python 2, use

for x in lol:
    name = x[0]
    surname = x[1]
    stuff = x[2:]

You may fold the variables to save lines.

0

To complement delnan's answer with a Python 2 equivalent:

from collections import defaultdict

dolol=defaultdict(list)
for data in lol:
    dolol[data[0],data[1]].append(data[2:])
Jonathan Livni
  • 101,334
  • 104
  • 266
  • 359