11

I'm doing this, but it feels like it could be achieved with much less code. It is Python, after all. Starting with a list, I split that list into subsets based on a string prefix.

# Splitting a list into subsets
# expected outcome:
# [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]

mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']

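def func(l, newlist=None, index=0):
    # avoid a mutable default argument, which would persist across calls
    if newlist is None:
        newlist = []
    newlist.append([i for i in l if i.startswith('sub_%s' % index)])
    # create a new list without the items in newlist
    l = [i for i in l if i not in newlist[index]]

    if len(l):
        return func(l, newlist, index + 1)
    return newlist

print(func(mylist))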
Gino Mempin
droidballoon

3 Answers

19

You could use itertools.groupby:

>>> import itertools
>>> mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
>>> for k,v in itertools.groupby(mylist,key=lambda x:x[:5]):
...     print(k, list(v))
... 
sub_0 ['sub_0_a', 'sub_0_b']
sub_1 ['sub_1_a', 'sub_1_b']

or exactly as you specified it:

>>> [list(v) for k,v in itertools.groupby(mylist,key=lambda x:x[:5])]
[['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]

Of course, the usual caveats apply: groupby only groups consecutive items, so make sure your list is sorted with the same key you're using to group. You might also need a slightly more complicated key function for real-world data.
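To illustrate the sorting caveat, here is a minimal sketch. The unsorted input list and the `prefix` helper are made up for this example; the key is the same five-character slice used above.

```python
import itertools

# Hypothetical unsorted input: items with the same prefix are not adjacent.
unsorted_list = ['sub_1_a', 'sub_0_a', 'sub_1_b', 'sub_0_b']

def prefix(item):
    # Same key as above: the first five characters, e.g. 'sub_0'.
    return item[:5]

# Without sorting, groupby starts a new group every time the key changes.
ungrouped = [list(v) for _, v in itertools.groupby(unsorted_list, key=prefix)]
print(ungrouped)
# [['sub_1_a'], ['sub_0_a'], ['sub_1_b'], ['sub_0_b']]

# Sorting with the same key first brings equal keys together.
ordered = sorted(unsorted_list, key=prefix)
grouped = [list(v) for _, v in itertools.groupby(ordered, key=prefix)]
print(grouped)
# [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
```

Because `sorted` is stable, items within each group keep their original relative order.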

Cristian Ciupitu
mgilson
  • `lambda x:x.split('_')[1]` would be better, considering the items might be like `sub_15_b`. – Ashwini Chaudhary Nov 13 '12 at 21:10
  • lol, that's really entertaining after the comment got deleted. :) – Brian Cain Nov 13 '12 at 21:10
  • @AshwiniChaudhary -- Sure I could use a more complicated key function (hence my comment about real world data) -- my goal here was to demonstrate how to use `groupby` in a nice succinct way. Once you start having `for k,v in itertools.groupby(mylist,key=lambda x: x.split('_')[1]):...` it really starts to get confusing what portion of the expression is doing what. I opted to keep it simple so hopefully it will be more understandable and OP can modify as needed. :) – mgilson Nov 13 '12 at 21:14
  • Yeah! I totally agree with the simplicity part. – Ashwini Chaudhary Nov 13 '12 at 21:23
  • And understandable it is. Good take on using itertools.groupby. Making note to self not to forget about it in the future. – droidballoon Nov 13 '12 at 21:27
2
In [28]: mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']

In [29]: lis=[]

In [30]: for x in mylist:
   ....:     i = x.split("_")[1]
   ....:     try:
   ....:         lis[int(i)].append(x)
   ....:     except IndexError:
   ....:         lis.append([])
   ....:         lis[-1].append(x)
   ....:

In [31]: lis
Out[31]: [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
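The loop above relies on the numeric fields appearing in order starting from 0. As a variation (a different technique, not the code above), a dictionary keyed on the numeric field avoids that assumption. This is only a sketch; `groups` and `result` are names chosen for the example.

```python
from collections import defaultdict

mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']

groups = defaultdict(list)
for item in mylist:
    # Key on the numeric field between the underscores, e.g. 0 for 'sub_0_a'.
    index = int(item.split('_')[1])
    groups[index].append(item)

# Emit the sublists in ascending order of the numeric field.
result = [groups[k] for k in sorted(groups)]
print(result)
# [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
```

This version does not need the input to be sorted, and gaps in the numbering (say, only `sub_0` and `sub_3`) do not matter.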
Ashwini Chaudhary
2

Use itertools' groupby:

from itertools import groupby

def get_field_sub(x):
    return x.split('_')[1]

mylist = sorted(mylist, key=get_field_sub)
[(x, list(y)) for x, y in groupby(mylist, get_field_sub)]
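One thing to watch: `get_field_sub` returns a string, so sorting is lexicographic and `'10'` comes before `'2'`. If numeric order matters, a key that converts to `int` is one option. The sketch below uses a made-up input list and a hypothetical `get_field_num` helper.

```python
from itertools import groupby

# Hypothetical input with a two-digit index, to show the ordering difference.
items = ['sub_10_a', 'sub_2_a', 'sub_10_b', 'sub_2_b']

def get_field_num(x):
    # Hypothetical variant of get_field_sub that returns an int.
    return int(x.split('_')[1])

items = sorted(items, key=get_field_num)
result = [(k, list(g)) for k, g in groupby(items, get_field_num)]
print(result)
# [(2, ['sub_2_a', 'sub_2_b']), (10, ['sub_10_a', 'sub_10_b'])]
```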
Brian Cain