How to split a list into subsets based on a pattern?

Question

I have this working, but it feels like it could be done with much less code. It is Python, after all. Starting with a list, I split it into sublists based on a string prefix.

# Splitting a list into subsets
# expected outcome:
# [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]

mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']

def func(l, newlist=[], index=0):
    newlist.append([i for i in l if i.startswith('sub_%s' % index)])
    # create a new list without the items in newlist
    l = [i for i in l if i not in newlist[index]]

    if len(l):
        index += 1
        func(l, newlist, index)

func(mylist)

Related: [Python: split a list based on a condition?](http://stackoverflow.com/q/949098/12892) — Cristian Ciupitu, Oct 04 '14 at 22:48

score 19 · Accepted Answer · edited Oct 04 '14 at 21:50

You could use itertools.groupby:

>>> import itertools
>>> mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
>>> for k,v in itertools.groupby(mylist,key=lambda x:x[:5]):
...     print(k, list(v))
... 
sub_0 ['sub_0_a', 'sub_0_b']
sub_1 ['sub_1_a', 'sub_1_b']

or exactly as you specified it:

>>> [list(v) for k,v in itertools.groupby(mylist,key=lambda x:x[:5])]
[['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]

Of course, the usual caveat applies: groupby only groups consecutive items, so make sure your list is sorted with the same key you're using to group. You might also need a slightly more complicated key function for real-world data.
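
To make the sorting caveat concrete, here is a minimal sketch, assuming Python 3. The `prefix` helper and the sample data are hypothetical and not part of the original answer; the helper splits on the last underscore so that an item like `sub_15_b` is not lumped in with `sub_1`, which is what the fixed-width slice `x[:5]` would do.

```python
from itertools import groupby

def prefix(item):
    # Hypothetical key: everything before the last underscore, e.g. 'sub_15'.
    return item.rsplit('_', 1)[0]

unsorted_items = ['sub_0_a', 'sub_1_a', 'sub_0_b', 'sub_15_b']

# groupby only merges neighbouring items, so on unsorted input
# the 'sub_0' prefix shows up as two separate groups.
fragmented = [list(group) for _, group in groupby(unsorted_items, key=prefix)]
print(fragmented)

# Sorting with the same key first brings equal prefixes together.
ordered = sorted(unsorted_items, key=prefix)
grouped = [list(group) for _, group in groupby(ordered, key=prefix)]
print(grouped)
```

The first result contains four single-item groups, while the second has one group per prefix.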

edited Oct 04 '14 at 21:50 by Cristian Ciupitu
answered Nov 13 '12 at 21:04 by mgilson

`lambda x:x.split('_')[1]` would be better, considering the items might be like `sub_15_b`. – Ashwini Chaudhary Nov 13 '12 at 21:10
lol, that's really entertaining after the comment got deleted. :) – Brian Cain Nov 13 '12 at 21:10
@AshwiniChaudhary -- Sure I could use a more complicated key function (hence my comment about real world data) -- my goal here was to demonstrate how to use `groupby` in a nice succinct way. Once you start having `for k,v in itertools.groupby(mylist,key=lambda x: x.split('_')[1]):...` it really starts to get confusing what portion of the expression is doing what. I opted to keep it simple so hopefully it will be more understandable and OP can modify as needed. :) – mgilson Nov 13 '12 at 21:14
Yeah! I totally agree with the simplicity part. – Ashwini Chaudhary Nov 13 '12 at 21:23
And understandable it is. Good take on using itertools.groupby. Making note to self not to forget about it in the future. – droidballoon Nov 13 '12 at 21:27

score 2 · Answer 2 · answered Nov 13 '12 at 21:06

In [28]: mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']

In [29]: lis=[]

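In [30]: for x in mylist:
   ....:     i = x.split("_")[1]
   ....:     try:
   ....:         # reuse the sublist for this index if it already exists
   ....:         lis[int(i)].append(x)
   ....:     except IndexError:
   ....:         # first item with this index: start a new sublist
   ....:         lis.append([])
   ....:         lis[-1].append(x)
   ....: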

In [31]: lis
Out[31]: [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
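
The loop above relies on the numeric indices starting at 0 and being contiguous. As a hedged alternative (a different technique, not the answerer's code), a dictionary keyed on the prefix avoids that assumption and needs no sorting. The function name `split_by_prefix` is hypothetical, and the output order relies on dicts preserving insertion order, which is guaranteed from Python 3.7.

```python
from collections import defaultdict

def split_by_prefix(items):
    # Hypothetical helper: collect items under everything before the last underscore.
    groups = defaultdict(list)
    for item in items:
        key = item.rsplit('_', 1)[0]
        groups[key].append(item)
    # Groups come out in the order their prefix was first seen (Python 3.7+).
    return list(groups.values())

result = split_by_prefix(['sub_1_a', 'sub_0_a', 'sub_1_b', 'sub_0_b'])
print(result)
```

Unlike `groupby`, this gathers items that are not adjacent in the input, at the cost of holding every group in memory at once.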

score 2 · Answer 3 · answered Nov 13 '12 at 21:10

Use itertools' groupby:

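from itertools import groupby

def get_field_sub(x): return x.split('_')[1]

# groupby only groups consecutive items, so sort by the same key first
mylist = sorted(mylist, key=get_field_sub)
[ (x, list(y)) for x, y in groupby(mylist, get_field_sub)]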
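
One detail to watch with this approach: the key is a string, so the sort is lexicographic and `'10'` sorts before `'2'`. A small sketch of a numeric variant follows, assuming the middle field is always an integer. The name `index_of` and the sample data are made up for illustration.

```python
from itertools import groupby

def index_of(item):
    # Hypothetical key: the middle field converted to int, e.g. 'sub_10_a' -> 10.
    return int(item.split('_')[1])

items = ['sub_10_a', 'sub_2_a', 'sub_10_b', 'sub_2_b']

# Sort and group with the same numeric key so that group 2 comes before group 10.
by_number = [(k, list(g)) for k, g in groupby(sorted(items, key=index_of), key=index_of)]
print(by_number)
```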

answered Nov 13 '12 at 21:10 by Brian Cain

Good with the sorting added to the example. +1 for the link to ``groupby`` docs :) – droidballoon Nov 13 '12 at 21:30
