
I have this list (python):

[[item1],[item2],[item3],[/],[item4],[item5],[item6],[/]...]

I want to separate these into chunks and the elements that will go into each chunk are the elements before the separator "/".

So my chunks would look like:

chunk1 = [[item1],[item2],[item3]]
chunk2 = [[item4],[item5],[item6]]

I've tried and tried, but nothing efficient came to mind. I tried looping through it with a for and an if element[x] == '/' check to get some positions. It's very dirty and doesn't work properly.

Any help would be appreciated.

funnydman
Benjamin

4 Answers


The usual approach for collecting contiguous chunks is to use itertools.groupby, for example:

>>> from itertools import groupby
>>> blist = ['item1', 'item2', 'item3', '/', 'item4', 'item5', 'item6', '/']
>>> chunks = (list(g) for k,g in groupby(blist, key=lambda x: x != '/') if k)
>>> for chunk in chunks:
...     print(chunk)
...     
['item1', 'item2', 'item3']
['item4', 'item5', 'item6']

(Your representation of your list [item1],[item2],[item3],[/], makes it look like each of your elements in the list is actually a list, in which case the same approach will work, you simply need to compare against ['/'] or whatever your separator is.)
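To make the thought process a little more visible, here is a minimal sketch of what groupby hands back before the filtering step. The names `runs` and `run` are only illustrative; the list and the key function are the ones used above.

```python
from itertools import groupby

blist = ['item1', 'item2', 'item3', '/', 'item4', 'item5', 'item6', '/']

# groupby walks the list once and starts a new group every time the
# key function's result changes. Here the key is True for ordinary
# items and False for the separator.
runs = [(k, list(g)) for k, g in groupby(blist, key=lambda x: x != '/')]
for k, run in runs:
    print(k, run)

# Keeping only the groups whose key is True drops the separators.
chunks = [run for k, run in runs if k]
print(chunks)
```

Each run of consecutive non-separator items becomes one group, and each separator becomes its own (discarded) group, which is why the `if k` filter is enough to get the chunks.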

DSM
  • Would you blame me if I said I really wanted to understand the thought process behind this? I knew about groupby and googled on it, but I don't understand the underlying process. Also, no, [item1] is only one string. – Benjamin Jun 14 '15 at 02:57
  • It works - but, for some reason, I can't select a whole chunk on its own. If I do 'print chunk[0]', obviously, it'll print the first element of each chunk. But how do I select only an individual chunk? chunks[0] won't cut it. It works if I turn the chunks into lists, but is this the only way? – Benjamin Jun 14 '15 at 03:05
  • @Benjamin: if you want to materialize a list, you could use a list comprehension instead of a generator expression: IOW, write `[list(g) for k,g in groupby(blist, key=lambda x: x != '/') if k]` instead (with square brackets instead of parentheses). This will make `chunks` a list, and then you can access `chunks[0]`. Right now `chunks` is merely an iterable, not a list, so you can loop over it but you can't select individual elements (because they don't even exist yet.) – DSM Jun 14 '15 at 03:26
  • Great. It works awesomely. Do you think this is recommended (speed wise) for large lists? – Benjamin Jun 14 '15 at 04:45

I wrote something simpler for you to understand. Basically, look out for '/': as long as the element is not '/', keep appending to the current chunk, and when it is '/', start a new chunk. itertools.groupby would be worth learning, but starting with something simpler that you understand is a good idea.

l = ['i1', 'i2', 'i3', '/', 'i4', 'i5', 'i6', '/']

chunks = []
x = 0
chunks.append([])   # create an empty chunk to which we'd append in the loop
for i in l:
    if i != '/':
        chunks[x].append(i)
    else:
        x += 1
        chunks.append([])

print(chunks)

If your elements are strings, there's a faster way to do what I have done in Python: first join the list into a single ' ' (space) separated string, then split that string on '/', and finally split each piece on whitespace.

l = ['i1', 'i2', 'i3', '/', 'i4', 'i5', 'i6', '/']

s = " ".join(l)  # first create a string, joining by a <space> it could be anything

chunks2 = [x.split() for x in s.split("/")]
print(chunks2)
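One caveat worth checking before relying on the join/split trick: it assumes no element contains whitespace or a '/' of its own. A small sketch with made-up data (the item names are only for illustration) shows what happens otherwise:

```python
l = ['red apple', 'pear', '/', 'plum', '/']

s = " ".join(l)  # 'red apple pear / plum /'
chunks2 = [x.split() for x in s.split("/")]

# 'red apple' was one element, but it comes back as two, and the
# trailing '/' leaves an empty chunk at the end.
print(chunks2)
```

If your items can contain spaces, the explicit loop above keeps each element intact.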
gabhijit
  • This works flawlessly and it's cleaner / works better. I wonder though, is this better than @DSM's solution for bigger lists? – Benjamin Jun 14 '15 at 04:50
  • No it is not!! As I said - it's a good idea to have something that one understands. So it's only a solution to help understand how to go about it, `itertools.groupby` will be a preferred way, just that - I often found understanding `itertools` hard as a beginner. So certainly yes preferred solution is @DSM's – gabhijit Jun 14 '15 at 05:01

This can also be done as (assuming empty chunks are not desired and l is the list to be "chunked"):

chunks, last_chunk = [], []
for x in l:
    if x == '/':
         if last_chunk:
             chunks.append(last_chunk)
             last_chunk = []
    else:
         last_chunk.append(x)
if last_chunk:
    chunks.append(last_chunk)
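If you need this in more than one place, the same loop could be wrapped in a generator. This is only a sketch: the name `split_on` and its `sep` parameter are made up for illustration, and like the loop above it skips empty chunks.

```python
def split_on(items, sep='/'):
    """Yield lists of consecutive items found between separators."""
    chunk = []
    for x in items:
        if x == sep:
            if chunk:          # skip empty chunks
                yield chunk
                chunk = []
        else:
            chunk.append(x)
    if chunk:                  # items after the last separator
        yield chunk


l = ['i1', 'i2', 'i3', '/', 'i4', 'i5', 'i6', '/']
chunks = list(split_on(l))
print(chunks)
```

Because it is a generator, you can also loop over `split_on(l)` directly without building the full list of chunks first.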
dcg

Less flexible than the groupby solutions, but useful if you are going to use NumPy arrays anyway and there is only one delimiter (or a small, fixed number of them):

import numpy as np

# The "i, =" unpacking only works when array_of_str contains exactly one '/'
i, = np.where(array_of_str=='/')[0]
bulk1, bulk2 = array_of_str[:i], array_of_str[i+1:]
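For an arbitrary number of delimiters, one possible extension (not part of the original answer, just a sketch) is to pass all the delimiter positions to np.split and then drop the delimiter that leads every piece after the first. The sample array is made up for illustration.

```python
import numpy as np

array_of_str = np.array(['i1', 'i2', '/', 'i3', '/', 'i4'])

# Positions of every delimiter in the array.
idx = np.flatnonzero(array_of_str == '/')

# np.split cuts *before* each index, so every piece except the first
# begins with the delimiter itself.
pieces = np.split(array_of_str, idx)
chunks = [pieces[0]] + [p[1:] for p in pieces[1:]]

# Drop empty chunks (leading, trailing, or doubled delimiters).
chunks = [c for c in chunks if c.size]
print(chunks)
```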
sigvaldm