Make sub lists from list based on condition

Question

I know there are very similar questions to this, but my specific case is a bit different. Let's say I have a list, lst, of data:

['1234 56 789',
 '12345',
 'x',
 'y',
 '9876 54 321',
 '54321',
 'x',
 '1234 98 765',
 '12398',
 'x',
 'y']

What I am trying to do is make sublists within this list. My goal is to start a new sublist at every unique identifier (which in this list are the long strings with two spaces). Initially, I realized I could run the following code:

[lst[i:i+4] for i in range(0, len(lst),4)]

But I noticed that not every unique identifier has a y value, so the sublists aren't created correctly, as shown below:

[['1234 56 789', '12345', 'x', 'y'],
 ['9876 54 321', '54321', 'x', '1234 98 765'],
 ['12398', 'x', 'y']]

My desired output is a list of sublists, each of which starts at a unique identifier:

[['1234 56 789', '12345', 'x', 'y'],
 ['9876 54 321', '54321', 'x'],
 ['1234 98 765', '12398', 'x', 'y']]

I realized I could loop over the list and check whether each item is a unique identifier: if it is, start a new sublist; if it isn't, append it to the current sublist.

I have attempted the following:

lists = [[]] 
 
for i in range(0, len(lst)):
  if (lst[i][0].isdigit()) and (len(lst[i]) > 10): # If i is a unique identifier 
    lists.append([lst[i]]) # Start new sub-list
  else:
    lists[len(lists)-1].append(lst[i]) # Add it to the last sub-list
print(lists)

But I get an error:

IndexError: string index out of range

I feel like I am close, but I have spent enough time on this that I wanted to ask for help. I have shown my thought process and code. Any help is appreciated. Thanks.

What happened when you tried to [debug the program](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/), starting by reading and trying to understand the error message? Which string and which index do you think it is talking about? What was the value of the index, and what was the value of the string? Does it make sense to you that this is out of range? Now, work backwards from there. How did the wrong value get computed? — Karl Knechtel, Aug 05 '21 at 18:06
Also, please [learn to write your loops normally](https://nedbatchelder.com/text/iter.html). — Karl Knechtel, Aug 05 '21 at 18:07
It works for me as is (but I would change `lists =[[]]` to `lists = []` to remove the resulting empty list at index 0). — Chris Charley, Aug 05 '21 at 18:27
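Following the debugging advice in the first comment, here is a minimal sketch that reports which element makes the `[0]` indexing fail. It assumes, hypothetically, that the real data contains an empty string; the sample list in the question has none, which would be consistent with the code working as is for Chris Charley.

```python
# Hypothetical data: an empty string is ASSUMED here; the sample in the
# question has none, which would explain why the code runs on the sample.
lst = ['1234 56 789', '12345', '', 'x']

failures = []
for index, item in enumerate(lst):
    try:
        item[0]  # same indexing as lst[i][0] in the question's condition
    except IndexError:
        failures.append((index, item))

print(failures)  # [(2, '')]
```

Only string indexing can produce "string index out of range" (a bad `lst[i]` would say "list index out of range"), so an empty element is the likely culprit.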

score 0 · Answer 1 · answered Aug 05 '21 at 18:52

Writing `lst[i][0]` indexes twice: `lst[i]` selects the i-th string in the list, and `[0]` then selects the first character of that string. That is not what you want, and it is also where the error comes from: indexing `[0]` into an empty string raises `IndexError: string index out of range`.

I wrote some code:

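output_lists = []
for code_num in lst:
    if len(code_num) > 10:                 # identifiers such as '1234 56 789' are 11 characters long
        output_lists.append([code_num])    # start a new sublist
    else:
        output_lists[-1].append(code_num)  # add to the current sublist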

Note that a plain `.isdigit()` check on the whole string cannot detect the identifiers: `'1234 56 789'.isdigit()` is `False` because the string contains spaces.

When `i` == 0, `lst[i][0].isdigit()` tests for the *first* char in that string, not the entire string. (which contains spaces) — Chris Charley, Aug 05 '21 at 18:56
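To make the distinction in the comment concrete, here is a small sketch contrasting the two checks, plus a stricter identifier test. The helper name `is_identifier` and the "three space-separated runs of digits" rule are assumptions drawn from the sample data, not something the question specifies.

```python
ident = '1234 56 789'

print(ident[0])            # 1
print(ident[0].isdigit())  # True: only the first character is tested
print(ident.isdigit())     # False: spaces are not digits

def is_identifier(s):
    """Hypothetical helper: True for three space-separated runs of digits."""
    parts = s.split(' ')
    return len(parts) == 3 and all(part.isdigit() for part in parts)

print([is_identifier(s) for s in [ident, '12345', 'x', '']])
# [True, False, False, False]
```

Unlike `s[0].isdigit()`, this check never indexes into the string, so empty strings are simply classified as non-identifiers.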

score 0 · Answer 2 · answered Aug 05 '21 at 19:13

You can write a regex for your separator and then adapt the solution proposed for the question "Make Python Sublists from a list using a Separator".

The sample separators you gave fit the regex `[0-9]+\s[0-9]+\s[0-9]+`. You can test this pattern on your data on Pythex.
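As an offline alternative to Pythex, here is a quick sketch that checks the pattern against the sample list with Python's `re` module. Whether `search()` or the stricter `fullmatch()` is appropriate depends on whether identifiers can appear inside longer strings, which the question does not say.

```python
import re

pattern = re.compile(r"[0-9]+\s[0-9]+\s[0-9]+")
a = ['1234 56 789', '12345', 'x', 'y', '9876 54 321',
     '54321', 'x', '1234 98 765', '12398', 'x', 'y']

# Items in which the pattern finds a match (search() returns None otherwise)
matches = [s for s in a if pattern.search(s)]
print(matches)  # ['1234 56 789', '9876 54 321', '1234 98 765']

# search() also matches inside longer strings; fullmatch() requires the
# whole string to fit the pattern
print(bool(pattern.search('id 1 2 3')), bool(pattern.fullmatch('id 1 2 3')))
# True False
```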

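import itertools
import re

a = ['1234 56 789', '12345', 'x', 'y', '9876 54 321',
     '54321', 'x', '1234 98 765', '12398', 'x', 'y']
pattern = re.compile(r"[0-9]+\s[0-9]+\s[0-9]+")

# Group consecutive items by the result of pattern.search: each separator
# lands in its own group, followed by a group of the items that belong to it
splits = [list(group) for _, group in itertools.groupby(a, key=pattern.search)]

# Join each separator group with the group that follows it
output = [splits[i] + splits[i + 1] for i in range(0, len(splits), 2)]
print(output)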

This will give:

[['1234 56 789', '12345', 'x', 'y'], ['9876 54 321', '54321', 'x'], ['1234 98 765', '12398', 'x', 'y']]

The final list comprehension, which concatenates `splits[i]` with `splits[i+1]`, is needed because `groupby` puts each separator in a list of its own; joining each separator group with the group after it makes every sublist start with its separator.
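To see why the pairing step is needed, here is a sketch that prints the intermediate `splits` for the sample list. It passes `pattern.search` directly as the `key`, which behaves the same as wrapping it in a lambda.

```python
import itertools
import re

a = ['1234 56 789', '12345', 'x', 'y', '9876 54 321',
     '54321', 'x', '1234 98 765', '12398', 'x', 'y']
pattern = re.compile(r"[0-9]+\s[0-9]+\s[0-9]+")

# groupby starts a new group whenever the key changes: separators map to a
# (distinct) match object, every other item maps to None
splits = [list(group) for _, group in itertools.groupby(a, key=pattern.search)]
for s in splits:
    print(s)
# ['1234 56 789']
# ['12345', 'x', 'y']
# ['9876 54 321']
# ['54321', 'x']
# ['1234 98 765']
# ['12398', 'x', 'y']

# Pairing assumes separator groups and data groups strictly alternate,
# starting with a separator
output = [splits[i] + splits[i + 1] for i in range(0, len(splits), 2)]
```

Because of that assumption, the pairing can mispair groups or raise `IndexError` if two identifiers are adjacent or an identifier ends the list.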

score 0 · Answer 3 · answered Aug 05 '21 at 19:28

The following seems like a straightforward way of doing it: it watches for strings containing spaces to detect the beginning of each group.

from pprint import pprint

lst = ['1234 56 789',
       '12345',
       'x',
       'y',

       '9876 54 321',
       '54321',
       'x',

       '1234 98 765',
       '12398',
       'x',
       'y']


lists = []
group = None
for elem in lst:
    if len(elem.split()) > 1:
        if group:
            lists.append(group)
        group = [elem]
    else:
        group.append(elem)

if group:
    lists.append(group)

pprint(lists)

Output:

[['1234 56 789', '12345', 'x', 'y'],
 ['9876 54 321', '54321', 'x'],
 ['1234 98 765', '12398', 'x', 'y']]
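The same loop can be wrapped in a reusable function that takes the identifier test as a parameter and also copes with items before the first identifier (where the code above would call `append` on `None`). This is a sketch; the name `group_from_markers` is hypothetical.

```python
from typing import Callable, Iterable, List

def group_from_markers(items: Iterable[str],
                       is_marker: Callable[[str], bool]) -> List[List[str]]:
    """Hypothetical helper: start a new sublist at every item where is_marker is True.

    Items that appear before the first marker are collected in a leading
    sublist instead of raising an error.
    """
    groups: List[List[str]] = []
    for item in items:
        if is_marker(item) or not groups:
            groups.append([item])    # new group at a marker (or for leading items)
        else:
            groups[-1].append(item)  # continue the current group
    return groups

lst = ['1234 56 789', '12345', 'x', 'y',
       '9876 54 321', '54321', 'x',
       '1234 98 765', '12398', 'x', 'y']

# Same marker test as the answer above: the string contains whitespace
result = group_from_markers(lst, lambda s: len(s.split()) > 1)
print(result)
```

Passing the test as a function keeps the grouping logic separate from the definition of an identifier, so a stricter check can be swapped in without touching the loop.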
