Split a list into sublists, using a space element in the main list as the separator

Question

Let's say I have this list:

list1 = ["I", "am", "happy", " ", "and", "fine", " ", "and", "good"]

I want to end up with:

sublist1 = ["I", "am", "happy"]
sublist2 = ["and", "fine"]
sublist3 = ["and", "good"]

So, I want to split the list into sublists wherever a space element appears in list1.

Will there always be exactly 2 spaces in the list? If not, do you want to dynamically create the variables `sublist4`, `sublist5`, etc? (Please don't.) Also, have you tried to solve this problem on your own? It basically comes down to a loop, an `if` and an `append` call. – Aran-Fey Sep 16 '17 at 09:30
There is certainly something to pick from the [`itertools`](https://docs.python.org/3/library/itertools.html) module… – Laurent LAPORTE Sep 16 '17 at 09:34
No, it is not exactly 2 spaces; as a matter of fact it is something like: ['Kai', 'Boulder', 'Broadway', ' ', 'john', ' ', 'kabel', ' ', 'Cynthia', 'Creative', ' ', 'doc', 'dee', 'missy', 'great', ' ', 'mimmy', ' '] – Dee.Es Sep 16 '17 at 09:42
@Dee.A are you getting the original list from attempting to tokenize text from somewhere? If so - it looks like you should probably address it there rather than post-process it... – Jon Clements Sep 16 '17 at 09:56
@JonClements: No, that is not the case. It is part of a project assignment. – Dee.Es Sep 16 '17 at 10:13

score 8 · Answer 1 · answered Sep 16 '17 at 09:41

itertools.groupby is the perfect weapon for this: use the str.isspace method as the key function to separate the groups, then filter out the groups made of whitespace.

import itertools

list1 = ["I", "am", "happy", " ", "and", "fine", " ", "and", "good"]

result = [list(v) for k, v in itertools.groupby(list1, key=str.isspace) if not k]

print(result)

result:

[['I', 'am', 'happy'], ['and', 'fine'], ['and', 'good']]

If you know there are exactly 3 sublists (relying on that is not very wise), you could unpack:

sublist1,sublist2,sublist3 = result

But it's better to keep the result as a list of lists.
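
For the longer list mentioned in the question's comments, which ends with a separator, the same expression can be wrapped in a small helper. This is only a minimal sketch: the name split_on_whitespace is hypothetical (it is not part of itertools), and it assumes every element is a str.

```python
import itertools


def split_on_whitespace(items):
    """Group consecutive non-whitespace strings.

    Whitespace-only strings act as separators and are dropped.
    Hypothetical helper, not a library function.
    """
    return [list(group)
            for is_sep, group in itertools.groupby(items, key=str.isspace)
            if not is_sep]


names = ['Kai', 'Boulder', 'Broadway', ' ', 'john', ' ', 'kabel', ' ',
         'Cynthia', 'Creative', ' ', 'doc', 'dee', 'missy', 'great', ' ',
         'mimmy', ' ']
groups = split_on_whitespace(names)
print(groups)
# [['Kai', 'Boulder', 'Broadway'], ['john'], ['kabel'],
#  ['Cynthia', 'Creative'], ['doc', 'dee', 'missy', 'great'], ['mimmy']]
```

Because the separator groups are simply skipped, a trailing or repeated separator does not produce an empty sublist.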

answered Sep 16 '17 at 09:41

Jean-François Fabre

It is very useful, is there something relevant to str.isspace but for the new line, i.e. instead of the space on the list it will be "\n"? – Dee.Es Sep 16 '17 at 10:16
@Dee.A `str.isspace` considers `\n` to be true. Or are you saying that the grouping should be by an explicit newline character only and not strings that are all space? – Jon Clements Sep 16 '17 at 10:23
@JonClements first of all, thank you, you are very helpful. Second, what I mean is that the list will be something like: ['Kai', 'Boulder', 'Broadway', '\n ', 'john', ' \n', 'kabel', ' \n', 'Cynthia', 'Creative', '\n ', 'doc','dee','missy','great', ' \n','mimmy','\n '] So instead of the space, it will be a newline \n – Dee.Es Sep 16 '17 at 10:29
@Dee.A in that case you're better off using `''.join(your_list).splitlines()`... but using the groupby you'd change `key=str.isspace` to `key=lambda L: L == '\n'` but the code provided in this answer will work with newlines just fine... you only need to change it if you want strings consisting only of spaces to be valid within each group and that each group is delimited by exactly a newline character itself. – Jon Clements Sep 16 '17 at 10:31
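
The difference between the two key functions discussed in these comments can be seen on a small list. This is an illustrative sketch; the list items and the variable names are made up for the example.

```python
import itertools

items = ['a', ' ', 'b', '\n', 'c']

# str.isspace: any whitespace-only string (' ', '\n', ' \n', ...) is a separator.
by_whitespace = [list(g)
                 for is_sep, g in itertools.groupby(items, key=str.isspace)
                 if not is_sep]

# Exact comparison: only an element equal to '\n' is a separator,
# so ' ' stays inside its group as an ordinary item.
by_newline = [list(g)
              for is_sep, g in itertools.groupby(items, key=lambda s: s == '\n')
              if not is_sep]

print(by_whitespace)  # [['a'], ['b'], ['c']]
print(by_newline)     # [['a', ' ', 'b'], ['c']]
```

Note that elements such as '\n ' or ' \n' are not equal to '\n', so of the two keys only str.isspace treats them as separators.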

score 4 · Answer 2 · answered Sep 16 '17 at 09:43

You could do this using a for loop, putting the resulting sublists in a dictionary (as opposed to creating variables on the fly):

lst = ["I", "am", "happy", " ", "and", "fine", " ", "and", "good"]

count = 1
dct = {}
for x in lst:
    if x.isspace():
        count += 1
        continue
    dct.setdefault('sublist{}'.format(count), []).append(x)

print(dct)
# {'sublist2': ['and', 'fine'], 
#  'sublist3': ['and', 'good'], 
#  'sublist1': ['I', 'am', 'happy']}

answered Sep 16 '17 at 09:43

Moses Koledoye

It is very good, but what if the separator were a newline "\n" instead of a space? I searched for a method like isspace for that but did not find one. – Dee.Es Sep 16 '17 at 10:24
@Dee.A `\n` is whitespace. It is also captured by `str.isspace`. If you want to capture a new line explicitly, you can do `if x == '\n': ...` – Moses Koledoye Sep 16 '17 at 10:26
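
With the loop above, two separators in a row increment the counter twice, so the keys can skip a number (for example sublist1 followed by sublist3). One possible variation, sketched here and not the answer's own code, is to let itertools.groupby collapse each run of separators and number the remaining groups with enumerate. The input list is made up for the example.

```python
import itertools

lst = ["I", "am", "happy", " ", " ", "and", "fine", "\n", "and", "good"]

# Keep only the runs of non-whitespace items; a run of separators
# (here the two consecutive " ") is collapsed into a single boundary.
groups = (list(g)
          for is_sep, g in itertools.groupby(lst, key=str.isspace)
          if not is_sep)

# Number the groups consecutively, starting at 1.
dct = {'sublist{}'.format(i): words for i, words in enumerate(groups, start=1)}

print(dct)
```

The keys then run sublist1, sublist2, sublist3 regardless of how many separators sit between the words.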

score 4 · Answer 3 · answered Sep 16 '17 at 09:44

Well, you can use the itertools module to group items according to whether they are whitespace or not.

For instance, you can use the str.isspace method as the key function to group the items:

import itertools

list1 = ["I", "am", "happy", " ", "and", "fine", " ", "and", "good"]

for key, group in itertools.groupby(list1, key=str.isspace):
    print(key, list(group))

You get:

False ['I', 'am', 'happy']
True [' ']
False ['and', 'fine']
True [' ']
False ['and', 'good']

Based on that, you can construct a list by excluding the groups whose key is True (that is, where isspace returned True):

result = [list(group)
          for key, group in itertools.groupby(list1, key=str.isspace)
          if not key]
print(result)

You get this list of lists:

[['I', 'am', 'happy'], ['and', 'fine'], ['and', 'good']]

If you are not familiar with list comprehensions, you can use a loop:

result = []
for key, group in itertools.groupby(list1, key=str.isspace):
    if not key:
        result.append(list(group))

If you know there are exactly 3 groups, you can unpack this result into 3 variables:

sublist1, sublist2, sublist3 = result

score 2 · Answer 4 · answered Sep 16 '17 at 10:25

is there something relevant to str.isspace but for the new line, i.e. instead of the space on the list it will be "\n"?

A str.join() + re.split() solution on an extended example:

import re
list1 = ["I", "am", "happy", " ", "and", "fine", "\n", "and", "good"]
result = [i.split(',') for i in re.split(r',?\s+,?', ','.join(list1))]

print(result)

The output:

[['I', 'am', 'happy'], ['and', 'fine'], ['and', 'good']]
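
To see why the pattern works, it may help to look at the intermediate values. The sketch below only breaks the one-liner into steps; the names joined and chunks are introduced for illustration.

```python
import re

list1 = ["I", "am", "happy", " ", "and", "fine", "\n", "and", "good"]

# Step 1: join everything with commas; separators end up as ', ,' or ',\n,'.
joined = ','.join(list1)

# Step 2: split on each whitespace run together with its neighbouring commas.
chunks = re.split(r',?\s+,?', joined)

# Step 3: split each remaining chunk back into words.
result = [chunk.split(',') for chunk in chunks]

print(repr(joined))  # 'I,am,happy, ,and,fine,\n,and,good'
print(chunks)        # ['I,am,happy', 'and,fine', 'and,good']
print(result)        # [['I', 'am', 'happy'], ['and', 'fine'], ['and', 'good']]
```

This relies on the words containing neither commas nor whitespace. A list that starts or ends with a separator also leaves an empty string in chunks, which becomes [''] in the result.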

score 1 · Answer 5 · edited Sep 20 '17 at 13:57

Simple answer to your problem:

list1 = ["I", "am", "happy", " ", "and", "fine", " ", "and", "good"]

new_list = []    # current chunk
final_list = []  # completed chunks

list1.append(" ")  # append a trailing separator so the last chunk is flushed too

for line in list1:
    if line != " ":
        new_list.append(line)         # add the element to the current chunk
    else:
        final_list.append(new_list)   # append the finished chunk
        new_list = []                 # reset the chunk

sublist1, sublist2, sublist3 = final_list

print(sublist1, sublist2, sublist3)
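
The same loop can also be written as a function that flushes the last chunk after the loop, so the input list does not need a trailing separator appended to it. This is a rough sketch under that assumption; split_chunks and its sep parameter are hypothetical names, and it additionally skips the empty chunks that consecutive separators would otherwise produce.

```python
def split_chunks(items, sep=" "):
    """Split items into chunks at every element equal to sep.

    Hypothetical helper: the input list is left unmodified and
    empty chunks are skipped.
    """
    chunks = []
    current = []
    for item in items:
        if item == sep:
            if current:              # ignore empty chunks from repeated separators
                chunks.append(current)
            current = []
        else:
            current.append(item)
    if current:                      # flush the last chunk after the loop
        chunks.append(current)
    return chunks


list1 = ["I", "am", "happy", " ", "and", "fine", " ", "and", "good"]
final_list = split_chunks(list1)
print(final_list)
# [['I', 'am', 'happy'], ['and', 'fine'], ['and', 'good']]
```

Returning a list of lists also avoids depending on there being exactly three chunks to unpack.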

score 0 · Answer 6 · answered Sep 16 '17 at 10:19

Just for fun. If you know that the words don't contain any spaces, you can pick a special character that appears in none of them (e.g. '&') to join and split your strings:

>>> l = ["I", "am", "happy", " ", "and", "fine", " ", "and", "good"]
>>> '&'.join(l)
'I&am&happy& &and&fine& &and&good'
>>> '&'.join(l).split(' ')
['I&am&happy&', '&and&fine&', '&and&good']
>>> [[w for w in s.split('&') if w] for s in '&'.join(l).split(' ')]
[['I', 'am', 'happy'], ['and', 'fine'], ['and', 'good']]

If you want the most reliable solution, pick the groupby one.
