Split at multiple delimiter without delimiter in the list

Question

This should be a really easy task using the re library. However, I can't seem to split my string at the delimiters ] and [.

I already read Splitting a string with multiple delimiters in Python, Python: Split string with multiple delimiters, and Python: How to get multiple elements inside square brackets.

My string:

data = """This is a string spanning over multiple lines.
        At somepoint there will be square brackets.

        [like this]

        And then maybe some more text.

        [And another text in square brackets]"""

It should return:

['This is a string spanning over multiple lines.\nAt somepoint there will be square brackets.', 'like this', 'And then maybe some more text.', 'And another text in square brackets']

A short example to try:

data2 = 'A new string. [with brackets] another line [and a bracket]'

I tried:

re.split(r'(\[|\])', data2)
re.split(r'([|])', data2)

But those would either result in having the delimiter in my resulting list or a wrong list altogether:

['A new string. ', '[', 'with brackets', ']', ' another line ', '[', 'and a bracket', ']', '']

Result should be:

['A new string.', 'with brackets', 'another line', 'and a bracket']

As a special requirement, all newline characters and whitespace before and after a delimiter should be removed and not be included in the list either.

score 7 · Accepted Answer · answered Jun 11 '13 at 16:57

>>> re.split(r'\[|\]', data2)
['A new string. ', 'with brackets', ' another line ', 'and a bracket', '']

arshajii

Yeah, that's a simpler approach than the non-capturing groups I recommended. – Peter DeGlopper Jun 11 '13 at 17:02

Works great. Just as an addition: How would I remove all newline characters and whitespace at the end/beginning of an element? – cherrun Jun 11 '13 at 18:07
Ok. Figured it out. Using `strip()` on each element in the list. Thanks again. – cherrun Jun 11 '13 at 18:11

@cherrun How about `re.split(r'\s*[\[\]]\s*', data2)` – arshajii Jun 11 '13 at 18:11
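
Putting the comments above together: a minimal sketch, assuming Python 3, that lets the pattern absorb the whitespace around each bracket and then drops the empty strings that appear when the text starts or ends with a bracket. The helper name `split_brackets` is made up for illustration and is not from the thread.

```python
import re

def split_brackets(text):
    """Split text at '[' and ']', trimming whitespace around each piece."""
    # \s* on both sides swallows spaces and newlines next to a bracket.
    parts = re.split(r'\s*[\[\]]\s*', text)
    # strip() covers whitespace at the very start/end of the text;
    # the condition drops empty pieces, e.g. the one after a trailing ']'.
    return [p.strip() for p in parts if p.strip()]

data2 = 'A new string. [with brackets] another line [and a bracket]'
print(split_brackets(data2))
# ['A new string.', 'with brackets', 'another line', 'and a bracket']
```

Whitespace inside a piece is left untouched, so with the multi-line `data` from the question the indentation after an internal newline would still need separate handling.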

score 5 · Answer 2 · edited Jun 20 '20 at 09:12

As arshajii points out, you don't need groups at all for this particular regexp.

If you did need groups to express a more complex regexp, you could use noncapturing groups to split without capturing the delimiter. It's potentially useful for other situations but syntactically messy overkill here.

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

http://docs.python.org/2/library/re.html

So the unnecessarily complex but demonstrative example here would be:

re.split(r'(?:\[|\])', data2)
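
To make the difference concrete, here is a small sketch (assuming Python 3) comparing a capturing group, a non-capturing group, and a case where the group is actually needed to scope the alternation between the two `\s*` parts. The variable names are only illustrative.

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Capturing group: the text matched by the group is added to the result.
with_delims = re.split(r'(\[|\])', data2)

# Non-capturing group: same matches, but nothing extra is returned.
without_delims = re.split(r'(?:\[|\])', data2)

# Here the group does real work: it limits the '|' to the two brackets,
# so the \s* on either side applies to both alternatives.
trimmed = re.split(r'\s*(?:\[|\])\s*', data2)

print(with_delims)
print(without_delims)
print(trimmed)
```

The first call reproduces the unwanted list from the question. The other two leave the brackets out, and all three still end with an empty string because the input ends with a delimiter.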

score 2 · Answer 3 · Casimir et Hippolyte · edited Jun 11 '13 at 17:06

Use this instead (without a capture group):

re.split(r'\s*\[|]\s*', data)

or shorter:

re.split(r'\s*[][]\s*', data)
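
Note that the two patterns are not strictly equivalent. Alternation binds more loosely than concatenation, so the first one reads as "`\s*\[` or `]\s*`" and trims only the outer side of each bracket, while the character-class version trims both sides. A quick sketch with a made-up sample string that has spaces inside the brackets:

```python
import re

sample = 'a [ b ] c'  # hypothetical input with spaces inside the brackets

# '\s*\[' OR ']\s*': whitespace just inside the brackets is kept.
print(re.split(r'\s*\[|]\s*', sample))
# ['a', ' b ', 'c']

# Character class with \s* on both sides: all surrounding whitespace goes.
print(re.split(r'\s*[][]\s*', sample))
# ['a', 'b', 'c']
```

For the strings in the question, where there is no whitespace directly inside the brackets, both patterns give the same pieces.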

answered Jun 11 '13 at 16:57

score 0 · Answer 4 · answered Jun 11 '13 at 17:00

You could use either split or findall, e.g.:

data2 = 'A new string. [with brackets] another line [and a bracket]'

Using split, with the surrounding whitespace absorbed by the pattern and empty strings filtered out:

import re
print(list(filter(None, re.split(r'\s*[\[\]]\s*', data2))))
# ['A new string.', 'with brackets', 'another line', 'and a bracket']

Or possibly, adapt a findall approach:

print(re.findall(r'[^\[\]]+', data2))
# ['A new string. ', 'with brackets', ' another line ', 'and a bracket']
# Leading/trailing whitespace still needs to be stripped.
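
One way to finish the findall variant, as a sketch assuming Python 3: match runs of non-bracket characters, then strip each one and skip any that were whitespace only. The names `chunks` and `cleaned` are just for illustration.

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Each match is a maximal run of characters that are not '[' or ']'.
chunks = re.findall(r'[^\[\]]+', data2)

# Trim each chunk and skip chunks that contained only whitespace.
cleaned = [c.strip() for c in chunks if c.strip()]

print(cleaned)
# ['A new string.', 'with brackets', 'another line', 'and a bracket']
```

Unlike split, findall never produces empty strings for a leading or trailing bracket, so only whitespace-only chunks need filtering.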
