This should be a really easy task using the re library. However, I can't seem to split my string at the delimiters ] and [.

I already read Splitting a string with multiple delimiters in Python, Python: Split string with multiple delimiters, and Python: How to get multiple elements inside square brackets.

My string:

data = """This is a string spanning over multiple lines.
        At somepoint there will be square brackets.

        [like this]

        And then maybe some more text.

        [And another text in square brackets]"""

It should return:

['This is a string spanning over multiple lines.\nAt somepoint there will be square brackets.', 'like this', 'And then maybe some more text.', 'And another text in square brackets']

A short example to try:

data2 = 'A new string. [with brackets] another line [and a bracket]'

I tried:

re.split(r'(\[|\])', data2)
re.split(r'([|])', data2)

But these either leave the delimiters in the resulting list or produce a wrong list altogether:

['A new string. ', '[', 'with brackets', ']', ' another line ', '[', 'and a bracket', ']', '']

Result should be:

['A new string.', 'with brackets', 'another line', 'and a bracket']

As a special requirement, all newline characters and whitespace before and after a delimiter should be removed and not included in the list either.

cherrun

4 Answers

>>> re.split(r'\[|\]', data2)
['A new string. ', 'with brackets', ' another line ', 'and a bracket', '']
arshajii

As arshajii points out, you don't need groups at all for this particular regexp.

If you did need groups to express a more complex regexp, you could use noncapturing groups to split without capturing the delimiter. It's potentially useful for other situations but syntactically messy overkill here.

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

http://docs.python.org/2/library/re.html

So the unnecessarily complex but demonstrative example here would be:

re.split(r'(?:\[|\])', data2)
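
To make the effect of the group type concrete, here is a minimal sketch (assuming Python 3 and the data2 string from the question; the variable names are only illustrative) comparing a capturing group, a non-capturing group, and no group at all:

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Capturing group: the matched delimiters are kept in the result list.
with_capture = re.split(r'(\[|\])', data2)

# Non-capturing group: the delimiters are dropped from the result.
without_capture = re.split(r'(?:\[|\])', data2)

# No group at all: same result as the non-capturing version.
no_group = re.split(r'\[|\]', data2)

print(with_capture)
print(without_capture)
print(without_capture == no_group)  # True
```

Note that none of these variants removes the surrounding whitespace or the trailing empty string; that has to be handled separately.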
Peter DeGlopper

Use this instead (without a capture group):

re.split(r'\s*\[|]\s*', data)

or shorter:

re.split(r'\s*[][]\s*', data)
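
As a rough check against the multi-line string from the question, the sketch below (assuming Python 3) applies the same idea. Two things here are assumptions rather than part of the answer: the sample text is written without the leading indentation shown in the question, and the character class is spelled with escapes ([\[\]]), which matches the same two characters as [][]. The final list comprehension drops the empty string that split leaves after the last bracket.

```python
import re

# Question's sample text, written without leading indentation (an assumption).
data = (
    "This is a string spanning over multiple lines.\n"
    "At somepoint there will be square brackets.\n"
    "\n"
    "[like this]\n"
    "\n"
    "And then maybe some more text.\n"
    "\n"
    "[And another text in square brackets]"
)

# \s* on both sides swallows spaces and newlines next to each bracket.
pieces = re.split(r'\s*[\[\]]\s*', data)

# The string ends with a delimiter, so split leaves a trailing '' to drop.
parts = [piece for piece in pieces if piece]

print(parts)
```

The newline between the first two sentences survives because it is not adjacent to a bracket, which matches the output the question asks for.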
Casimir et Hippolyte

You could either split or use findall, e.g.:

data2 = 'A new string. [with brackets] another line [and a bracket]'

Using split, where \s* absorbs the whitespace around each delimiter and filter(None, ...) drops the empty strings:

import re
print(list(filter(None, re.split(r'\s*[\[\]]\s*', data2))))
# ['A new string.', 'with brackets', 'another line', 'and a bracket']

Or possibly, adapt a findall approach:

print(re.findall(r'[^\[\]]+', data2))
# ['A new string. ', 'with brackets', ' another line ', 'and a bracket']  (leading/trailing whitespace still needs stripping)
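
One possible way to finish the findall approach, sketched here under the assumption of Python 3 (the names chunks and cleaned are illustrative only), is to strip each match afterwards and skip anything that ends up empty:

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Grab every run of characters that are not square brackets.
chunks = re.findall(r'[^\[\]]+', data2)

# Strip surrounding whitespace and drop chunks that are empty after stripping.
cleaned = [chunk.strip() for chunk in chunks if chunk.strip()]

print(cleaned)
# ['A new string.', 'with brackets', 'another line', 'and a bracket']
```

Doing the cleanup with str.strip keeps the pattern simple, at the cost of a second pass over the matches.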
Jon Clements