How to split based off two characters "[" and "]" in a string

Question

For example, calling .split() on the following would give...

x = """[Chorus: Rihanna & Swizz Beatz]
I just wanted you to know
...more lyrics
[Verse 2: Kanye West & Swizz Beatz]
I be Puerto Rican day parade floatin'
... more lyrics"""

x.split()
print(x)

would give

["I just wanted you to know ...more lyrics", "I be Puerto Rican day parade floatin' ... more lyrics"]

Also, how would you keep the bracketed parts that get removed? Thank you. Splitting on an unknown string between two characters is hard :/

Look at [`re.split`](https://docs.python.org/3/library/re.html#re.split) — c2huc2hu, May 18 '18 at 01:47
How is this different than your [previous question](https://stackoverflow.com/questions/50327590/python-split-based-off-a-string-between-two-characters)? Also what is your desired output for this sample text? — pault, May 18 '18 at 01:49
`x.split()` doesn't produce the list you claim and you don't mention what you _do_ want. This question is unanswerable as stands. Can you turn your code into a working example and then include the desired result? Otherwise, we need to close this. — tdelaney, May 18 '18 at 02:36

score 2 · Answer 1 · answered May 18 '18 at 01:58

Use `re.split`:

>>> import re
>>> x = """[Chorus: Rihanna & Swizz Beatz] I just wanted you to know...more lyrics [Verse 2: Kanye West & Swizz Beatz] I be Puerto Rican day parade floatin' ... more lyrics"""
>>> [i.strip() for i in re.split(r'[\[\]]', x) if i]

# ['Chorus: Rihanna & Swizz Beatz', 'I just wanted you to know...more lyrics', 'Verse 2: Kanye West & Swizz Beatz', "I be Puerto Rican day parade floatin' ... more lyrics"]
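Because the cleaned list alternates between a tag and the lyrics that follow it, one way to also keep the bracketed parts (the second half of the question) is to pair neighbouring items. This is a minimal sketch, assuming the text starts with a tag and every tag is followed by a block of lyrics; the `sections` name is just for illustration.

```python
import re

x = ("[Chorus: Rihanna & Swizz Beatz] I just wanted you to know...more lyrics "
     "[Verse 2: Kanye West & Swizz Beatz] I be Puerto Rican day parade floatin' ... more lyrics")

# Same split as above: break on either bracket, drop empty pieces, trim spaces
parts = [i.strip() for i in re.split(r'[\[\]]', x) if i]

# Assumes the list alternates tag, lyrics, tag, lyrics, ...
sections = list(zip(parts[::2], parts[1::2]))
for tag, lyrics in sections:
    print(tag, '->', lyrics)
```

Note that `zip` silently drops a trailing tag that has no lyrics after it.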

Nitro · Answer 2 · score 0 · 2018-05-18T01:55:05.710

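# Split on ']': each piece after the first begins with a block of lyrics
data = x.split(']')
print(data)
# Drop the first piece, which only holds the text of the first tag
data = data[1:]
print(data)
# Cut each piece at the next '[' so the following tag's text is discarded
data = [piece.split('[')[0] for piece in data]
print(data)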

I got this output for your initial input

['I just wanted you to know...more lyrics', "I be Puerto Rican day parade floatin'... more lyrics"]

I hope this helps
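If you also want to keep the bracketed text without using a regex, a similar string-method approach can split on '[' first and then use `str.partition(']')` to separate each tag from the lyrics after it. This is a sketch assuming tags never nest and every '[' has a matching ']'; any text before the first tag is dropped by the `[1:]`.

```python
x = """[Chorus: Rihanna & Swizz Beatz]
I just wanted you to know
...more lyrics
[Verse 2: Kanye West & Swizz Beatz]
I be Puerto Rican day parade floatin'
... more lyrics"""

tags = []
lyrics = []
# Every piece after the first starts with a tag's text, then ']' and its lyrics
for piece in x.split('[')[1:]:
    tag, _, text = piece.partition(']')
    tags.append(tag)
    lyrics.append(text.strip())  # trim the newlines around each lyrics block

print(tags)
print(lyrics)
```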

edited May 18 '18 at 01:55

answered May 18 '18 at 01:54

Nitro


AmphotericLewisAcid · Answer 3 · score 0 · 2018-05-18T02:06:39.773

Per the Python documentation: https://docs.python.org/2/library/re.html

Python is by and large an excellent language with good consistency, but it still has some quirks that should be ironed out. You would think that re.split() would have an argument to decide whether the delimiter is returned. Instead, that is controlled by the pattern itself: if you surround your regex with capturing parentheses, re.split() includes the matched delimiter in the returned list.

Here are two ways you might try to accomplish your goal:

re.split("]",string_here)

and

re.split("(])",string_here)

The first returns the pieces of the string with your delimiter removed. The second also returns the delimiter, as a separate entry.

For example, running the first example on the string "This is ] a string" would produce:

["This is ", " a string"]

And running the second example would produce:

["This is ", "]", " a string"]

Personally, I'm not sure why they made this strange design choice.
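Here is the capture-group behaviour on the string above, plus a pattern aimed at the original question: putting only the tag's contents in a group, as in `r"\[(.*?)\]"`, makes `re.split()` return the lyrics and the text inside the brackets in alternating order. The `lyrics` string is a shortened, hypothetical stand-in for the question's text.

```python
import re

s = "This is ] a string"
no_group = re.split(r"\]", s)      # ['This is ', ' a string']
with_group = re.split(r"(\])", s)  # ['This is ', ']', ' a string']

lyrics = "[Chorus] I just wanted you to know [Verse 2] I be floatin'"
# Group around the tag text only: the brackets are dropped, the tag text is kept
pieces = re.split(r"\[(.*?)\]", lyrics)
print(pieces)
```

Because the string starts with a tag, `pieces` begins with an empty string, just as in the first answer.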

edited May 18 '18 at 02:06

answered May 18 '18 at 02:01

AmphotericLewisAcid


_You would think that the re.split() function would just have a potential argument to decide whether the delimiter is returned._ Not really... the regex can have multiple groups and they would all be returned. It's not just a question of a single delimiter. – tdelaney May 18 '18 at 02:41
Even in the case of multiple delimiters, it's splitting the string based on a pattern. Therefore, it must know what subset of the string was matched to the pattern. – AmphotericLewisAcid May 18 '18 at 17:26
It knows what subset was matched as the delimiter, but the question is, what part of that delimiter should be returned? The rule is simple: all of the capture groups. If I split on `r"\s+"`, there are no capture groups and no delimiter is returned. But what about `s = "aaa [1, 2] bbb [3, 4] ccc"`? `re.split(r"\s*\[(\d+)\s*,\s*(\d+)\s*\]\s*", s)` returns `['aaa', '1', '2', 'bbb', '3', '4', 'ccc']`. It would be more complicated to have a parameter outside of the regex telling you which capture groups to use. – tdelaney May 18 '18 at 17:54
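Building on the point that every capture group is returned: when a pattern needs parentheses only for grouping (for example around an alternation), a non-capturing group `(?:...)` keeps that text out of the result. A small sketch on a made-up string:

```python
import re

s = "intro [Chorus] la la [Verse] da da"

# Capturing group: the matched tag name is returned between the pieces
captured = re.split(r"\s*\[(Chorus|Verse)\]\s*", s)
# Non-capturing group: same match, but nothing extra is returned
not_captured = re.split(r"\s*\[(?:Chorus|Verse)\]\s*", s)

print(captured)
print(not_captured)
```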

we_create · Answer 4 · 2018-05-18T02:23:54.920

import re
...
input='[youwontseethis]what[hi]ever'
...
output=re.split(r'\[.*?\]',input)
print(output)

# ['', 'what', 'ever']

If the input string starts immediately with a tag, as in your example, the first item in the list will be an empty string. If you don't want it, you could drop it like this:

import re
...
input='[youwontseethis]what[hi]ever'
...
output=re.split(r'\[.*?\]',input)
output=output[1:] if output[0] == '' else output
print(output)

# ['what', 'ever']

To get the tags instead, replace

output=re.split(r'\[.*?\]',input)

with

output=re.findall(r'\[.*?\]',input)

#['[youwontseethis]','[hi]']
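Running both calls on the same input gives the tags and the text between them separately. To see them side by side, the two results can be zipped together. This is a sketch assuming the string starts with a tag, so the leading empty string from `re.split` is dropped; `text` is used instead of `input` to avoid shadowing the built-in.

```python
import re

text = '[youwontseethis]what[hi]ever'

tags = re.findall(r'\[.*?\]', text)      # the bracketed parts
chunks = re.split(r'\[.*?\]', text)[1:]  # text after each tag; [1:] drops the leading ''
pairs = list(zip(tags, chunks))
print(pairs)
```

Using the same pattern for both calls guarantees one chunk per tag once the leading empty string is removed.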
