Get individual strings between brackets

Question

Let's say I have this string:

[LEVEL]
    [NAME]The Girder Guide! [/NAME]
    [AUTHOR]draworigami[/AUTHOR]
    [AUTHORLEVEL]11[/AUTHORLEVEL]
    [COUNTRY]CA[/COUNTRY]
    [ID]62784[/ID]
    [RATING]4[/RATING]
    [DATE]2021-05-11 23:08:35[/DATE]
    [PLAYCOUNT]33[/PLAYCOUNT]
    [WINCOUNT]28[/WINCOUNT]
    [STARS]0[/STARS]
    [COMMENTS]1[/COMMENTS]
[/LEVEL]

Is there a way to get the individual strings between each opening tag `[...]` and its closing tag `[/...]`? I've tried several snippets found online, with no success.

welcome to stackoverflow! please take the [tour](http://stackoverflow.com/tour), read up on [how to ask a question](https://stackoverflow.com/help/asking) and provide the [shortest program necessary to reproduce the problem](https://stackoverflow.com/help/minimal-reproducible-example). why the `rml` tag? — hiro protagonist, May 14 '21 at 12:11
This looks like an XML-like recursive language, so you could parse it with a recursive-descent, LL(k) or LR(k) parser. Regexes won't work because they aren't powerful enough for this kind of language. — ForceBru, May 14 '21 at 12:15
@hiroprotagonist It is in RDF Mapping Language (RML) formatting. — Snackers, May 14 '21 at 12:28
[RML Mapping Language](https://rml.io/docs/rml/introduction/) looks different from what you included in your question. In the spec the square brackets are used to wrap recursive content, while in your format they are used to identify tags. Seems completely different. Please provide a reference for the format you are using, including specs on how certain characters are escaped. — trincot, May 14 '21 at 12:32
@Snackers it really does not look like [this rml](https://rml.io/docs/rml/introduction/)... — hiro protagonist, May 14 '21 at 12:38
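The parser-based approach suggested in the comments could look roughly like the sketch below. It is a small recursive-descent parser written under assumptions the question does not confirm: tag names consist of word characters only, and the format has no escaping rules. `parse` and `_parse_children` are illustrative names, not part of any library.

```python
import re

# A token is an opening tag "[NAME]", a closing tag "[/NAME]", or a run of text.
# Assumption: tag names are word characters; stray "[" or "]" are skipped.
TOKEN = re.compile(r"\[(/?)(\w+)\]|([^\[\]]+)")


def parse(text):
    """Return the top-level nodes as nested (tag, children) tuples."""
    tokens = [m.groups() for m in TOKEN.finditer(text)]
    children, _ = _parse_children(tokens, 0, None)
    return children


def _parse_children(tokens, pos, open_tag):
    """Parse nodes until the closing tag that matches open_tag is found."""
    children = []
    while pos < len(tokens):
        closing, tag, data = tokens[pos]
        if data is not None:
            # Plain text: keep it unless it is only whitespace/indentation.
            if data.strip():
                children.append(data)
            pos += 1
        elif closing:
            if tag != open_tag:
                raise ValueError(f"unexpected [/{tag}]")
            return children, pos + 1
        else:
            # Opening tag: recurse to collect everything up to its closing tag.
            inner, pos = _parse_children(tokens, pos + 1, tag)
            children.append((tag, inner))
    if open_tag is not None:
        raise ValueError(f"missing [/{open_tag}]")
    return children, pos


tree = parse("[LEVEL][NAME]The Girder Guide![/NAME][ID]62784[/ID][/LEVEL]")
print(tree)
# [('LEVEL', [('NAME', ['The Girder Guide!']), ('ID', ['62784'])])]
```

Unlike a flat search, this handles tags nested to any depth and raises `ValueError` when a closing tag does not match the tag that is currently open.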

Ashok Arora · Accepted Answer · 2021-05-14T12:48:09.143

This will return all the text between [] and [/]:

from bs4 import BeautifulSoup

rml = """
[LEVEL]
    [NAME]The Girder Guide! [/NAME]
    [AUTHOR]draworigami[/AUTHOR]
    [AUTHORLEVEL]11[/AUTHORLEVEL]
    [COUNTRY]CA[/COUNTRY]
    [ID]62784[/ID]
    [RATING]4[/RATING]
    [DATE]2021-05-11 23:08:35[/DATE]
    [PLAYCOUNT]33[/PLAYCOUNT]
    [WINCOUNT]28[/WINCOUNT]
    [STARS]0[/STARS]
    [COMMENTS]1[/COMMENTS]
[/LEVEL]
"""

html = rml.replace('[', '<').replace(']', '>')
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('level').text)

Output:

The Girder Guide! 
draworigami
11
CA
62784
4
2021-05-11 23:08:35
33
28
0
1

Edit #1: The original string has no newlines, so to print each value on its own line:

rml = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]"

html = rml.replace('[', '<').replace(']', '>')
soup = BeautifulSoup(html, 'html.parser')
elements = soup.find('level').contents
for e in elements:
    print(e.text)
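If BeautifulSoup is not installed, the same bracket-replacement idea can be tried with the standard library's `xml.etree.ElementTree` instead. This is a swapped-in technique, not part of the answer above, and it only works if the converted text is well-formed XML (for example, no `&` or `<` inside the values). The helper name `to_dict` and the shortened sample string are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same single-line format as the string above.
rml = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][ID]62784[/ID][/LEVEL]"


def to_dict(text):
    """Map each child tag of the root element to its text."""
    # Assumption: after the replacement the text is well-formed XML.
    xml = text.replace('[', '<').replace(']', '>')
    root = ET.fromstring(xml)
    return {child.tag: (child.text or '') for child in root}


fields = to_dict(rml)
print(fields['NAME'])    # The Girder Guide!
print(fields['AUTHOR'])  # draworigami
```

Keeping the tag names as dictionary keys makes it possible to look up a single field directly instead of relying on the order of the values.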

The string doesn't have newlines by default. Would it be stupid of me to ask you how to pretty print it? — Snackers, May 14 '21 at 12:40

score 0 · Answer 2 · answered May 14 '21 at 12:18

Try this:

st = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]"

st = st.split("]")
for i in range(len(st)):
    st[i] = st[i].replace("[", "")
    st[i]= st[i].replace("/", "")

st = st[:-1]

print(st)

`st` becomes:

['LEVEL', 'NAME', 'The Girder Guide!NAME', 'AUTHOR', 'draworigamiAUTHOR', 'AUTHORLEVEL', '11AUTHORLEVEL', 'COUNTRY', 'CACOUNTRY', 'ID', '62784ID', 'RATING', '4RATING', 'DATE', '2021-05-11 23:08:35DATE', 'PLAYCOUNT', '33PLAYCOUNT', 'WINCOUNT', '28WINCOUNT', 'STARS', '0STARS', 'COMMENTS', '1COMMENTS', 'LEVEL']

What I did:

Split the string on ], which gives a list of strings without the ] character.
Removed the [ and / characters from each string in that list.
Dropped the last element, which is an empty string produced by the split.
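In the list above, each value still has the name of its closing tag attached (for example 'draworigamiAUTHOR'). A possible variation on the same split-based idea that keeps only the values is sketched below. It assumes a value never contains `]` or the sequence `[/`; the function name `values_between_tags` and the shortened sample string are illustrative.

```python
def values_between_tags(text):
    """Return the text found directly before each closing tag."""
    values = []
    for chunk in text.split(']'):
        # A chunk such as 'draworigami[/AUTHOR' ends a field: the value is
        # everything before the closing-tag marker '[/'.
        value, sep, _tag = chunk.partition('[/')
        # Skip chunks with no closing tag, and values that are empty or
        # consist only of whitespace (such as indentation between tags).
        if sep and value.strip():
            values.append(value)
    return values


st = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][/LEVEL]"
print(values_between_tags(st))
# ['The Girder Guide!', 'draworigami', '11']
```

Note that empty fields such as `[STARS][/STARS]` are skipped entirely by this version, so the result cannot be matched to tags by position if fields may be empty.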

score -1 · Answer 3 · 2021-05-14T12:53:17.907

How about using a regular expression?

import re
s = '[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]'
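# Match the text that sits between a "]" and the next "[".
# Stripping "/" from s beforehand is not needed and would also delete
# slashes that appear inside the values themselves.
result = []
for e in re.findall(r"\][A-Za-z0-9 _.:,!'/$\-]+\[", s):
    # Drop the surrounding brackets that were part of the match.
    result.append(e.replace('[', '').replace(']', ''))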

result

['The Girder Guide!',
 'draworigami',
 '11',
 'CA',
 '62784',
 '4',
 '2021-05-11 23:08:35',
 '33',
 '28',
 '0',
 '1']
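A possible refinement, sketched under the assumption that tag names consist of word characters and that the fields of interest contain no nested tags: a backreference (`\1`) can require the closing tag to match the opening one, which also yields the tag name alongside each value. The name `PAIR` and the shortened sample string are illustrative.

```python
import re

s = '[LEVEL][NAME]The Girder Guide![/NAME][DATE]2021-05-11 23:08:35[/DATE][RATING]4[/RATING][/LEVEL]'

# \[(\w+)\]     opening tag, capturing its name
# ([^\[\]]*)    the value: any run of characters that are not brackets
# \[/\1\]       closing tag with the same name as the opening tag
PAIR = re.compile(r"\[(\w+)\]([^\[\]]*)\[/\1\]")

pairs = PAIR.findall(s)
print(pairs)
# [('NAME', 'The Girder Guide!'), ('DATE', '2021-05-11 23:08:35'), ('RATING', '4')]

values = [value for _tag, value in pairs]
print(values)
# ['The Girder Guide!', '2021-05-11 23:08:35', '4']
```

Because the value is defined as "anything except brackets" rather than a fixed list of allowed characters, values containing slashes or other punctuation are kept intact. The outer `[LEVEL]` element is not returned, since its content contains other tags.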
