Extracting the text after the initial substrings between square brackets

Question

I would like to extract the part of a string that follows any leading substrings in square brackets, for example:

Case 1:

text = "some_txt" # → some_txt

Case 2:

text = "[info1]some_txt" # → some_txt

Case 3:

text = "[info1][info2] some_text" # → some_text

Case 4:

text = "[info1][info2] some_text_with_[___]_abc" # → some_text_with_[___]_abc

What I did was

import re

match = re.search(r"^\[.+\] (.*)", text)
if match:
    result = match.group(1)

It works okay except for case 4, which gives only abc. I want to get some_text_with_[___]_abc instead.

Any help will be greatly appreciated.

It looks like you want `r"^(?:\[[^][]+])+\s*(.*)"`, right? Or, just `re.sub(r'^(?:\[[^][]+])+\s*', '', text)`. Note it is not a good idea to use builtins as variable names. Please clarify what your requirements are since "I am stuck with python regular expression" is not quite helpful. — Wiktor Stribiżew, Dec 11 '20 at 12:12
Yeah. It works. Thanks a lot for the help. But it is quite hard for me to understand. Is it possible to explain this regex pattern? — Sean Oh, Dec 11 '20 at 12:15
Your edits are very good. Thanks a lot. I should have done it myself. :D — Sean Oh, Dec 11 '20 at 12:33
That is, if there can be whitespace between substrings in brackets. — Wiktor Stribiżew, Dec 11 '20 at 12:41
Note it is not a good idea to quantify groups having a single obligatory and all other optional patterns, that leads to performance issues and catastrophic backtracking. — Wiktor Stribiżew, Dec 11 '20 at 12:46

score 1 · Answer 1 · answered Dec 11 '20 at 12:40

With your current code, you can use

r"^(?:\[[^][]+](?:\s*\[[^][]+])*)?\s*(.*)"

See the regex demo.
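As a quick check, the pattern can be run against the four cases from the question. This is only a minimal sketch: the names `PATTERN` and `text_after_brackets` are made up for illustration and are not part of the original code.

```python
import re

# Pattern from the answer: an optional run of leading [...] blocks,
# optional whitespace, then the rest of the line captured in group 1.
PATTERN = re.compile(r"^(?:\[[^][]+](?:\s*\[[^][]+])*)?\s*(.*)")


def text_after_brackets(text):
    """Return the text following any leading [...] blocks (hypothetical helper)."""
    match = PATTERN.search(text)
    # Every part of the pattern may match an empty string, so a match is
    # always expected; the check below is purely defensive.
    return match.group(1) if match else None


cases = [
    "some_txt",
    "[info1]some_txt",
    "[info1][info2] some_text",
    "[info1][info2] some_text_with_[___]_abc",
]

for text in cases:
    print(repr(text), "->", repr(text_after_brackets(text)))
```

Because `[^][]+` cannot cross a `[` or `]`, each bracketed block is matched on its own, so the later `[___]` in case 4 is left inside the captured text.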

If you are not actually interested in whether there is a match or not, you may use re.sub to remove these bracketed substrings from the start of the string using

re.sub(r'^\[[^][]+](?:\s*\[[^][]+])*\s*', '', text)

See another regex demo.
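The `re.sub` variant can be wrapped the same way. Again, this is a sketch under assumptions: `LEADING_BRACKETS` and `strip_leading_brackets` are hypothetical names, and `count=1` is an optional extra that is not in the call above.

```python
import re

# Same bracket-matching idea as before, but without the capture group:
# the whole leading prefix (and trailing whitespace) is simply removed.
LEADING_BRACKETS = re.compile(r"^\[[^][]+](?:\s*\[[^][]+])*\s*")


def strip_leading_brackets(text):
    """Remove leading [...] blocks from text (hypothetical helper)."""
    # Without re.MULTILINE, ^ only matches at the start of the string,
    # so at most one replacement can happen; count=1 makes that explicit.
    return LEADING_BRACKETS.sub("", text, count=1)


for text in [
    "some_txt",
    "[info1]some_txt",
    "[info1][info2] some_text",
    "[info1][info2] some_text_with_[___]_abc",
]:
    print(repr(text), "->", repr(strip_leading_brackets(text)))
```

When the string does not start with a bracketed block, nothing matches and the input is returned unchanged. Note that an empty pair such as `[]` is not treated as a block, because `[^][]+` requires at least one character between the brackets.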

Regex details

^ - start of string
(?:\[[^][]+](?:\s*\[[^][]+])*)? - an optional occurrence of
- \[[^][]+] - a [, then any one or more chars other than [ and ] as many as possible and then a ]
- (?:\s*\[[^][]+])* - zero or more occurrences of zero or more whitespaces and then a [, then any one or more chars other than [ and ] as many as possible and then a ]
\s* - zero or more whitespaces
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible.
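The breakdown above can also be kept next to the pattern itself with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. The following is a sketch only: the names `VERBOSE_PATTERN` and `COMPACT_PATTERN` are illustrative, and the closing brackets are written as `\]` for readability, which matches the same character as the unescaped `]` in the original.

```python
import re

COMPACT_PATTERN = re.compile(r"^(?:\[[^][]+](?:\s*\[[^][]+])*)?\s*(.*)")

VERBOSE_PATTERN = re.compile(
    r"""
    ^                   # start of string
    (?:                 # optional prefix made of bracketed blocks
        \[[^][]+\]      # first block: 1+ non-bracket chars inside brackets
        (?:             # zero or more further blocks
            \s*         #   optional whitespace before the block
            \[[^][]+\]  #   same shape as the first block
        )*
    )?
    \s*                 # whitespace between the prefix and the text
    (.*)                # group 1: the rest of the line
    """,
    re.VERBOSE,
)

samples = [
    "some_txt",
    "[info1]some_txt",
    "[info1][info2] some_text",
    "[info1][info2] some_text_with_[___]_abc",
]

for text in samples:
    compact = COMPACT_PATTERN.search(text).group(1)
    verbose = VERBOSE_PATTERN.search(text).group(1)
    print(repr(text), "->", repr(verbose), "| same as compact:", compact == verbose)
```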
