I would like to extract the text that follows any leading bracketed tags in a string. For example:

Case 1:

text = "some_txt" # → some_txt

Case 2:

text = "[info1]some_txt" # → some_txt

Case 3:

text = "[info1][info2] some_text" # → some_text

Case 4:

text = "[info1][info2] some_text_with_[___]_abc" # → some_text_with_[___]_abc

What I did was

import re

# Raw string so that \[ and \] reach the regex engine unchanged
match = re.search(r"^\[.+\] (.*)", text)
if match:
    result = match.group(1)

It works except for Case 4, which gives only abc. I want to get some_text_with_[___]_abc instead.

Any help will be greatly appreciated.

Wiktor Stribiżew
Sean Oh

1 Answer

With your current code, you can use

r"^(?:\[[^][]+](?:\s*\[[^][]+])*)?\s*(.*)"

See the regex demo.
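
As a quick check, here is a minimal sketch that runs the pattern above against the four cases from the question. The names PATTERN and extract_text are only illustrative and do not come from the question.

```python
import re

# Pattern from the answer: optional leading [..] groups, optional
# whitespace, then capture the rest of the line in Group 1.
PATTERN = re.compile(r"^(?:\[[^][]+](?:\s*\[[^][]+])*)?\s*(.*)")

def extract_text(text):
    """Return the part of `text` that follows any leading [bracketed] groups."""
    match = PATTERN.search(text)
    # Everything before Group 1 is optional, so the pattern matches any
    # string; the guard only mirrors the code in the question.
    return match.group(1) if match else None

samples = [
    "some_txt",
    "[info1]some_txt",
    "[info1][info2] some_text",
    "[info1][info2] some_text_with_[___]_abc",
]
for sample in samples:
    print(repr(sample), "->", repr(extract_text(sample)))
```

The bracketed part can only match at the very start of the string, and [^][]+ cannot cross a [ or ], so a later [___] stays in Group 1.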

If you are not actually interested in whether there is a match or not, you may use re.sub to remove these bracketed substrings from the start of the string using

re.sub(r'^\[[^][]+](?:\s*\[[^][]+])*\s*', '', text)

See another regex demo.
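
A small sketch of the re.sub variant, wrapped in a helper whose name (strip_leading_tags) is just an assumption for illustration:

```python
import re

def strip_leading_tags(text):
    """Remove [bracketed] groups (and the whitespace after them) from the start."""
    return re.sub(r'^\[[^][]+](?:\s*\[[^][]+])*\s*', '', text)

print(strip_leading_tags("[info1][info2] some_text_with_[___]_abc"))
print(strip_leading_tags("some_txt"))
```

One difference from the re.search version: here at least one bracketed group is required for anything to be removed, so a string with no leading [...] group is returned unchanged, leading whitespace included.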

Regex details

  • ^ - start of string
  • (?:\[[^][]+](?:\s*\[[^][]+])*)? - an optional occurrence of
    • \[[^][]+] - a [, then one or more chars other than [ and ], as many as possible, and then a ]
    • (?:\s*\[[^][]+])* - zero or more occurrences of: zero or more whitespaces, then a [, then one or more chars other than [ and ], as many as possible, and then a ]
  • \s* - zero or more whitespaces
  • (.*) - Group 1: any zero or more chars other than line break chars, as many as possible.
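
If it helps readability, the same breakdown can be written into the pattern itself with re.VERBOSE. This is only a sketch of an equivalent spelling; VERBOSE_PATTERN is an illustrative name.

```python
import re

# Same regex as above, spread out with re.VERBOSE so each part can carry
# a comment. Whitespace outside character classes is ignored in this mode.
VERBOSE_PATTERN = re.compile(r"""
    ^                        # start of string
    (?:                      # optional block of leading bracketed groups
        \[ [^][]+ ]          # a bracketed group with one or more non-bracket chars
        (?:                  # zero or more further groups ...
            \s* \[ [^][]+ ]  # ... each optionally preceded by whitespace
        )*
    )?
    \s*                      # zero or more whitespaces
    (.*)                     # Group 1: the rest of the line
""", re.VERBOSE)

match = VERBOSE_PATTERN.search("[info1][info2] some_text_with_[___]_abc")
print(match.group(1))
```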
Wiktor Stribiżew