How do I remove texts in brackets only from the beginning of lines with regex in Python?

Question

I would like to remove the bracketed line codes at the beginning of each line but preserve any other text in brackets.

NOTE: In the application that I use I cannot import any Python library but can use Python regexes. The regex and the replacement value in the substitution have to be separated by a comma. For example, I use ([^\s\d])(-\s+),\1 to merge hyphenated words at the end of lines. So I would need something similar.

\([^()]*\) finds every text in brackets.

^\h*\([^()]*\) matches only the code at the start of the first line, not the ones on the following lines. How should I modify it?

The sample text is the following:

(#p0340r#) This is a sentence. This is another one but I need more sentences to fill the space to start a new line.
(#p0350q#) Why? (this text should be left unchanged)
(#p0360r#) Because I need to remove these codes from interview texts.

The expected outcome should be:

This is a sentence. This is another one but I need more sentences 
to fill the space to start a new line.
Why? (this text should be left unchanged)
Because I need to remove these codes from interview texts.

Thank you!

Please format the sample input text as _code_, by prepending 4 or more spaces to each line. It would also be helpful to see the expected output you want. — Tim Biegeleisen, Nov 30 '21 at 09:41
Does this answer your question? [Using ^ to match beginning of line in Python regex](https://stackoverflow.com/questions/31400362/using-to-match-beginning-of-line-in-python-regex) — bobble bubble, Nov 30 '21 at 10:20

Wiktor Stribiżew · Answer 1 · 2021-11-30T10:25:54.850


To remove a pattern at the start of every line with Python re.sub (or any re.sub-powered search and replace), put ^ before the pattern, as you already do, and enable multiline mode with the inline (?m) flag. By default, ^ matches only at the very start of the string; in multiline mode it also matches right after each newline. If you have access to the code, you can pass flags=re.M instead.
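As a minimal sketch of the difference the flag makes (the three-line `text` and the variable names are made up for illustration, and `[ \t]` stands in for horizontal whitespace), `re.subn` reports how many matches were replaced:

```python
import re

# Made-up sample: three lines, each starting with a bracketed code.
text = "(#a#) first line\n(#b#) second line (keep me)\n(#c#) third line"
pattern = r"^[ \t]*\([^()]*\)[ \t]*"

# Without the multiline flag, ^ anchors only at the very start of the string.
single, n_single = re.subn(pattern, "", text)

# With the inline (?m) flag, ^ also anchors right after every newline.
multi, n_multi = re.subn("(?m)" + pattern, "", text)

print(n_single)  # 1
print(n_multi)   # 3
print(multi)
```

Only the multiline version strips the code from every line, and the "(keep me)" in the middle of a line is untouched in both cases.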

Also, \h is not supported by Python re. To match horizontal whitespace, use a construct like [ \t] or [^\S\n] (or, in some rare cases, [^\S\r\n], usually when you read a file in binary mode).
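To illustrate why a newline-excluding class is preferable to plain `\s` here, the sketch below compares the two on a made-up string that begins with two empty lines. The input and variable names are assumptions for demonstration only:

```python
import re

# Made-up input: two empty lines, then an indented code and some text.
text = "\n\n \t(#x#) after two blank lines"

# \s also matches newlines, so a match that starts at the first empty line
# swallows the blank lines together with the code.
greedy = re.sub(r"(?m)^\s*\([^()]*\)\s*", "", text)

# [^\S\n] means "whitespace that is not a newline", so the blank lines survive.
horizontal = re.sub(r"(?m)^[^\S\n]*\([^()]*\)[^\S\n]*", "", text)

print(repr(greedy))      # 'after two blank lines'
print(repr(horizontal))  # '\n\nafter two blank lines'
```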

So you can use

(?m)^[^\S\n]*\([^()]*\)[^\S\n]*

and replace with an empty string.
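For anyone who can run Python directly, here is a short check of that pattern against the sample text from the question. The `sample` and `cleaned` names are illustrative; the application mentioned in the question is not shown, so this only exercises the regex itself:

```python
import re

# Sample text from the question, one record per line.
sample = (
    "(#p0340r#) This is a sentence. This is another one but I need more "
    "sentences to fill the space to start a new line.\n"
    "(#p0350q#) Why? (this text should be left unchanged)\n"
    "(#p0360r#) Because I need to remove these codes from interview texts."
)

# Strip a bracketed code (plus surrounding horizontal whitespace) at each line start.
cleaned = re.sub(r"(?m)^[^\S\n]*\([^()]*\)[^\S\n]*", "", sample)
print(cleaned)
```

The bracketed text in the middle of the second line is kept because ^ cannot match there.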

Note: if you ever want to remove one or more substrings inside parentheses at the start of a line, group the pattern and apply the + quantifier to it:

(?m)^(?:[^\S\n]*\([^()]*\))+[^\S\n]*
#    ^^^                  ^^
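A small sketch of how the grouped variant differs from the single-code pattern, using a made-up line that starts with several codes in a row (input and variable names are assumptions):

```python
import re

# Made-up input: the first line starts with three codes, the second with one.
text = (
    "(#p0340r#) (#dup#)\t(#x#) Several codes, then text (kept).\n"
    "(#p0350q#) One code only."
)

# Single-code pattern: removes only the first code on each line.
single = re.sub(r"(?m)^[^\S\n]*\([^()]*\)[^\S\n]*", "", text)

# Grouped pattern with +: removes the whole run of leading codes.
repeated = re.sub(r"(?m)^(?:[^\S\n]*\([^()]*\))+[^\S\n]*", "", text)

print(single)
print(repeated)
```

In both cases "(kept)" stays, since it does not sit in the leading run of codes.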

edited Nov 30 '21 at 10:25

answered Nov 30 '21 at 10:04


Thank you, Wiktor. The above patterns are perfect. But in the application that I use I cannot import any Python library but can use Python regexes. The regex and the replacement value in the substitution have to be separated by a comma. For example, I use ([^\s\d])(-\s+),\1 to merge hyphenated words at the end of lines. So I would need something similar. – learner2021 Nov 30 '21 at 10:16
@learner2021 As I wrote, `re.M` can be used as an inline modifier `(?m)`. So, the pattern is `(?m)^[^\S\n]*\([^()]*\)[^\S\n]*`, and the replacement is just an empty string. Sorry, this is a site where programming issues are resolved; it is not a third-party support forum. – Wiktor Stribiżew Nov 30 '21 at 10:20

Superb! Cheers!!!! – learner2021 Nov 30 '21 at 10:23
@learner2021 If you still have trouble with this issue, please update the question. – Wiktor Stribiżew Dec 01 '21 at 08:41
