1

I would like to remove all the line codes in brackets placed at the beginning of lines but want to preserve the other words in brackets.

NOTE: In the application that I use I cannot import any Python library but can use Python regexes. The regex and the replacement value in the substitution have to be separated by a comma. For example, I use ([^\s\d])(-\s+),\1 to merge hyphenated words at the end of lines. So I would need something similar.

\([^()]*\) finds every text in brackets.

^\h*\([^()]*\) finds only the first one but not the rest. How should I modify it?

The sample text is the following:

(#p0340r#) This is a sentence. This is another one but I need more sentences to fill the space to start a new line.
(#p0350q#) Why? (this text should be left unchanged)
(#p0360r#) Because I need to remove these codes from interview texts.

The expected outcome should be:

This is a sentence. This is another one but I need more sentences 
to fill the space to start a new line.
Why? (this text should be left unchanged)
Because I need to remove these codes from interview texts.

Thank you!

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Please format the sample input text as _code_, by prepending 4 or more spaces to each line. It would also be helpful to see the expected output you want. – Tim Biegeleisen Nov 30 '21 at 09:41
  • Does this answer your question? [Using ^ to match beginning of line in Python regex](https://stackoverflow.com/questions/31400362/using-to-match-beginning-of-line-in-python-regex) – bobble bubble Nov 30 '21 at 10:20
  • Well, it turns out it is any Python code related question. – Wiktor Stribiżew Nov 30 '21 at 10:23

1 Answers1

1

To remove a pattern at the start of any line with Python re.sub (or any re.sub powered search and replace), you need to use the ^ before the pattern (that is what you already have) and pass the multiline (?m) flag (if you have access to code you could use flags=re.M).

Also, \h is not Python re compliant, you need to use a construct like [ \t] or [^\S\n] (in some rare cases, also [^\S\r\n], usually when you read a file in binary mode) to match any horizontal whitespace.

So you can use

(?m)^[^\S\n]*\([^()]*\)[^\S\n]*

and replace with an empty string.

Note: if you ever want to remove one or more substrings inside parentheses at the start of a line group the pattern and apply the + quantifier on it:

(?m)^(?:[^\S\n]*\([^()]*\))+[^\S\n]*
#    ^^^                  ^^
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Thank you, Viktor. The above codes are perfect. But in the application that I use I cannot import any Python library but can use Python regexes. The regex and the replacement value in the substitution have to be separated by a comma. For example, I use ([^\s\d])(-\s+),\1 to merge hyphenated words at the end of lines. So I would need something similar. – learner2021 Nov 30 '21 at 10:16
  • @learner2021 As I wrote, `re.M` can be used as an inline modifier `(?m)`. So, the pattern is `(?m)^[^\S\n]*\([^()]*\)[^\S\n]*`, the replacement is just an empty string. Sorry, it is a site where programming issues are resolved, it is not a 3rd party support forum. – Wiktor Stribiżew Nov 30 '21 at 10:20
  • 1
    Superb! Cheers!!!! – learner2021 Nov 30 '21 at 10:23
  • @learner2021 If you still have trouble with this issue, please update the question. – Wiktor Stribiżew Dec 01 '21 at 08:41