
I have a text pattern that I'd like to find and push onto a new line. The pattern is `),` followed by a space and a character, like this:

text_orig =

text cat dog cat dog
),
text rabbit cat dog
), text coffee cat dog # need to indent this line

and it should become:

text_new =

text cat dog cat dog
),
text rabbit cat dog
), 
text coffee cat dog

I'm close to a solution but stuck on which approach to use. Currently I'm using `re.sub(r'\),\s\w', '), \n', text_orig)`, but it removes the first letter of the text, like so:

text_new =

text cat dog cat dog
),
text rabbit cat dog
), 
ext coffee cat dog # removes first letter

Do I need `re.search` instead of `re.sub`? Any help is appreciated.
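A minimal sketch of why the letter disappears, using a shortened two-line sample (an assumption, not the full input above): the match produced by `\),\s\w` already contains the letter, so `re.search` would only report that span, while `re.sub` replaces all of it.

```python
import re

# Hypothetical two-line sample based on the question's text.
text_orig = "text rabbit cat dog\n), text coffee cat dog"

# re.search only locates the pattern; the matched span includes the space and the 't'.
m = re.search(r'\),\s\w', text_orig)
print(repr(m.group()))  # '), t'

# re.sub replaces the entire match, so the consumed 't' is lost.
broken = re.sub(r'\),\s\w', '), \n', text_orig)
print(broken)
```

Printing `broken` shows the last line as `ext coffee cat dog`, the same symptom as in the question.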

paranormaldist
    You can try `re.sub(r'\),[^\S\n]*(?=\w)', '),\n', text_orig)` ([demo](https://regex101.com/r/tku3D9/1)) or, if it should be at the start of a line, `re.sub(r'^\),[^\S\n]*(?=\w)', '),\n', text_orig, flags=re.M)` – Wiktor Stribiżew Mar 25 '21 at 18:14
  • Indent is what you do when you add a tab. You actually seem to want to simply add a line break where you find that pattern. – Pranav Hosangadi Mar 25 '21 at 18:15
  • @PranavHosangadi ah yes, then a line break where the pattern is found – paranormaldist Mar 25 '21 at 18:16
    The term you're looking for (which Wiktor's example uses) is "positive lookahead" (see https://stackoverflow.com/questions/47886809/python-regex-lookbehind-and-lookahead). For example, the regex `ab(?=c)` matches the `ab` in `"abc"`, but does not consume the `"c"` as part of the match. – Pranav Hosangadi Mar 25 '21 at 18:19
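A quick illustration of the lookahead comment above, using the same `ab(?=c)` pattern on made-up strings:

```python
import re

# `ab(?=c)` matches "ab" only when a "c" follows; the "c" is not part of the match.
m = re.search(r'ab(?=c)', 'abc')
print(m.group())  # ab

# No "c" after "ab", so no match.
print(re.search(r'ab(?=c)', 'abd'))  # None

# Because the "c" is not consumed, re.sub leaves it in place.
print(re.sub(r'ab(?=c)', 'X', 'abc abd'))  # Xc abd
```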

1 Answer


You can use

re.sub(r'\),[^\S\n]*(?=\w)', '),\n', text_orig)

See the regex demo.
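A minimal runnable sketch applying this pattern to the question's sample (assuming the stray period and the inline comment are not part of the real input):

```python
import re

# The question's sample input, minus the trailing period and comment.
text_orig = (
    "text cat dog cat dog\n"
    "),\n"
    "text rabbit cat dog\n"
    "), text coffee cat dog"
)

# "),", then optional whitespace other than a newline, then a lookahead for a
# word char. The lookahead does not consume the letter, so it is kept.
text_new = re.sub(r'\),[^\S\n]*(?=\w)', '),\n', text_orig)
print(text_new)
```

The `),` that is already followed by a line break is left alone, because the lookahead sees `\n` rather than a word character. The replacement `'),\n'` also drops the space after the comma; `'), \n'` would keep it, as in the desired output.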

Or, if the pattern should only match at the start of a line, add the `^` anchor and the `re.M` flag:

re.sub(r'^\),[^\S\n]*(?=\w)', '),\n', text_orig, flags=re.M)

Here,

  • `^` - start of a line (with the `re.M` flag)
  • `\),` - a `),` substring
  • `[^\S\n]*` - zero or more whitespace characters other than a line feed (LF)
  • `(?=\w)` - a positive lookahead that requires a word character immediately to the right of the current location.
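The breakdown above can be written into the pattern itself with `re.VERBOSE`. This is a sketch on a hypothetical sample (not from the question) that also contrasts the anchored and unanchored variants, and shows what `\s*` would do differently from `[^\S\n]*`:

```python
import re

# The anchored pattern, spelled out with re.VERBOSE so each piece carries a
# comment (whitespace outside the character class is ignored in this mode).
anchored = re.compile(r"""
    ^           # start of a line (because of re.M)
    \),         # a literal "),"
    [^\S\n]*    # zero or more whitespace chars other than LF
    (?=\w)      # lookahead: a word char must follow, but is not consumed
""", re.M | re.VERBOSE)

# Hypothetical sample: "), " mid-line, a blank line, and "), " at a line start.
sample = "f(a), b\n),\n\ntext rabbit\n), text coffee"

# With ^ and re.M, only the "), " at the start of a line is split.
only_line_start = anchored.sub('),\n', sample)
print(repr(only_line_start))

# Without ^, the mid-line "), b" is split as well.
everywhere = re.sub(r'\),[^\S\n]*(?=\w)', '),\n', sample)
print(repr(everywhere))

# With \s* instead of [^\S\n]*, "),\n\n" also matches, so the blank line is lost.
lossy = re.sub(r'^\),\s*(?=\w)', '),\n', sample, flags=re.M)
print(repr(lossy))
```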
Wiktor Stribiżew