I have some sample data similar to the following:

From: JoeBloggs
Subject: This is a subject line

Hello

What I want to do is remove everything from the word 'From' up to the final newline before 'Hello'. I think that means looking for two consecutive newlines. I could run a regex match for 'From:' and then replace it, but that would only replace the 'From:' itself. Is there anything I can do to achieve this?

Printing the data with repr() displays it as follows:

'From: JoeBloggs\nSubject: This is a subject line\n\nHello'

Therefore, I need a regex that matches from 'From: ' through the following '\n\n' so that I can replace everything in between.

martineau
thefragileomen

2 Answers

import re

test = """From: JoeBloggs
Subject: This is a subject line

Hello"""

print(test)
print("-"*10)
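# Lazily skip everything up to the first blank line, then capture the rest.
# re.DOTALL lets '.' match newlines; re.MULTILINE is not needed without ^ or $.
match = re.match(r'.*?\n\n(?P<content>.*)', test, re.DOTALL)
print(match.group("content"))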

Running this results in:

$ python test.py 
From: JoeBloggs
Subject: This is a subject line

Hello
----------
Hello
  • @thefragileomen and you could start your pattern as `r'From:\s.*?'`, in case you want to find multiple instances in a long string – RichieV Sep 18 '20 at 16:57
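The suggestion in the comment above could look roughly like the sketch below. The two-message sample string and the names `header_pattern` and `bodies_only` are made up for illustration, and the sketch assumes every header block starts a line with 'From:' and ends at the first blank line.

```python
import re

# Hypothetical sample: two messages, each with its own header block
text = (
    "From: JoeBloggs\nSubject: This is a subject line\n\nHello\n"
    "From: JaneDoe\nSubject: Another subject\n\nGoodbye"
)

# ^ with re.MULTILINE anchors 'From:' at the start of any line;
# .*? with re.DOTALL lazily spans lines up to the first blank line.
header_pattern = re.compile(r'^From:\s.*?\n\n', re.DOTALL | re.MULTILINE)

# Remove every header block, keeping only the message bodies
bodies_only = header_pattern.sub('', text)
print(repr(bodies_only))
```

Anchoring on 'From:' means text that merely happens to precede a blank line is left alone, which matters once the string holds more than one message.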
import re

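# Lazily match everything up to and including the first blank line
pattern = r'.+?\n\n'
text = 'From: JoeBloggs\nSubject: This is a subject line\n\nHello'
replacement_text = 'replacement\n'

# re.DOTALL lets '.' match newlines so the match can span several lines
replaced_text = re.sub(pattern, replacement_text, text, flags=re.DOTALL)
print(replaced_text)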

Output

replacement
Hello
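One caveat: re.sub replaces every match by default, so if the body itself contains a blank line, those paragraphs would be replaced as well. A minimal sketch of limiting the substitution to the leading header block, assuming the string starts with 'From:' (the sample text and variable names here are hypothetical):

```python
import re

# Hypothetical input: a header block, then a body that contains a blank line
text = 'From: JoeBloggs\nSubject: This is a subject line\n\nHello\n\nRegards'

# Without count, every span ending in a blank line is removed, body included
replaced_all = re.sub(r'.+?\n\n', '', text, flags=re.DOTALL)

# \A pins the match to the very start of the string and
# count=1 stops after a single replacement
replaced_first = re.sub(r'\AFrom:.+?\n\n', '', text, count=1, flags=re.DOTALL)

print(repr(replaced_all))
print(repr(replaced_first))
```

The second call leaves the body untouched, blank line and all, while the first keeps only the text after the last blank line.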
Nadeem Mehraj