I am trying to remove a specific paragraph from a text file using a regular expression in Python.
Here is the input.txt file:
some random texts (100+ lines)
bbb
...
ttt
some random texts
ccc
...
fff
paragraph_a A_story(
...
some random texts adfsasdsd
...
)
paragraph_b different_story(
...
some random texts
...
)
Here is the expected output:
some random texts (100+ lines)
bbb
...
ttt
some random texts
ccc
...
fff
paragraph_b different_story(
...
some random texts
...
)
What I want to do is delete the whole paragraph_a block, from its header line through the closing parenthesis. However, the block has to be located by the name of the paragraph below it (in this case, paragraph_b), because the contents of the paragraph to be deleted (in this case, paragraph_a) are random.
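The rule described above (drop whichever block sits immediately above paragraph_b, whatever its name or contents) can also be expressed without a regex: scan the lines and hold each finished block back until the next line is known. This is a swapped-in, line-based technique, not the approach in the question. It is a minimal sketch that assumes every block opens with a `name name(` line and closes with a line that is exactly `)`; `HEADER` and `drop_block_before` are hypothetical names.

```python
import re

# Hypothetical header pattern: "name name(" alone on a line, e.g. "paragraph_a A_story(".
HEADER = re.compile(r"\w+ \w+\($")


def drop_block_before(lines, next_name="paragraph_b"):
    """Return `lines` without the block that ends right before a line starting
    with `next_name`. A block is assumed to open with a HEADER line and to
    close with a line that is exactly ")"."""
    out = []
    block = None    # lines of the block currently being read
    pending = None  # a finished block, held back until the next line is seen
    for line in lines:
        if pending is not None:
            if not line.startswith(next_name):
                out.extend(pending)  # not followed by next_name: keep it
            pending = None           # otherwise it is silently dropped
        if block is not None:
            block.append(line)
            if line.rstrip("\n") == ")":
                pending, block = block, None
            continue
        if HEADER.match(line.rstrip("\n")):
            block = [line]
        else:
            out.append(line)
    # Flush anything still held back at end of input.
    out.extend(pending or [])
    out.extend(block or [])
    return out
```

Because it only looks one line ahead, it can consume an open file object directly, e.g. `"".join(drop_block_before(f))` inside a `with open("input.txt") as f:` block.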
I've managed to write a regular expression that selects only the paragraph located right above paragraph_b:
https://regex101.com/r/pwGVbe/1 (you can check it there).
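To check the selection from Python rather than regex101, the pattern has to see the whole text at once, with `re.M` set so that `^` matches at the start of every line (the `^` in front of `paragraph_b` needs it, so the regex101 version presumably uses the multiline flag too). This sketch runs the question's own pattern against a shortened copy of the sample input, once on the whole string and once line by line; `SELECT` and `SAMPLE` are just illustrative names.

```python
import re

# The pattern from the question, as a raw string so Python does not treat the
# backslashes as string escapes. re.M lets "^" match at the start of every line.
SELECT = re.compile(r"^(\w+ \w+\((?:(.|\n)*)\))\s*^paragraph_b", re.M)

# Shortened copy of input.txt from the question.
SAMPLE = """fff
paragraph_a A_story(
...
some random texts adfsasdsd
...
)
paragraph_b different_story(
...
)
"""

# On the whole text, group 1 is the paragraph_a block.
match = SELECT.search(SAMPLE)
print(match.group(1))

# Line by line, nothing matches: no single line contains the header,
# the closing ")" and "paragraph_b" together.
print(any(SELECT.search(line) for line in SAMPLE.splitlines(keepends=True)))  # prints False
```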
However, using this regular expression, I couldn't delete that paragraph.
Here is what I've done so far:
import re

with open('input.txt', 'r') as infile, open('output.txt', 'w') as output:
    for line in infile:
        # print(line)
        t = re.sub(r'^(\w+ \w+\((?:(.|\n)*)\))\s*^paragraph_b', '', line)
        output.write(t)
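A minimal sketch of one possible fix, assuming the same block layout as the sample (a `name name(` header line, then body lines, then a line that is exactly `)`): read the whole file into one string instead of looping over lines, and match the block with a lookahead `(?=paragraph_b\b)`, so `paragraph_b` is checked but not consumed and therefore not deleted. `BLOCK_BEFORE_B`, `remove_block_before_b` and `process_file` are hypothetical names.

```python
import re

# Hypothetical pattern, assuming the sample's layout: a "name name(" header line,
# body lines, then a line that is exactly ")". No body line may be a lone ")".
BLOCK_BEFORE_B = re.compile(
    r"^\w+ \w+\(\n"        # header line, e.g. "paragraph_a A_story("
    r"(?:(?!\)$).*\n)*"    # body: any line except a lone ")"
    r"\)\n"                # the closing ")" line
    r"(?=paragraph_b\b)",  # lookahead: paragraph_b must follow, but is not consumed
    re.M,                  # "^" and "$" match at every line boundary
)


def remove_block_before_b(text: str) -> str:
    """Delete the block directly above paragraph_b; everything else is kept."""
    return BLOCK_BEFORE_B.sub("", text)


def process_file(src: str, dst: str) -> None:
    # Read the whole file at once (not line by line) so the pattern can span lines.
    with open(src) as infile:
        text = infile.read()
    with open(dst, "w") as outfile:
        outfile.write(remove_block_before_b(text))
```

Usage would be `process_file('input.txt', 'output.txt')`. Excluding lone `)` lines from the body, instead of using `(.|\n)*`, keeps a match from starting at an earlier block and running on through paragraph_a, which the leftmost-match rule would otherwise allow when the file contains several blocks.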
Is there a solution, or at least a clue about what I'm doing wrong? Any answer or advice would be appreciated.
Thanks.