I'm a beginner programmer and I'm stuck on this possibly easy problem: I want to automatically add numbers to the sentences contained in the P tags of an .xml file. A sample paragraph in the .xml file looks like:
<P>Sentence1. Sentence2. Sentence3.</P>
I want to transform this into:
<P><SUP>1</SUP>Sentence1.<SUP>2</SUP> Sentence2.<SUP>3</SUP> Sentence3.</P>
However, only P tags containing at least 2 sentences should be numbered; if a P tag contains only 1 sentence, I want to leave it unchanged.
Here is the approach I have come up with so far, using regular expressions:
\.\s.*
# Reliably finds where the second sentence starts (the period and whitespace before it); insert <SUP>2</SUP> right after that period.
<P>[^>]*<SUP>2
# Finds the beginning of the first sentence if a second sentence exists.
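For reference, the two steps above could be written with Python's re module roughly like this. It is only a sketch: the function name `number_first_two` is made up for illustration, and the capture groups (the parentheses) are added to the two patterns so the matched text can be put back after the inserted marker.

```python
import re

def number_first_two(paragraph):
    # Hypothetical helper, not part of any library.
    # Step 1: put <SUP>2</SUP> right after the first period that is
    # followed by whitespace (the start of the second sentence).
    step1 = re.sub(r"\.(\s.*)", r".<SUP>2</SUP>\1", paragraph, count=1)
    # Step 2: only if step 1 inserted a marker, put <SUP>1</SUP>
    # right after the opening <P>.
    return re.sub(r"<P>([^>]*<SUP>2)", r"<P><SUP>1</SUP>\1", step1, count=1)

print(number_first_two("<P>Sentence1. Sentence2. Sentence3.</P>"))
# <P><SUP>1</SUP>Sentence1.<SUP>2</SUP> Sentence2. Sentence3.</P>
```

The third sentence stays unnumbered, which is exactly the limitation: each further sentence would need its own pair of patterns.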
However, I feel like this is a really awkward approach, and I wouldn't know how to extend it to paragraphs containing 20 sentences or more, or to .xml documents containing many paragraphs. Is there a better regular expression to achieve this, or a better (Python) tool than regular expressions?
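One direction that seems to generalize is to let re.sub call a function for every P tag and count the sentence boundaries inside it. The following is a minimal sketch, not a definitive solution. It assumes that a sentence boundary is a period followed by whitespace and more text, that P tags have no attributes and are not nested, and that the names `number_paragraph` and `number_sentences` are purely illustrative. Abbreviations such as "e.g." would be miscounted, and a real XML parser would likely be more robust for messier documents.

```python
import re
from itertools import count

# Assumption: a sentence ends at a period followed by whitespace and more text.
BOUNDARY = re.compile(r"\.(?=\s+\S)")
# Assumption: plain <P>...</P> tags, no attributes, no nesting.
PARAGRAPH = re.compile(r"<P>(.*?)</P>", re.DOTALL)

def number_paragraph(match):
    body = match.group(1)
    if not BOUNDARY.search(body):
        # Fewer than two sentences: leave the paragraph unchanged.
        return match.group(0)
    counter = count(2)  # the first sentence gets 1, boundaries start at 2
    numbered = BOUNDARY.sub(lambda m: f".<SUP>{next(counter)}</SUP>", body)
    return f"<P><SUP>1</SUP>{numbered}</P>"

def number_sentences(xml_text):
    # Apply the numbering to every paragraph in the document text.
    return PARAGRAPH.sub(number_paragraph, xml_text)

print(number_sentences("<P>Sentence1. Sentence2. Sentence3.</P>"))
# <P><SUP>1</SUP>Sentence1.<SUP>2</SUP> Sentence2.<SUP>3</SUP> Sentence3.</P>
```

Because the counter is created fresh inside each call, numbering restarts at 1 in every paragraph, and the same code handles 2 or 20 sentences.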