I'm just starting out in programming, so this problem is probably trivial for everyone but me. I have an .xml file containing content like:
<p> sentence1. sentence2. sentence3.</p>
<p> sentence1. </p>
I have written a script with BeautifulSoup that appends a STRING to the end of each paragraph, so it looks like:
<p> sentence1. sentence2. sentence3. STRING</p>
<p> sentence1. STRING </p>
For a <p> that contains only one sentence, that is all I want to do. But if a <p> contains more than one sentence, I want to add the STRING plus the sentence number to the end of each sentence. For example, the first paragraph above would become:
<p> sentence1. STRING1 sentence2. STRING2 sentence3. STRING3 </p>
Here is my script, which works for a single sentence using the .append method, but I couldn't get it to work for multiple sentences. Any help would be appreciated!
import re
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 (bs4)

soup = BeautifulSoup(xmlfile)
p = soup.findAll("p")
for i in p:
    # Split the paragraph text where a period is followed by a space and a word character
    dotsplit = re.compile(r'\. \w')
    sentences = dotsplit.split(i.text)
    if len(sentences) == 1:
        appendix = "STRING"
        i.append(appendix)
        print(i)
    if len(sentences) > 1:
        for x in sentences:
            sentencenumber = ???????
            # Should equal (index of sentences)+1, meaning sentences[0] = 1
            appendix = sentencenumber + "STRING"
            i.append(appendix)
        print(i)
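For reference, the numbering itself could be sketched with `enumerate`, which yields the index alongside each sentence. This is only a minimal sketch working on the plain paragraph text: the helper name `number_sentences`, the `SENTENCE_END` pattern, and the default `marker` are assumptions made for illustration, and the sketch rebuilds the text rather than calling `.append`, since `.append` always adds to the end of the tag.

```python
import re

# Hypothetical pattern: a period followed by whitespace or the end of the text.
SENTENCE_END = re.compile(r'\.(?=\s|$)')


def number_sentences(text, marker="STRING"):
    """Return text with marker (plus a 1-based number) after each sentence.

    Hypothetical helper: a paragraph with a single sentence gets the bare
    marker, a paragraph with several sentences gets marker1, marker2, ...
    """
    # Split on sentence-ending periods and drop empty leftovers.
    parts = [s.strip() for s in SENTENCE_END.split(text)]
    sentences = [s for s in parts if s]
    if len(sentences) == 1:
        return sentences[0] + ". " + marker
    # enumerate(..., start=1) gives (index of sentence) + 1 directly.
    return " ".join(
        "%s. %s%d" % (s, marker, n) for n, s in enumerate(sentences, start=1)
    )


print(number_sentences(" sentence1. sentence2. sentence3."))
# sentence1. STRING1 sentence2. STRING2 sentence3. STRING3
print(number_sentences(" sentence1. "))
# sentence1. STRING
```

With BeautifulSoup 4, the result could then be written back to each paragraph, for example by assigning it to the tag's `.string`, assuming the paragraph contains only text and no nested tags. Note that this simple split treats every period followed by whitespace as a sentence end, so abbreviations such as "e.g. " would be counted as sentences.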