I want to take the text inside the paragraph tags of an HTML document and convert it to sentence case, so that only the first letter of each sentence is capitalized. The input file has everything in all caps.
My attempt has two flaws: first, it removes the paragraph tags themselves, and second, it simply lower-cases everything in the match groups. I don't quite know how capitalize() works, but I assumed that it would leave the first letter of each sentence capitalized.
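For reference, a quick check of str.capitalize() on plain strings may explain the second flaw: it upper-cases only the first character of the whole string and lower-cases the rest, with no notion of sentences. The sample strings here are made up for illustration; the second one is shaped like what group(1) captures in the attempt, which starts with '>' rather than a letter.

```python
# Plain-string checks of str.capitalize(); both sample strings are invented.
sample = "FIRST SENTENCE. SECOND SENTENCE."
whole = sample.capitalize()

# Shaped like a captured group(1): the first character is '>', not a letter,
# so there is no leading letter for capitalize() to upper-case.
captured = ">FIRST SENTENCE. SECOND SENTENCE."
from_group = captured.capitalize()

print(whole)       # First sentence. second sentence.
print(from_group)  # >first sentence. second sentence.
```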
There may be a much easier way to do this than regex, too. Here's what I have:
import re

def replace(match):
    return match.group(1).capitalize()

with open('explanation.html', 'rbU') as inf:
    with open('out.html', 'wb') as outf:
        cont = inf.read()
        par = re.compile(r'(?s)\<p(.*?)\<\/p')
        s = re.sub(par, replace, cont)
        outf.write(s)
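One direction that might address both flaws, as a minimal sketch rather than a definitive fix. re.sub replaces the entire match, so the tags have to be captured and written back. The sketch assumes Python 3, text rather than bytes, and that a sentence ends in '.', '!' or '?' followed by whitespace. The names PARAGRAPH, SENTENCE_START, sentence_case, fix_paragraph and fix_html are hypothetical, chosen only for illustration.

```python
import re

# One <p ...>...</p> element. The three groups keep the opening tag, the body
# and the closing tag apart so the tags can be written back unchanged.
# \b stops the pattern from matching other tags that start with "p", such as <pre>.
PARAGRAPH = re.compile(r'(?s)(<p\b[^>]*>)(.*?)(</p>)', re.IGNORECASE)

# A sentence start (assumption): the first letter of the text, or the first
# letter after '.', '!' or '?' followed by whitespace.
SENTENCE_START = re.compile(r'(^\s*|[.!?]\s+)([a-z])')

def sentence_case(text):
    """Lower-case the text, then upper-case the first letter of each sentence."""
    return SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(),
        text.lower(),
    )

def fix_paragraph(match):
    """Rebuild one paragraph: original tags around the re-cased body."""
    open_tag, body, close_tag = match.groups()
    return open_tag + sentence_case(body) + close_tag

def fix_html(html):
    """Apply sentence case inside every <p> element and leave the rest alone."""
    return PARAGRAPH.sub(fix_paragraph, html)

print(fix_html('<P>HELLO WORLD. THIS IS A TEST.</P><H1>TITLE</H1>'))
# <P>Hello world. This is a test.</P><H1>TITLE</H1>
```

To use it on the files from the attempt, the contents of explanation.html would be read as text, passed through fix_html, and written to out.html. Known limits of this sketch: proper nouns and the pronoun "I" lose their capitals, abbreviations such as "e.g." are treated as sentence ends, and any inline tags or attribute values inside a paragraph are lower-cased along with the text. An HTML parser would likely be more robust than a regex if the markup is not this simple.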