I'm working on an academic research project that requires extracting titles from a Table of Contents. I'm making a Python program to clean up text that looks like this:
BONDS OF LATE:
An act providing the officers of the State of Illinois from making payments on certain bonds ............ 79
An act to provide for publishing a now edition of Dresses Reports ..................................... 78BRIDGES:
An act to provide for the better protection of the public bridges in this State ........................... 74
to look like this:
An act providing the officers of the State of Illinois from making payments on certain bonds .
An act to provide for publishing a now edition of Dresses Reports .
An act to provide for the better protection of the public bridges in this State .
My strategy is to somehow iterate through a text file and delete characters after the first '.' and before the next 'An act'. I thought about trying a nested 'for' loop like this:
for line in file:
    for character in line:
But iterating character by character makes it hard to stop at a multi-character string (i.e. 'An act'). I'm a beginner in Python (and in coding) and would greatly appreciate any help. Is there a regular expression that would delete everything before 'An act' and everything after the first period of each entry? Thank you!
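For reference, here is a rough sketch of what a regular-expression version of this strategy might look like. It rests on two assumptions that may not hold for the whole document: every title starts with 'An act', and the dot leaders are always a run of two or more periods. The names ENTRY and extract_titles are made up for illustration.

```python
import re

# Assumption: each title begins with "An act" and is followed by dot leaders
# (two or more periods). The page number and any heading fused to it, such as
# "78BRIDGES:", fall between matches and are therefore skipped.
ENTRY = re.compile(r"(An act.*?)\s*\.{2,}")


def extract_titles(text):
    """Return each title followed by ' .', as in the desired output."""
    return [match.group(1) + " ." for match in ENTRY.finditer(text)]


sample = (
    "BONDS OF LATE:\n"
    "An act providing the officers of the State of Illinois from making "
    "payments on certain bonds ............ 79\n"
    "An act to provide for publishing a now edition of Dresses Reports "
    "..................................... 78BRIDGES:\n"
    "An act to provide for the better protection of the public bridges "
    "in this State ........................... 74\n"
)

for title in extract_titles(sample):
    print(title)
```

The same function could be applied to the contents of a whole file read with open(...).read(). Because '.' in the pattern does not match a newline, a title that wraps onto a second line would be missed, so that case would need separate handling.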