This is a two-part question.
Part 1:
I want to collapse runs of multiple whitespace characters and paragraph breaks down to just one.
Current code:
import re
# Read inputfile
with open('input.txt', 'r') as file:
    inputfile = file.read()
# Replace extra spaces with a single space.
#outputfile = re.sub('\s+', ' ', inputfile).strip()
outputfile = ' '.join(inputfile.split(None))
# Write outputfile
with open('output.txt', 'w') as file:
    file.write(outputfile)
Part 2:
Once the extra spaces are removed, I search for spacing mistakes around punctuation and replace them.
For example, ' [ ' should become ' [':
Pattern1 = re.sub(' [ ', ' [', inputfile)
which throws an error:
raise error, v # invalid expression error: unexpected end of regular expression
However, this works (for example, to join the words before and after a hyphen):
Pattern1 = re.sub(' - ', '-', inputfile)
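My guess at the difference between the two cases: '[' is a regex metacharacter that opens a character class, so an unescaped ' [ ' is an unterminated class, while '-' is an ordinary character outside of a class. Below is a minimal sketch of three ways to match the bracket literally. The sample string and the variable names are made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "see [ 1 ] and well - known"

# Escape the bracket by hand so it is matched literally.
fixed = re.sub(r' \[ ', ' [', text)

# Or let re.escape build the escaped pattern from the literal string.
same = re.sub(re.escape(' [ '), ' [', text)

# For a fixed string, str.replace avoids regex syntax altogether.
plain = text.replace(' [ ', ' [')

print(fixed)
```

All three calls should give the same result here, since the search text is a fixed string rather than a real pattern.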
I have many such punctuation cases to handle once the spacing issue is solved.
I don't want each pattern to operate on the output of the previous pattern's replacement.
Is there a better approach to trimming the spaces around punctuation correctly?
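One direction I am considering is a single pass: combine all the rules into one compiled pattern and pick the replacement in a callback, since re.sub scans the input once and does not rescan the text it has already substituted. The sketch below is only an assumption about which rules are needed (no space after an opening bracket, no space before closing punctuation, no spaces around a spaced hyphen). The names PATTERN, _fix and fix_punctuation are hypothetical.

```python
import re

# Hypothetical rule set; adjust the character classes to the punctuation you need.
PATTERN = re.compile(r'''
    (?P<open>[\[(])\s+           # whitespace after an opening bracket
  | \s+(?P<close>[\]),.;:!?])    # whitespace before closing punctuation
  | \s+-\s+                      # hyphen with whitespace on both sides
''', re.VERBOSE)

def _fix(match):
    """Return the replacement for whichever alternative matched."""
    if match.group('open'):
        return match.group('open')
    if match.group('close'):
        return match.group('close')
    return '-'

def fix_punctuation(text):
    # One scan over the input; replaced text is not matched again.
    return PATTERN.sub(_fix, text)

print(fix_punctuation("see [ 1 ] , well - known"))
```

Because every rule lives in the same pattern, the order of the alternatives decides which rule wins at a given position, and no rule ever sees the output of another.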