Spacing and pattern replacement

Question

This is a two-part question:

Part 1

Reduce runs of multiple white spaces and paragraph breaks to just one.

Current code:

import re
# Read inputfile
with open('input.txt', 'r') as file:
  inputfile = file.read()

# Replace extra whitespace with a single space.
#outputfile = re.sub(r'\s+', ' ', inputfile).strip()
outputfile = ' '.join(inputfile.split())

# Write outputfile
with open('output.txt', 'w') as file:
  file.write(outputfile)

Part 2:

Once the extra spaces are removed, I search for and replace pattern mistakes.

For example, ' [ ' should become ' [':

Pattern1 = re.sub(' [ ', ' [', inputfile)

which throws an error:

raise error, v # invalid expression error: unexpected end of regular expression

However, this works (for example, to join the words before and after a hyphen):

Pattern1 = re.sub(' - ', '-', inputfile)

I have many such punctuation cases to handle once the spacing issue is solved.

I don't want a pattern to act on the output of the previous patterns.

Is there a better approach to trim the spaces around punctuation to just the right amount?

Why are you using regex find and replace when what you want to do is a simple string replace? The character `[` has a special meaning in regex (`-` does too, but only inside a character class). — Nir Alfasi, Oct 30 '17 at 00:35
Yes, you are right. I could have used str.replace. But speed-wise, which is faster? — Programmer_nltk, Oct 30 '17 at 01:14
Usually regex is an order of magnitude slower (in most programming languages). See: https://stackoverflow.com/questions/5668947/use-pythons-string-replace-vs-re-sub — Nir Alfasi, Oct 30 '17 at 01:16

score 1 · Accepted Answer · answered Oct 30 '17 at 00:39

For the first part, you can split it by newline blocks, compress each line, and then join it back on newlines, like so:

import re
text = "\n".join(re.sub(r"\s+", " ", line) for line in re.split("\n+", text))
print(text)
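
As a minimal sketch of the same idea on concrete input: the function name `collapse_whitespace`, the sample string, and the extra `strip()` calls are assumptions added for illustration, not part of the original snippet.

```python
import re

def collapse_whitespace(text):
    """Collapse runs of newlines to one newline and other whitespace runs to one space."""
    # Split on runs of newlines so paragraph breaks are kept as single newlines.
    lines = re.split(r"\n+", text.strip())
    # Within each line, squeeze any whitespace run (spaces, tabs) to one space.
    return "\n".join(re.sub(r"\s+", " ", line).strip() for line in lines)

sample = "First   paragraph,\twith  gaps.\n\n\nSecond    paragraph.\n"
print(collapse_whitespace(sample))
```

Note that a line containing only spaces between two newlines would still leave an empty line behind with this approach.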

For the second part, you need to escape [ since it's a regex metacharacter (used to define character classes), like so:

import re
# Raw strings keep the backslash intact for the regex engine.
text = re.sub(r"\[ ", "[", text)
text = re.sub(r" ]", "]", text)
print(text)

Note that you don't need to escape the ] because there is no unescaped [ before it for it to close, so it isn't special in this context.
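
If you would rather not work out by hand which characters need escaping, `re.escape` can do it for a literal string. The following is a small sketch, with a made-up sample string, that first reproduces the compile error from the question and then applies the escaped patterns:

```python
import re

text = "see [ note 1 ] and [ note 2 ]"  # hypothetical sample input

# An unescaped "[" opens a character class that is never closed,
# so the pattern fails to compile.
try:
    re.compile(" [ ")
except re.error as exc:
    print("invalid pattern:", exc)

# re.escape() backslash-escapes the regex metacharacters in a literal string.
open_bracket = re.escape("[ ")
close_bracket = re.escape(" ]")

text = re.sub(open_bracket, "[", text)
text = re.sub(close_bracket, "]", text)
print(text)
```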


Alternatively, for the second part, use text = text.replace("[ ", "[").replace(" ]", "]"), because you don't even need regular expressions.
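
For the concern about later patterns acting on the output of earlier ones, one possible approach (not something the original answer proposes) is to combine all literal rules into a single alternation and substitute in one pass. The `RULES` table, the names `PATTERN` and `fix_punctuation`, and the sample sentence below are hypothetical and only illustrate the idea:

```python
import re

# Hypothetical rule table: literal text to find -> replacement.
RULES = {
    "[ ": "[",
    " ]": "]",
    " - ": "-",
    " ,": ",",
}

# Try longer keys first so a longer rule wins over a shorter overlapping one.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(RULES, key=len, reverse=True))
)

def fix_punctuation(text):
    """Apply every rule in one left-to-right pass; replaced text is not rescanned."""
    return PATTERN.sub(lambda match: RULES[match.group(0)], text)

print(fix_punctuation("well - known [ sic ] , indeed"))
```

Because `re.sub` scans the input once and never revisits text it has already replaced, no rule sees another rule's output. The trade-off is that overlapping matches are resolved by position and alternation order rather than by a sequence you control.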
