1

I want a regex-based function that replaces each of certain punctuation characters (namely ., ,, (, ), ;, :) with a transformed version of that character. Specifically, it should put a space on either side of the character.

For example, it would map the string "Hello, this is a test string." to "Hello , this is a test string . " This is what I have right now:

import re

def add_spaces_to_punctuation(input_text):
    text = re.sub('[.]', ' . ', input_text)
    text = re.sub('[,]', ' , ', text)
    text = re.sub('[:]', ' : ', text)
    text = re.sub('[;]', ' ; ', text)
    text = re.sub('[(]', ' ( ', text)
    text = re.sub('[)]', ' ) ', text)
    return text

This works as intended, but it is pretty unwieldy and hard to read. If I could map each punctuation character to a function of that character in a single regex call, it would improve the code significantly. Is there a way to do this with regex? Sorry if it's something obvious; I'm pretty new to regex and don't know what this kind of thing would be called.

Christian Doucette

2 Answers

2

You could try the following:

import re

# The parentheses form capture group 1, which \1 refers to in the replacement.
recmp = re.compile(r'([.,:;()])')

def add_spaces_to_punctuation(input_text):
    text = recmp.sub(r' \1 ', input_text)

    return text

Also, with performance in mind and according to this answer, precompiling the pattern with re.compile should be faster if you need to run the substitution often.
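
As a variation on the same idea, here is a minimal sketch that assumes you would rather not use a capture group at all. It relies on \g<0>, which in a Python replacement string stands for the entire match. The name PUNCT is only illustrative.

```python
import re

# Illustrative name: one precompiled character class for all six characters.
PUNCT = re.compile(r'[.,:;()]')

def add_spaces_to_punctuation(input_text):
    # \g<0> refers to the whole match, so no capture group is needed.
    return PUNCT.sub(r' \g<0> ', input_text)

print(add_spaces_to_punctuation("Hello, this is a test string."))
```

Note that a punctuation mark already followed by a space ends up followed by two spaces, since the original space is kept.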

Rfroes87
1

Try this:

text = re.sub(r"(\.|\,|\(|\)|\;|\:)", lambda x: f' {x.group(1)} ', text)

It uses a capture group to capture whichever character was matched, and then maps it to the same character with a space on either side using a lambda expression (see https://www.w3schools.com/python/python_lambda.asp for more info).

The part of the regex in parentheses (in this case, the whole pattern) is what gets captured, and x.group(1) gives you the captured character.
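
To make the role of the capture group more concrete, here is a small sketch that swaps the alternation for an equivalent character class and uses a named function instead of the lambda. The names pattern and pad are only illustrative. It also shows that, for this simple case, a \1 backreference in the replacement string should give the same result as the callable.

```python
import re

# The parentheses create capture group 1 around the character class.
pattern = r"([.,:;()])"

def pad(match):
    # match.group(1) is the text captured by the first group.
    return f" {match.group(1)} "

text = "Hello, this is a test string."

with_function = re.sub(pattern, pad, text)     # replacement given as a callable
with_backref = re.sub(pattern, r" \1 ", text)  # replacement given as a backreference

print(with_function == with_backref)
```

The callable form is mainly worth it when the replacement needs real logic, such as a different treatment for each character.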

costaparas