I'd like to use regex to remove the apostrophes in common contractions. For example, I'd like to map
test1 test2 can't test3 test4 won't
to
test1 test2 cant test3 test4 wont
My current naive approach is to substitute each contraction I want to handle by hand:
    import re

    def remove_contraction_apostrophes(text):
        # '.' stands in for whichever apostrophe character the text uses
        text = re.sub('can.t', 'cant', text)
        text = re.sub('isn.t', 'isnt', text)
        text = re.sub('won.t', 'wont', text)
        text = re.sub('aren.t', 'arent', text)
        return text
(I'm using can.t because in the text I am parsing, it can use multiple characters for the apostrophe, like can't and can`t.)
This is pretty unwieldy as I want to add all the common contractions. Is there a better way of doing this with regex, where I could construct a regex of this type by inputting a list of contractions? Or am I better off just listing them all like this?
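To make the "regex from a list" idea concrete, here is a minimal sketch of what I have in mind. The names `CONTRACTIONS`, `APOSTROPHES` and `build_contraction_regex` are made up for illustration, and the set of apostrophe characters is an assumption about my input text. It joins the list into one alternation, replaces the literal apostrophe with a character class, and adds word boundaries so that a word like "canst" is not touched (which my `can.t` version would mangle).

```python
import re

# Hypothetical list of contractions, written with a plain ASCII apostrophe.
CONTRACTIONS = ["can't", "isn't", "won't", "aren't"]

# Characters treated as apostrophes (an assumption about the input text).
APOSTROPHES = "'’`"
APOSTROPHE_CLASS = "[" + re.escape(APOSTROPHES) + "]"
APOSTROPHE_RE = re.compile(APOSTROPHE_CLASS)


def build_contraction_regex(contractions):
    # Swap each literal apostrophe for the character class, escape the rest,
    # and combine everything into a single word-bounded alternation.
    alternatives = [
        APOSTROPHE_CLASS.join(re.escape(part) for part in word.split("'"))
        for word in contractions
    ]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


CONTRACTION_RE = build_contraction_regex(CONTRACTIONS)


def remove_contraction_apostrophes(text):
    # Drop the apostrophe from each matched contraction, keeping its letters
    # (and their original case) unchanged.
    return CONTRACTION_RE.sub(lambda m: APOSTROPHE_RE.sub("", m.group(0)), text)


print(remove_contraction_apostrophes("test1 test2 can't test3 test4 won't"))
# test1 test2 cant test3 test4 wont
```

With this, adding a contraction would only mean appending it to the list, but I don't know whether this is the idiomatic way to do it.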
It may also be possible to work with just the endings, like 'll, n't, etc., but I'm afraid of catching things other than contractions this way.
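For comparison, the endings-based variant could be sketched roughly as follows. The list of endings and the function name `strip_contraction_endings` are hypothetical, and I left out 's on purpose because it also marks possessives. The lookbehind requires a letter directly before the ending, so opening and closing quotation marks are left alone, but this is only a sketch and may well still catch things I haven't thought of.

```python
import re

# Characters treated as apostrophes (an assumption about the input text).
AP = "['’`]"

# Hypothetical endings: n't, plus 'll, 're, 've, 'd and 'm.
# 's is deliberately omitted because it is also the possessive marker.
ENDING_RE = re.compile(
    r"(?<=[A-Za-z])(?:n" + AP + r"t|" + AP + r"(?:ll|re|ve|d|m))\b",
    re.IGNORECASE,
)
APOSTROPHE_RE = re.compile(AP)


def strip_contraction_endings(text):
    # Remove the apostrophe only inside a recognised ending that directly
    # follows a letter and is itself followed by a word boundary.
    return ENDING_RE.sub(lambda m: APOSTROPHE_RE.sub("", m.group(0)), text)


print(strip_contraction_endings("I'll say we can't, but John's 'quoted' text stays"))
# Ill say we cant, but John's 'quoted' text stays
```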