Regex: remove 's' from the end of every word, except words starting with a capital letter that are not at the beginning of a sentence?

Question

The condition is: a trailing 's' is removed from every word, except for capitalized words that are not at the beginning of a sentence.

The input string is:

Ses Holmes os. Sos

The output should be:

Se Holmes o. So

I started with this condition

([A-Z][a-z]+)

but got stuck: this pattern cannot be put inside a negative lookbehind.

You don't really need regex here (although it *can* be achieved using it). — Maroun, Feb 04 '16 at 07:28
I think he wants to remove a terminal `s` from all words except uppercase words (names?) that occur after the first word in a sentence. Which leads to the questions "What if the first word in a sentence *is* a name?" and "How can you tell where a sentence begins?" (looking for punctuation is not going to work, Dr. Watson...) — Tim Pietzcker, Feb 04 '16 at 07:34
This is impossible without a very clear definition of a) a word and b) the start and end of a sentence. — timgeb, Feb 04 '16 at 08:00

score 0 · Answer 1 · edited May 23 '17 at 11:45

The regular expression already looks good, although it doesn’t catch words like café.

To do the replacement, call `re.sub` with a function as the replacement argument, as explained in "Python replace string pattern with output of function". In that function you can implement the exceptions to the rule, expressing them as Python code rather than as a regular expression.
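A minimal sketch of this approach, not a definitive solution. The match object passed to the function exposes the full input as `match.string` and the match position as `match.start()`, so the function can inspect the text before the word. The names `WORD`, `SENTENCE_END`, `strip_trailing_s` and `remove_s` are made up for illustration. The rule that a sentence starts at the beginning of the string or after `.`, `!` or `?` is an assumption the question does not state, and it misfires on abbreviations such as "Dr.".

```python
import re

# Hypothetical names; the sentence-start rule is an assumption, not from the question.
WORD = re.compile(r"[^\W\d_]+")                 # runs of letters, so "café" matches too
SENTENCE_END = re.compile(r"(?:^|[.!?])\s*$")   # text before a sentence-initial word


def strip_trailing_s(match):
    word = match.group(0)
    if not word.endswith("s"):
        return word
    # match.string is the whole input, so the context before the word is available
    before = match.string[:match.start()]
    at_sentence_start = SENTENCE_END.search(before) is not None
    if word[0].isupper() and not at_sentence_start:
        return word  # capitalized word in mid-sentence: leave it untouched
    return word[:-1]


def remove_s(text):
    return WORD.sub(strip_trailing_s, text)


print(remove_s("Ses Holmes os. Sos"))  # Se Holmes o. So
```

Keeping the exceptions in the function means the pattern itself stays a plain word matcher, and the sentence-start rule can be changed without touching the regular expression.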

answered Feb 04 '16 at 07:37

Roland Illig

That won't work - the exceptions are based on the context of the match, and that's not present when you've passed the match to the function. – Tim Pietzcker Feb 04 '16 at 08:36
