1

I'm new to regex in Python. I searched the web for answers to my problem, but none of them work. I'm trying to replace 's with " is" only if it is preceded by a singular pronoun, so words like "he's", "it's", etc. are replaced by "he is", "it is".

What I tried was:

line1 = "It's done. But there's some more you have to do. Gary's dog is in the precinct. Get it home. It's too far. There's rain"

re.sub("(?<=[it|that|here|there|he|she])'s",' is',line1,re.IGNORECASE)

Answer I got:

"It is done. But there is some more you have to do. Gary's dog is in the precinct. Get it home. It's too far. There's rain"

It does what I want in the first two sentences but not in the later ones. Can anyone point out my mistake and a solution to it?

2 Answers

2

You have two problems. First, you are confounding a regex character class with an alternation. Your current lookbehind does not mean what you think:

(?<=[it|that|here|there|he|she])

This means that the preceding character must be one of the characters in the class, not that the preceding word is one of the listed words. The class is the same as this:

[aehirst|]
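To make the difference concrete, here is a minimal sketch comparing the character class with a real alternation. The variable names are only illustrative and are not part of the question's code.

```python
import re

# Inside [...], every character (including "|") is a literal member of the set.
char_class = re.compile(r"[it|that|here|there|he|she]")
# Outside brackets, "|" separates whole alternatives.
alternation = re.compile(r"it|that|here|there|he|she")

# The distinct characters of the class, sorted: this is the reduced form above.
distinct = "".join(sorted(set("it|that|here|there|he|she")))
print(distinct)  # aehirst|

# The class matches exactly one character, never a whole word.
print(char_class.fullmatch("it"))   # None
print(char_class.fullmatch("t"))    # a match object for "t"

# The alternation matches whole words, not single letters.
print(alternation.fullmatch("it"))  # a match object for "it"
print(alternation.fullmatch("t"))   # None
```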

But even fixing this won't work, because Python's re module does not support variable-width lookbehinds, and the alternatives here have different lengths. We can work around this by capturing the preceding word and then reusing it in the replacement:

re.sub("(it|that|here|there|he|she)'s", '\\1 is', line1, flags=re.IGNORECASE)

It is done. But there is some more you have to do. Gary's dog is in the precinct.
Get it home. It is too far. There is rain
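The two points above can be checked with a short sketch. It first shows that the alternation inside a lookbehind is rejected by `re`, then applies the capture-group workaround. The leading `\b` is an assumption added here, not part of the pattern above: without it, any word that merely ends in one of the listed words (for example "rabbit's") would be rewritten as well.

```python
import re

line1 = ("It's done. But there's some more you have to do. "
         "Gary's dog is in the precinct. Get it home. It's too far. There's rain")

# The alternatives have different lengths, so re refuses the lookbehind.
try:
    re.compile(r"(?<=it|that|here|there|he|she)'s")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print(lookbehind_error)

# Workaround: capture the word and put it back with \1.
# A raw string keeps \1 as a backreference instead of the character \x01.
# The \b is an addition for illustration (see the note above).
pronoun_s = re.compile(r"\b(it|that|here|there|he|she)'s", flags=re.IGNORECASE)
result = pronoun_s.sub(r"\1 is", line1)
print(result)

# Effect of the word boundary on a word that only ends in "it".
without_boundary = re.sub(r"(it|that|here|there|he|she)'s", r"\1 is",
                          "The rabbit's ear", flags=re.IGNORECASE)
with_boundary = pronoun_s.sub(r"\1 is", "The rabbit's ear")
print(without_boundary)  # The rabbit is ear
print(with_boundary)     # The rabbit's ear
```

Passing `flags=` by keyword matters: the fourth positional parameter of `re.sub` is `count`, not `flags`.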

Demo

Tim Biegeleisen
  • Doesn't run in python36 - `raise error("look-behind requires fixed-width pattern")` – jonathan Apr 02 '18 at 05:24
  • Thanks. Also explained here - [Python Regex Engine - “look-behind requires fixed-width pattern” Error](https://stackoverflow.com/questions/20089922/python-regex-engine-look-behind-requires-fixed-width-pattern-error) – jonathan Apr 02 '18 at 05:55
  • I ran with your suggestion, this is the result i got. I'm sure you ran before posting but why doesn't it work in my case “It’s done. But \x01 is some more you have to do. Gary’s dog is in the precinct. Get it home. It’s too far. T\x01 is rain” ​ – Shreshtha Kulkarni Apr 03 '18 at 13:18
  • @ShreshthaKulkarni I had some minor typos in my answer, no doubt due to not testing my code. Check the updated answer and demo where everything is working now. – Tim Biegeleisen Apr 03 '18 at 13:34
-1

I am not sure this will be very helpful, but it does the trick:

Get rid of the re.IGNORECASE option.

>>> re.sub("(?<=[it|that|here|there|he|she])'s",' is',line1)
"It is done. But there is some more you have to do. Gary's dog is in the precinct. Get it home. It is too far. There is rain"
Ubdus Samad
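A likely explanation for why dropping re.IGNORECASE changes the result: the signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a flag passed as the fourth positional argument is read as `count`. Since `re.IGNORECASE` has the integer value 2, only the first two matches are replaced. The sketch below reproduces both outputs using an explicit `count=` keyword. The "Matt's cat" input is a made-up example showing that the character class still matches on single letters, so this version can rewrite possessives too.

```python
import re

line1 = ("It's done. But there's some more you have to do. "
         "Gary's dog is in the precinct. Get it home. It's too far. There's rain")

# The pattern from the question, unchanged (character class, not alternation).
pattern = r"(?<=[it|that|here|there|he|she])'s"

# re.IGNORECASE is an integer flag with value 2.
flag_value = int(re.IGNORECASE)
print(flag_value)  # 2

# Equivalent to passing the flag in the count position: two replacements only.
limited = re.sub(pattern, " is", line1, count=flag_value)
print(limited)

# Without a count, every match is replaced.
unlimited = re.sub(pattern, " is", line1)
print(unlimited)

# Hypothetical input: "t" is in the class, so a possessive is rewritten too.
false_positive = re.sub(pattern, " is", "Matt's cat")
print(false_positive)  # Matt is cat
```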