3

I'm quite new to regular expressions and have been searching for a way to do this without success. Given a string, I want to remove any pattern that starts with "abc", ends with "abc", and does not contain "abc" in the middle. If I do

'abc.*(abc).*abc'

it will match any pattern that starts with "abc", ends with "abc", and does contain "abc" in the middle. How do I do the opposite? I tried

'abc.*^(abc).*abc'

but it doesn't work.

martineau
user108372

2 Answers

8

Your syntax for trying to negate part of your pattern is incorrect.

Also, ^ outside of a character class asserts the position at the beginning of the string; it does not negate anything. You need to use a negative lookahead assertion and be sure to anchor the entire pattern.

^abc(?:(?!abc).)*abc$
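As a minimal sketch of how this pattern behaves in Python's `re` module (the helper name `is_clean_abc_block` and the sample strings other than `abc222abc` are made up for illustration):

```python
import re

# Pattern from the answer: "abc", then any run of characters in which
# no position starts another "abc", then the closing "abc".
PATTERN = re.compile(r'^abc(?:(?!abc).)*abc$')

def is_clean_abc_block(text):
    """Return True if text starts and ends with "abc" with no "abc" in between."""
    return PATTERN.match(text) is not None

print(is_clean_abc_block('abc222abc'))    # True
print(is_clean_abc_block('abc2abc2abc'))  # False: "abc" appears in the middle
print(is_clean_abc_block('abcabc'))       # True: the middle may be empty
```

The `(?:(?!abc).)*` group consumes one character at a time, but only after checking that "abc" does not start at that position, which is what keeps "abc" out of the middle.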


hwnd
  • Thank you very much, and sorry I can't upvote the answer. May I ask a follow-up? This works for re.match('^abc(?:(?!abc).)*abc$','abc222abc'), but if I want to do this multiline, it does not seem to work: re.match('^abc(?:(?!abc).)*abc$','111/nabc222abc/nxyz/n', flags = re.MULTILINE). Do you have any suggestions? – user108372 Feb 22 '15 at 00:28
  • Hey, sorry, I'm new to the site, and I thought I could accept both answers. – user108372 Feb 22 '15 at 00:44
  • Glad I could help. And use `re.search()` instead. – hwnd Feb 22 '15 at 00:46
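A short sketch of the difference between the two calls, assuming the sample string was meant to contain real newline characters (`\n`) rather than the literal `/n` typed in the comment:

```python
import re

# re.MULTILINE lets ^ and $ match at the start and end of every line.
pattern = re.compile(r'^abc(?:(?!abc).)*abc$', flags=re.MULTILINE)

# Assumption: real newlines, not the "/n" written in the comment.
text = '111\nabc222abc\nxyz\n'

# re.match only tries position 0 of the string, even with MULTILINE,
# and the string starts with "111", so nothing is found.
print(pattern.match(text))   # None

# re.search scans forward and finds the match on the second line.
found = pattern.search(text)
print(found.group(0))        # abc222abc
```

`re.MULTILINE` changes where `^` and `$` may match, but it does not change where `re.match` starts looking, which is why `re.search` is needed here.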
8

You can try the following pattern:

^abc((?!abc).)*abc$

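(?!abc) is a negative lookahead: it asserts that "abc" cannot be matched starting at the current position. Because it is checked before every character of the middle part, the text between the two "abc" delimiters can never contain "abc".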
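The question asks about removing such blocks from a longer string. One possible sketch, assuming the `^` and `$` anchors are dropped so the pattern can match anywhere (this unanchored variant and the helper name `remove_abc_blocks` are not from either answer):

```python
import re

# Assumption: same lookahead idea, but without ^ and $ so that a block
# can be found in the middle of a longer string.
BLOCK = re.compile(r'abc(?:(?!abc).)*abc')

def remove_abc_blocks(text):
    """Delete every "abc...abc" block whose middle contains no "abc"."""
    return BLOCK.sub('', text)

print(remove_abc_blocks('xxabc12abcyy'))   # xxyy
print(remove_abc_blocks('no match here'))  # no match here
```

Note that `.` does not match a newline by default, so a block split across lines is left alone unless `re.DOTALL` is added.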


Mazdak