Get string between two sub strings with limitation

Question

I need help finding a substring using regex, starting with an example:

Given the following string:

test_str = "start: 1111 kill 22:22 start: 3333 end"

I would like to extract the string between start and end that doesn't involve kill:

wanted_result = ['start: 3333 end']

Note: I need all matches of the form start ... end that have no kill between them.

Several tries failed, the latest one:

pattern = re.compile(r'start:(.+?)(([^kill])end)',flags = re.DOTALL)
results = pattern.findall(test_str)

which results in a different result:

result = (' 1111 kill 22:22 start: 3333', ' end', ' end')

Avinash Raj · Accepted Answer · 2015-08-03T08:15:19.147

You need a regex based on a negative lookahead.

pattern = re.compile(r'start:(?:(?!kill).)*?end',flags = re.DOTALL)

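(?:(?!kill).)*? performs a check before matching each character. The negative lookahead (?!kill) asserts that the substring kill does not begin at the current position, and only then does . consume one character (with re.DOTALL, that includes newlines). As a result, the match cannot run across a kill on its way to end.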

Example:

>>> import re
>>> test_str = "start: 1111 kill 22:22 start: 3333 end"
>>> pattern = re.compile(r'start:(?:(?!kill).)*?end',flags = re.DOTALL)
>>> pattern.findall(test_str)
['start: 3333 end']
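
The same pattern can be spelled out piece by piece with re.VERBOSE, which may make the role of each part easier to see. This is only a sketch: the names CLEAN_SPAN and find_clean_spans and the longer sample string are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical names for illustration; the regex is the one from the answer,
# written in verbose form so each piece can carry a comment.
CLEAN_SPAN = re.compile(
    r"""
    start:          # opening delimiter
    (?:             # one step of the body:
        (?!kill)    #   refuse to step if the text "kill" begins here
        .           #   otherwise consume any single character
    )*?             # repeat lazily, as few times as possible
    end             # closing delimiter
    """,
    flags=re.DOTALL | re.VERBOSE,
)


def find_clean_spans(text):
    """Return every 'start: ... end' span that has no 'kill' inside it."""
    return CLEAN_SPAN.findall(text)


# Assumed sample: the question's string with one more clean span appended.
sample = "start: 1111 kill 22:22 start: 3333 end start: 4444 end"
spans = find_clean_spans(sample)
print(spans)  # ['start: 3333 end', 'start: 4444 end']
```

The first start: is abandoned because the body cannot step over kill, so the scan resumes and picks up the two later spans.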

score 1 · Answer 2 · edited May 23 '17 at 12:14

As a hint, note that a negated character class excludes the individual characters listed in the class, not the word they spell. To exclude a word you need a negative lookahead.

So instead of [^kill] you need (?!kill).
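
A small sketch of the difference, using made-up test words that do not appear in the question:

```python
import re

# [^kill] is a set of single characters: it matches ONE character
# that is not 'k', 'i' or 'l'. The repeated 'l' changes nothing.
char_class_hits = re.findall(r"[^kill]", "like")
print(char_class_hits)  # ['e']

# (?!kill) is a zero-width assertion: it consumes nothing and only
# fails when the four characters "kill" begin at the current position.
word = re.compile(r"\b(?!kill)\w+")
lookahead_hits = word.findall("kilt killer skill")
print(lookahead_hits)  # ['kilt', 'skill']
```

Here killer is rejected because kill begins right at the word boundary, while skill is accepted because the lookahead is only tested at the s. That is why the accepted pattern repeats the lookahead before every character it consumes.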

See also the question "Regular expression to match a line that doesn't contain a word".

answered Aug 03 '15 at 07:58

Mazdak

Thank you for the explanation! – Despair Aug 03 '15 at 08:07
