
I need help finding a substring with a regex. Here is an example:

Given the following string:

test_str = "start: 1111 kill 22:22 start: 3333 end"

I would like to extract the substring from start to end that does not contain kill:

wanted_result = ['start: 3333 end']

Note: I need all matches of the form start ... end that do not have kill between them.

Several attempts failed. The latest one is:

import re
pattern = re.compile(r'start:(.+?)(([^kill])end)', flags=re.DOTALL)
results = pattern.findall(test_str)

which gives a different result:

result = [(' 1111 kill 22:22 start: 3333', ' end', ' ')]
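To see why this attempt goes wrong, it helps to look at each group on its own. A minimal sketch: the pattern is the one from the attempt above, and only the group inspection is new.

```python
import re

test_str = "start: 1111 kill 22:22 start: 3333 end"

# The failed attempt: three capturing groups.
pattern = re.compile(r'start:(.+?)(([^kill])end)', flags=re.DOTALL)

m = pattern.search(test_str)
# (.+?) is lazy, but nothing stops it from running over "kill",
# so it stretches from the first "start:" up to the only "end".
print(repr(m.group(1)))  # ' 1111 kill 22:22 start: 3333'
# ([^kill]) consumes exactly ONE character that is not k, i or l.
print(repr(m.group(3)))  # ' '
# With several groups, findall returns one tuple of groups per match.
print(pattern.findall(test_str))
```

The lazy quantifier only makes (.+?) as short as possible for a match starting at the first start:. It does not make the engine prefer the later start:, and [^kill] never looks at the word kill as a whole.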
TigerhawkT3
Despair

2 Answers


You need a regex based on a negative lookahead:

pattern = re.compile(r'start:(?:(?!kill).)*?end',flags = re.DOTALL)

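(?:(?!kill).)*? performs a check before matching each character: the character about to be matched can be anything, as long as it is not the start of the substring kill. So if kill appears anywhere between a start: and the next end, that match attempt fails and the search moves on to the next start:. re.DOTALL lets . match newlines as well.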
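The tempered token on its own, (?:(?!kill).)*, should match a piece of text exactly when that text does not contain kill. A small sketch that checks this against Python's in operator; the helper name has_no_kill and the sample strings are hypothetical, made up for illustration:

```python
import re

# Tempered token: before each character, (?!kill) checks that "kill" does not start here.
TEMPERED = re.compile(r'(?:(?!kill).)*', flags=re.DOTALL)

def has_no_kill(text):
    """Hypothetical helper: True if the tempered token can cover all of `text`."""
    return TEMPERED.fullmatch(text) is not None

samples = [" 3333 ", " 1111 kill 22:22 ", "kil l", "skill", "", "line1\nline2"]
for s in samples:
    # The regex check should agree with a plain substring test.
    print(repr(s), has_no_kill(s), 'kill' not in s)
```

Note that "kil l" passes: the lookahead only rejects the exact sequence kill, not its letters in general.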

Example:

>>> import re
>>> test_str = "start: 1111 kill 22:22 start: 3333 end"
>>> pattern = re.compile(r'start:(?:(?!kill).)*?end',flags = re.DOTALL)
>>> pattern.findall(test_str)
['start: 3333 end']
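If only the text between the markers is wanted, a capturing group can be wrapped around the tempered token, and findall then returns just that group. A sketch on a made-up multi-line input (the sample text is an assumption, not from the question):

```python
import re

# Same tempered token, with a capturing group around the part between the markers.
pattern = re.compile(r'start:((?:(?!kill).)*?)end', flags=re.DOTALL)

# Three start ... end pairs; the middle one contains "kill" and must be skipped.
text = "start: a end start: b kill c end start: d\ne end"

# With exactly one group, findall returns only the captured text of each match.
print(pattern.findall(text))
```

The middle pair is rejected because (?!kill) fails before the k, and re.DOTALL lets the last match span the newline.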
Avinash Raj

As a hint, note that a negated character class excludes the individual characters listed inside it, not the word they spell: [^kill] matches any single character except k, i or l. To exclude a whole word, you need a negative lookahead.

So instead of [^kill] you need (?!kill).
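A quick way to see the difference on a made-up string: [^kill] tests one character at a time against the set {k, i, l} (the repeated l changes nothing), while (?!kill) tests the whole word at a position without consuming anything.

```python
import re

s = "skill"

# [^kill] is a class of single characters: anything except 'k', 'i' or 'l'.
print(re.findall(r'[^kill]', s))    # ['s']
# It is identical to [^kil]; repeating 'l' inside a class has no effect.
print(re.findall(r'[^kil]', s))     # ['s']

# (?!kill). matches one character only where the word "kill" does not begin.
print(re.findall(r'(?!kill).', s))  # ['s', 'i', 'l', 'l']
```

Only the k is rejected by the lookahead, because kill begins there; the following i, l, l are allowed since kill does not start at those positions.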

See also the question "Regular expression to match a line that doesn't contain a word".

Community
Mazdak