
With the regex pattern (Python):

(?<=start<).*?(?=>end)

I'd like to match only the innermost text of the following string:

start< obj1 obj2 start< obj3 >end

that is:

obj3

I use Pythex as an online regex tester for my code. Pythex returns

obj1 obj2 start< obj3

instead of

obj3

Is there a way to force the match onto the innermost text? Perhaps with some extra Python code, if it is impossible with a regex alone?
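A minimal sketch reproducing the Pythex result with Python's `re` module. It illustrates standard leftmost-match behaviour: the engine reports the match that starts earliest, and the lazy `.*?` only keeps the match from running past the first `>end`; it never moves the start forward to a later `start<`.

```python
import re

s = 'start< obj1 obj2 start< obj3 >end'

# The lookbehind is already satisfied right after the FIRST "start<",
# so the match starts there; the lazy .*? then stops at the first ">end",
# which still leaves the second "start<" inside the match.
m = re.search(r'(?<=start<).*?(?=>end)', s)
print(repr(m.group()))  # ' obj1 obj2 start< obj3 '
```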

Thanks

UPDATE 01: Sorry, I've tested all of your solutions with different kinds of strings, but I can't get what I want, which is: match everything between "start<" and ">end", excluding any text that contains "start<" together with the characters before that "start<".

For example if I have the string

start< obj1 >end start< obj2 >end start< obj3 start< obj4 >end

where "obj4" is, for example, equal to "<", none of the proposed methods/patterns works, because none of them can match the "<" near the end of the string. For the string above I'd like findall to return the following matches:

  1. obj1
  2. obj2
  3. obj3
  4. <

regardless of what "obj4" is (so the method I'm looking for should work in general, even if an obj# is equal to "<").

Can you suggest some other solutions?
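One possible approach, sketched here as an assumption rather than taken from any of the answers below, is a "tempered greedy token": the captured content is built one character at a time, and each character is accepted only if `start<` does not begin at that position. A lone `<` is therefore still allowed inside the content. The function name `innermost` is hypothetical. Under this definition `obj3` in the UPDATE example is not returned, since it is followed by another `start<` rather than by `>end`.

```python
import re

# Hypothetical helper: capture text between "start<" and ">end" that does not
# itself contain "start<". (?:(?!start<).) consumes one character only when
# "start<" does not start there; *? stops at the first following ">end".
INNERMOST = re.compile(r'start<((?:(?!start<).)*?)>end')

def innermost(s):
    """Return the stripped innermost segments of s, left to right."""
    return [seg.strip() for seg in INNERMOST.findall(s)]

print(innermost('start< obj1 obj2 start< obj3 >end'))          # ['obj3']
print(innermost('start< obj1 obj2 start< start< obj3 >end'))   # ['obj3']
print(innermost('start< obj1 >end start< obj2 >end start< obj3 start< < >end'))
# ['obj1', 'obj2', '<']
```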

Thank you

Matteo VR
  • could you simply add a space character before to make it `(?<= start<).*?(?=end)`? – SRT HellKitty Oct 16 '19 at 15:07
  • Hi, thank you for the reply. Good answer for the original string, but it doesn't work if we have `start< obj1 obj2 start< start< obj3 >end` (three `start<` instead of two). – Matteo VR Oct 17 '19 at 07:39

3 Answers

Use the following approach with an improved regex pattern:

import re

s = 'start< obj1 obj2 start< obj3 >end'
m = re.search(r'(?<=start<)[^<]*?(?=>?end)', s)
res = m.group().strip() if m else m
print(res)    # obj3
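The same pattern can be passed to `re.findall` to collect every segment instead of only the first one. This small sketch, using test strings taken from the question and its comments, also shows the limitation raised in the UPDATE: because the content is `[^<]*?`, a segment whose text is `<` can never be matched.

```python
import re

pattern = r'(?<=start<)[^<]*?(?=>?end)'

s1 = 'start< obj1 obj2 start< obj3 >end start< obj4 >end'
s2 = 'start< obj1 >end start< obj2 >end start< obj3 start< < >end'

# findall returns every match; strip() removes the surrounding spaces.
print([x.strip() for x in re.findall(pattern, s1)])  # ['obj3', 'obj4']
# The last segment contains "<", which [^<] cannot consume, so it is missing.
print([x.strip() for x in re.findall(pattern, s2)])  # ['obj1', 'obj2']
```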
RomanPerekhrest

What about this?

r'start.*<(.*)>.*end'

In this case, the first `.*` is greedy, so it matches everything up to the last `<`. This gives you the inner text (accessible via `group(1)`).

Patrizio G
  • Hi, this solution also works for the original string. I played with it a little and saw that if we have the string `start< obj1 obj2 start< obj3 >end start< obj4 >end`, this regex matches only `obj4` and not `obj3`, which is at the same depth level as `obj4`. – Matteo VR Oct 17 '19 at 08:00
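The behaviour described in this comment follows from the greediness explained in the answer: `start.*<` backtracks only as far as the last `<` in the string, so the whole string yields at most one match. A minimal check with `re.findall`:

```python
import re

pattern = r'start.*<(.*)>.*end'
s = 'start< obj1 obj2 start< obj3 >end start< obj4 >end'

# The greedy "start.*<" consumes up to the LAST "<", so only one match exists
# and group 1 holds the text after it (surrounding spaces included).
print(re.findall(pattern, s))  # [' obj4 ']
```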

This can be done without lookahead or lookbehind, too:

import re

s = "start< obj1 obj2 start< obj3 >end"
m = re.search(r"start<\s*([^<]*?)\s*>end", s)
print(m[1])    # obj3
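Because both `\s*` parts sit outside the capturing group, the captured text already comes back without surrounding spaces, so no `.strip()` is needed. A short sketch passing the same pattern to `re.findall` on strings from the comments:

```python
import re

pattern = r"start<\s*([^<]*?)\s*>end"

# findall returns group 1 for every match; [^<] keeps each match from
# spanning across an inner "start<".
print(re.findall(pattern, 'start< obj1 obj2 start< obj3 >end start< obj4 >end'))
# ['obj3', 'obj4']
print(re.findall(pattern, 'start< obj1 obj2 start< start< obj3 >end'))
# ['obj3']
```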
kantal