I have a string like this:

<tag1>
    <tag1>
        any text
    </tag1>
    text
</tag1>

and I want to find the <tag1> element that contains the shortest text in this string.

I used the regex <tag1>.*?</tag1>, but instead of <tag1>any text</tag1> I got <tag1> <tag1>any text</tag1>. Here is the example.

Why doesn't it work, and what am I doing wrong?

default locale

3 Answers

You can use this simple pattern to solve your specific problem:

<tag1>[^<]*</tag1>
Sujith PS

I would be able to help you if those tags were not nested inside themselves (the same tag).

It is generally a bad idea to do this type of thing with regex. You should get a proper parser to fit your requirements.
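
As a rough illustration of the parser route, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the input is well-formed XML, as the sample in the question happens to be; real-world HTML would need a more forgiving parser. The variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The sample string from the question.
text = """<tag1>
    <tag1>
        any text
    </tag1>
    text
</tag1>"""

root = ET.fromstring(text)

# root.iter("tag1") yields the root itself and every nested <tag1>.
# Keep only the elements that have no <tag1> child: the innermost ones.
innermost = [el for el in root.iter("tag1") if el.find("tag1") is None]
innermost_text = [(el.text or "").strip() for el in innermost]
print(innermost_text)  # ['any text']
```

Because the parser tracks nesting itself, this keeps working however deeply the tags are nested.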

Vasili Syrakis

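It does not work because the regex starts matching at the first <tag1> and then matches as little as possible, so it ends at the first </tag1>, resulting in "<tag1> <tag1>any text</tag1>". The lazy quantifier only shortens the match on the right; it does not move the starting point.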
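
The behaviour can be reproduced with a short Python sketch. The sample string is collapsed onto one line here for brevity, and the variable names are only illustrative.

```python
import re

# The sample string from the question, collapsed onto one line.
text = "<tag1> <tag1>any text</tag1> text </tag1>"

# The lazy quantifier only limits how far the match extends to the right;
# the engine still starts at the leftmost position where a match is possible.
lazy_match = re.search(r"<tag1>.*?</tag1>", text).group(0)
print(lazy_match)  # <tag1> <tag1>any text</tag1>
```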

You can avoid matching across tags by using a negated character class:

<tag1>[^<>]*</tag1>

See it on Regexr.
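
A minimal Python sketch of the negated character class, run against the multi-line sample from the question (the variable names are only illustrative):

```python
import re

# The sample string from the question.
text = """<tag1>
    <tag1>
        any text
    </tag1>
    text
</tag1>"""

# [^<>]* cannot cross another tag, so only a <tag1> without nested markup matches.
# A negated class also matches newlines, so no re.DOTALL flag is needed.
innermost = re.findall(r"<tag1>[^<>]*</tag1>", text)

# If several candidates were found, the shortest one can be picked by length.
shortest = min(innermost, key=len)
print(shortest.split())  # ['<tag1>', 'any', 'text', '</tag1>']
```

Note that this only works as long as the text inside the innermost tag contains no "<" or ">" characters.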

The other possibility is to use a negative lookahead assertion and match the next character only if it does not start another opening tag:

(<tag1>)((?!\1).)*?</tag1>

See it on Regexr
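
The lookahead variant can be sketched in Python like this. The sample string is collapsed onto one line; for multi-line input the re.DOTALL flag would be needed, since "." does not match newlines by default. The variable names are only illustrative.

```python
import re

# The sample string from the question, collapsed onto one line.
text = "<tag1> <tag1>any text</tag1> text </tag1>"

# Group 1 captures the opening tag. (?!\1). consumes one character at a time,
# but refuses to step onto a position where another opening tag begins.
pattern = re.compile(r"(<tag1>)((?!\1).)*?</tag1>")

# Starting at the outer <tag1> fails as soon as the inner <tag1> is reached,
# so the first successful match starts at the inner tag.
inner_match = pattern.search(text).group(0)
print(inner_match)  # <tag1>any text</tag1>
```

Unlike the negated character class, this still matches when the innermost tag contains other tags, because only a nested <tag1> is excluded.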

stema