
I have the following text:

<msgText>H3ll0World</msgText><msgText>HelloWorld</msgText><otherTagWithNumbers>456</otherTagWithNumbers>

I need to retrieve only the digits ( [\d] ) inside the <msgText> tag in order to replace them. In this case the regex should match 3 and 0.

What I have so far:

(<msgText>)([\s\S]*?)(<\/msgText>)

Here I have an example of this situation.

Andrey Tyukin
Joabe Lucena
  • Try this [`\d+(?=(?:(?!<\/?\w+>).)*<\/msgText>)`](https://regex101.com/r/0zGJMj/4) – revo Mar 09 '18 at 20:58
  • Just open the xml as xml and go through its contents. – Nyerguds Mar 09 '18 at 20:58
  • You should consult [how to ask a question](https://stackoverflow.com/help/how-to-ask). Where's the code where you tried something? –  Mar 09 '18 at 20:59

2 Answers


You are trying to parse XML with regex, again...

But maybe this works on some extremely restricted subset of XML that you are dealing with:

(\d+)(?=\w*<\/msgText>)

Demo.
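As a quick check of the pattern above, here is a minimal Python sketch (Python and the `#` replacement character are assumptions for illustration; the question does not name a language). It runs the regex against the sample text from the question, first listing the matches and then replacing them.

```python
import re

# Sample text taken from the question.
text = (
    "<msgText>H3ll0World</msgText>"
    "<msgText>HelloWorld</msgText>"
    "<otherTagWithNumbers>456</otherTagWithNumbers>"
)

# Digits that are followed only by word characters up to </msgText>.
pattern = re.compile(r"(\d+)(?=\w*<\/msgText>)")

digits = pattern.findall(text)     # the captured digit runs
replaced = pattern.sub("#", text)  # '#' is a hypothetical replacement

print(digits)
print(replaced)
```

Because the lookahead only allows `\w*` between the digits and the closing tag, this sketch stops matching as soon as the tag content contains spaces or punctuation after a digit.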

Andrey Tyukin
  • Thanks bro. I'm not trying to parse a XML, I just want to replace numbers inside a specific tag. Thanks again! – Joabe Lucena Mar 09 '18 at 21:17
  • @JoabeLucena Yeah, I understand that, it was just the obligatory disclaimer that XML is not regular, like for any other "regex + (X/HT)ML" question ;) Glad that you could solve your problem despite all the theoretical obstacles xD – Andrey Tyukin Mar 10 '18 at 02:01

You want a positive lookahead, which looks ahead without consuming any characters:

\d+(?=(?:(?!<\/?\w+>).)*<\/msgText>)

Breakdown:

  • \d+ Match one or more digits
  • (?= That are followed by
    • (?:(?!<\/?\w+>).)* A tempered greedy token: match any characters, but do not pass over a tag
    • <\/msgText> Up to the closing tag
  • ) End of positive lookahead
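To see why the tempered greedy token matters, the following Python sketch compares it with an untempered `.*` lookahead. The sample string and the variable names are made up for illustration, and Python is an assumption since the question names no language.

```python
import re

# Hypothetical input: digits inside <msgText>, then digits in another tag,
# followed by a second <msgText> element.
sample = "<msgText>a1 b-2!</msgText><other>7</other><msgText>x</msgText>"

# Tempered version: the lookahead may not step over any opening or closing tag.
tempered = re.compile(r"\d+(?=(?:(?!<\/?\w+>).)*<\/msgText>)")

# Untempered version for comparison: '.*' happily runs across other tags.
untempered = re.compile(r"\d+(?=.*<\/msgText>)")

tempered_hits = tempered.findall(sample)
untempered_hits = untempered.findall(sample)

print(tempered_hits)
print(untempered_hits)
```

The untempered pattern also picks up the 7 inside `<other>`, because a `</msgText>` appears somewhere later in the string; the tempered pattern rejects it because the lookahead stops at `</other>`.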

The pattern above is strict. When the text inside the tag never contains a < character, a shorter pattern achieves the same:

\d+(?=[^<]*<\/msgText>)

Live demo
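The difference between the two patterns can be sketched in Python (an assumed language; the second input string is a made-up edge case, not taken from the question). On the question's text both agree; they diverge only when a bare `<` appears inside the tag content.

```python
import re

strict = re.compile(r"\d+(?=(?:(?!<\/?\w+>).)*<\/msgText>)")
short = re.compile(r"\d+(?=[^<]*<\/msgText>)")

# Sample text from the question: both patterns agree here.
text = (
    "<msgText>H3ll0World</msgText>"
    "<msgText>HelloWorld</msgText>"
    "<otherTagWithNumbers>456</otherTagWithNumbers>"
)
strict_on_text = strict.findall(text)
short_on_text = short.findall(text)

# Hypothetical edge case: a bare '<' inside the tag content.
edge = "<msgText>1 < 2</msgText>"
strict_on_edge = strict.findall(edge)
short_on_edge = short.findall(edge)

print(strict_on_text, short_on_text)
print(strict_on_edge, short_on_edge)
```

In the edge case the short pattern misses the 1, since `[^<]*` cannot step over the bare `<`, while the strict pattern only stops at something that looks like a tag.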

revo