RegEx to match the first tag

Question

I'm trying to match the text content from the first tag <test>.

For example:

<test>SAMPLE TEXT</test><test>SAMPLE TEXT2</test><test>SAMPLE TEXT3</test>

If I use

("<test>(.*)</test>")

I get this:

SAMPLE TEXT</test><test>SAMPLE TEXT2</test><test>SAMPLE TEXT3

How can I get just the content of the first <test> tag: SAMPLE TEXT?

That looks like XML. Luckily .NET has some really excellent, easy-to-use XML parsing libraries. Why not use them? — Mark Byers, Apr 18 '12 at 13:15
Yes, I know. I have already been using them. But in this case I really need a regular expression. My example is just to show what I need; in practice the input is not valid XML. — Mega, Apr 18 '12 at 13:31

score 4 · Accepted Answer · edited May 23 '17 at 11:49

(.*) is greedy (meaning "match as much as you can, up to the last </test>"). You're looking for the non-greedy version, (.*?) (meaning "match as little as you can, up to the very first </test>").

Do, however, keep in mind the call of Cthulhu when thinking about parsing HTML with regex, and take a look at this question for a discussion of best practices for parsing HTML with .NET. Or, if this is XML (not HTML), then by all means do it the proper (and easy) way with an XmlReader.

score 1 · Answer 2 · answered Apr 18 '12 at 13:15

Instead of .* use .*?

The question mark makes the asterisk lazy, causing it to match as little as possible. Without it, the asterisk is greedy and matches as much as it can.

answered by Indrek

score 1 · Answer 3 · answered Apr 18 '12 at 13:16

@Radu's answer is very good, but also consider the following:

"<test>([^<]*)</test>"

answered by Dewfy

Well, that won't match ``. Then again, XML parsing is full of pitfalls. – rid Apr 18 '12 at 13:20
@Radu, I fully agree. That is why your answer is better. But this pattern may be very fast when Ljupco_Sofijanov is really sure that only TEXT is possible inside. – Dewfy Apr 18 '12 at 13:22
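The trade-off discussed in these comments can be sketched as follows. This is an illustration only: Python's re module stands in for .NET, and the nested input is a hypothetical string invented for the example, not one taken from the thread.

```python
import re

lazy = re.compile(r"<test>(.*?)</test>")       # lazy quantifier
negated = re.compile(r"<test>([^<]*)</test>")  # negated character class

# Plain text inside the tags: both patterns return the first tag's content.
plain = "<test>SAMPLE TEXT</test><test>SAMPLE TEXT2</test>"
plain_lazy = lazy.search(plain).group(1)
plain_negated = negated.search(plain).group(1)

# Hypothetical input with markup inside the first tag.
nested = "<test>SAMPLE <b>TEXT</b></test><test>SAMPLE TEXT2</test>"
nested_lazy = lazy.search(nested).group(1)
# [^<]* cannot cross the '<' of <b>, so the first tag fails to match and
# the search silently moves on to the second tag.
nested_negated = negated.search(nested).group(1)

print(plain_lazy, "|", plain_negated)  # SAMPLE TEXT | SAMPLE TEXT
print(nested_lazy)                     # SAMPLE <b>TEXT</b>
print(nested_negated)                  # SAMPLE TEXT2
```

The negated class is safe only when the tag content is guaranteed to contain no '<' character. Otherwise it can return a later tag's content without any error.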

score 1 · Answer 4 · answered Apr 18 '12 at 13:18

I agree that you could use XML parsing libraries, but I'll reply anyway:

("<test>([^<]*)</test>")

would match any run of characters other than '<', which is the first character you want to exclude, so the capture ends just before the first closing tag.

HTH.

answered by Skippy Fastol
