Get words between "<" and ">" in .NET

Question

I have written a program to identify tags (the text between < and >) in a string. From the string below I am able to get <P>, <OL> and <LI>, but <DIV> is not matched. Any idea what I am doing wrong?

 using System.Text.RegularExpressions;

 string yy = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

 // Returns <P>, <OL> and <LI>, but not <DIV align=center>
 MatchCollection allMatchResults = null;
 var regexObj = new Regex(@"<\w*>");
 allMatchResults = regexObj.Matches(yy);
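To see exactly what the question's pattern finds, here is a minimal sketch. It assumes a .NET 5 or later console project (top-level statements), and the variable name `found` is my own:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string yy = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// The question's pattern: '<', zero or more word characters, then '>' immediately.
var found = Regex.Matches(yy, @"<\w*>")
                 .Cast<Match>()
                 .Select(m => m.Value)
                 .ToList();

// <DIV align=center> fails: the space after DIV is neither \w nor '>'.
// Closing tags such as </P> fail too, because '/' is not a word character.
Console.WriteLine(string.Join(", ", found)); // prints "<P>, <OL>, <LI>, <LI>"
```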

score 0 · Answer 1 · answered Feb 11 '16 at 12:39

DIV is not being matched because \w does not match spaces (or the = in align=center). Use new Regex(@"<[^>]+>");
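A minimal sketch of the suggested pattern, assuming a .NET 5 or later console project; `tags` is an illustrative name. Note that `[^>]+` also accepts `/`, so closing tags are returned as well:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string yy = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// '<', then one or more characters that are not '>', then '>'.
// [^>] accepts spaces, '=' and '/', so attributes and closing tags both match.
var tags = Regex.Matches(yy, @"<[^>]+>")
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();

foreach (var tag in tags)
    Console.WriteLine(tag);
```

If you only want opening tags, you can filter the list afterwards (for example, skip values starting with `</`).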

answered Feb 11 '16 at 12:39

Tihomir Totev

If you want only the tag name, the regex should be something like @"<(\w+)[^>]*>" and you will need to take the first group of each match – Tihomir Totev Feb 11 '16 at 12:41
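A sketch of the group-based approach from the comment, assuming a .NET 5 or later console project. It uses `<(\w+)[^>]*>`: a pattern that requires whitespace (`\s`) after the name would only match tags that carry attributes, while `[^>]*` makes the attribute part optional. The variable `names` is illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string yy = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// Group 1 captures the tag name; [^>]* consumes any attributes up to the closing '>'.
// Closing tags are skipped because '/' is not a word character.
var names = Regex.Matches(yy, @"<(\w+)[^>]*>")
                 .Cast<Match>()
                 .Select(m => m.Groups[1].Value)
                 .ToList();

Console.WriteLine(string.Join(", ", names)); // prints "P, OL, LI, DIV, LI"
```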

score 0 · Answer 2 · answered Feb 11 '16 at 12:39

You are not getting DIV because it has an attribute. Use .*? to include attributes or any other text inside the tag.

var regexObj = new Regex(@"<\w.*?>");
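A minimal sketch of the lazy pattern in use, assuming a .NET 5 or later console project; `opening` is an illustrative name. Because the pattern requires a word character right after `<`, it returns opening tags only:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string yy = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// '<', one word character, then as few characters as possible up to the first '>'.
var opening = Regex.Matches(yy, @"<\w.*?>")
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .ToList();

Console.WriteLine(string.Join(", ", opening)); // prints "<P>, <OL>, <LI>, <DIV align=center>, <LI>"
```

By default `.` does not match a newline, so a tag whose attributes span several lines would need `RegexOptions.Singleline`.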

You can use Html Agility Pack to easily parse and manipulate the HTML.

answered Feb 11 '16 at 12:39

Adil

score 0 · Answer 3 · edited May 23 '17 at 10:28

\w* matches only word characters (letters, digits and the underscore). Here the problem lies in the space and the = in <DIV align=center>.

Quick solution: <[^>]+> instead of <\w*>
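To check which characters of the DIV tag \w rejects, here is a small sketch, assuming a .NET 5 or later console project; `inner` and `nonWord` are illustrative names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The text that \w* would have to consume between '<' and '>' in the DIV tag.
string inner = "DIV align=center";

// Keep only the characters that are NOT matched by \w.
var nonWord = inner.Where(c => !Regex.IsMatch(c.ToString(), @"\w")).ToList();

Console.WriteLine(string.Join(" | ", nonWord.Select(c => $"'{c}'"))); // prints "' ' | '='"
```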

But you may want to consider this: RegEx match open tags except XHTML self-contained tags

edited May 23 '17 at 10:28

Community

answered Feb 11 '16 at 12:41

D. Cichowski

score 0 · Answer 4 · answered Feb 11 '16 at 12:41

Your regex is wrong; it should be something like

@"<[^>]+>"

Also, if you have to do a lot of regexes like this, it may be better to use something like Html Agility Pack. It parses the HTML into node lists that you can iterate through. Samples can be found here.

score 0 · Answer 5 · answered Feb 11 '16 at 12:56

I prefer this method; we use it daily where I work. It's a translation company, so we translate XML, HTML and PHP files into different languages.

var myRegex= new Regex(@"(<[^>]+>)");

Here is just the regex:

(<[^>]+>)
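The answer does not say why the pattern is wrapped in a capturing group. One place where it matters in .NET is `Regex.Split`: when the pattern contains capturing parentheses, the captured text is included in the returned array, so tags and the text between them stay together in order. That this is how the group is used in a translation workflow is my assumption, not something the answer states. A minimal sketch, assuming a .NET 5 or later console project; `parts` and `textSegments` are illustrative names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string yy = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// With a capturing group, Regex.Split keeps each captured tag in the output array
// alongside the text between tags.
string[] parts = Regex.Split(yy, @"(<[^>]+>)");

// Adjacent tags (and tags at the start or end) produce empty strings; drop those
// and the tags themselves to keep only the text segments.
var textSegments = parts.Where(p => p.Length > 0 && p[0] != '<').ToList();

Console.WriteLine(string.Join(" | ", textSegments)); // prints "&nbsp; | fjsdhfsdjf"
```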

answered Feb 11 '16 at 12:56

XsiSecOfficial
