
I have written a program to identify tags (between < and >) in a string. From the string below I am able to get <P>, <OL> and <LI>, but <DIV> is not matched. Any idea what I am doing wrong?

 string yy = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

 MatchCollection allMatchResults = null;
 var regexObj = new Regex(@"<\w*>");
 allMatchResults = regexObj.Matches(yy);
Markus Safar
user1844634

5 Answers


DIV is not being matched because \w does not match spaces, and the tag contains one before its attribute. Use new Regex(@"<[^>]+>"); instead.

    If you want only the tag name, the regex should be something like @"<(\w+)\s[^>]*>", and you will need to read the first group of each match – Tihomir Totev Feb 11 '16 at 12:41
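
To illustrate the capture-group idea from the comment, here is a minimal sketch. It assumes a slightly relaxed pattern, <(\w+)[^>]*>, which drops the required \s so that tags without attributes (such as <P>) match as well; the variable names are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string html = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// Group 1 captures the tag name; [^>]* skips over any attributes.
// Closing tags are not matched because '/' is not a word character.
var tagNameRegex = new Regex(@"<(\w+)[^>]*>");

string[] names = tagNameRegex.Matches(html)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", names)); // P, OL, LI, DIV, LI
```

Groups[0] would be the whole match, so the tag name sits at index 1.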

You are not getting DIV because it has an attribute. Use .*? to include attributes or any other text inside the tag.

var regexObj = new Regex(@"<\w.*?>");
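
As a rough check of what this pattern picks up, the sketch below runs it against the string from the question (the variable names are only for illustration). Note that, like the original pattern, it skips closing tags, because the character right after < must be a word character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// \w requires a word character right after '<';
// the lazy .*? then consumes as little as possible up to the next '>'.
string[] openingTags = Regex.Matches(source, @"<\w.*?>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string tag in openingTags)
{
    Console.WriteLine(tag);
}
```

By default . does not match a newline in .NET, so a tag that spans several lines would need RegexOptions.Singleline or the [^>] form.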

You can use Html Agility Pack to easily parse and manipulate the HTML.

Adil

\w* matches only alphanumeric characters (and the underscore). The problem here lies in the space and the = inside the DIV tag.

Quick solution: <[^>]+> instead of <\w*>
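
A small side-by-side sketch of the two patterns on the string from the question, just to show the difference (the variable names are only for illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = @"<P>&nbsp;</P><OL><LI><DIV align=center>fjsdhfsdjf</DIV></LI><LI>";

// Original pattern: only word characters are allowed between < and >.
string[] wordOnly = Regex.Matches(input, @"<\w*>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Quick solution: any character except > is allowed between < and >.
string[] anyTag = Regex.Matches(input, @"<[^>]+>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" ", wordOnly)); // <P> <OL> <LI> <LI>
Console.WriteLine(string.Join(" ", anyTag));   // also </P>, <DIV align=center>, </DIV>, </LI>
```

The second pattern also picks up closing tags, since / is just another character that is not >.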

But you may want to consider this: RegEx match open tags except XHTML self-contained tags

Community
D. Cichowski

Your regex is wrong; it should be something like

@"<[^>]+>"

Also, if you have to do a lot of regexes like this, it may be better to use something like HTMLAgilityPack. It lets you parse the HTML into node lists that you can iterate through. Samples can be found here.

Milos Maksimovic

I trust this method more; we use it daily where I work. It's a translation company, so we translate XML, HTML, and PHP files into different languages.

var myRegex= new Regex(@"(<[^>]+>)");

Here is just the regex:

(<[^>]+>)
XsiSecOfficial