-1

Say I have HTML as a string. How can I get all widget control tags from that string using regex?

Current Approach

const string widgetStartPattern = "<widget:ContentPageView";
const string widgetEndPattern = "/>";

var allOccuranceOfWidgets = CountStringOccurrences(aspx, widgetStartPattern);

while (allOccuranceOfWidgets.Count > 0)
{
    var firstIndex = allOccuranceOfWidgets[0];
    var lastIndex = aspx.IndexOf(widgetEndPattern, firstIndex + 1, System.StringComparison.OrdinalIgnoreCase);

    var widgetUserControlTag = aspx.Substring(firstIndex, lastIndex - firstIndex + 2);
    var pageId = ExtractPageIdFromWidgetTag(widgetUserControlTag);
    var pageContent = GetContentFromaDatabase(pageId);

    aspx = aspx.Replace(widgetUserControlTag, pageContent);
    allOccuranceOfWidgets = CountStringOccurrences(aspx, widgetStartPattern);
}

Desired result: a list of all widget control tags

<widget:ContentPageView id="ContentPageView0" PageId="165" runat="server" />
<widget:ContentPageView id="ContentPageView1" PageId="166" runat="server" />
<widget:ContentPageView id="ContentPageView2" PageId="167" runat="server" />

HTML

<div class="slogan">

<widget:ContentPageView id="ContentPageView0" PageId="165" runat="server" />

      </div>
      <div class="headertopright">
         <div class="headersocial">

<widget:ContentPageView id="ContentPageView1" PageId="166" runat="server" />
        </div>
        <div class="searchbox">
<widget:ContentPageView id="ContentPageView2" PageId="167" runat="server" />
John Saunders
SOF User

3 Answers

2

You will probably be better off using HtmlAgilityPack, or possibly converting the markup to XML and using XPath. Using regex to parse HTML has been covered at length on Stack Overflow, and the consensus is that it is a bad idea.

RegEx match open tags except XHTML self-contained tags
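
As a rough illustration of the XML route, the sketch below uses LINQ to XML from the .NET base class library. It assumes the markup fragment is well-formed XML (the sample in the question is not, since it is cut off mid-element). The wrapper element, the `urn:widget` namespace URI, and the `WidgetFinder` class are made up for the example so that the `widget:` prefix resolves.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WidgetFinder
{
    // Placeholder namespace URI (an assumption); it only exists so the
    // "widget:" prefix can be declared on the wrapper element.
    private static readonly XNamespace Widget = "urn:widget";

    // Returns the PageId attribute of every widget:ContentPageView element.
    public static List<string> GetPageIds(string fragment)
    {
        // Wrap the fragment so it has a single root and the prefix is declared.
        string xml = "<root xmlns:widget=\"" + Widget.NamespaceName + "\">"
                     + fragment + "</root>";
        XDocument doc = XDocument.Parse(xml);

        return doc.Descendants(Widget + "ContentPageView")
                  .Select(e => (string)e.Attribute("PageId"))
                  .ToList();
    }
}
```

`XDocument.Parse` throws an `XmlException` on input that is not well-formed, so real ASPX with `<%@ ... %>` directives or unclosed tags would need a more forgiving parser such as HtmlAgilityPack.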

Community
Abe Miessler
  • I updated the question with my current approach so you can see what I am doing. Treat it here as just a string, not as HTML. – SOF User May 21 '13 at 23:15
  • The structure of the 'string' is very important when you are using regexes. What if someone uses `` instead? – Andrey Shchekin May 21 '13 at 23:17
  • So my current code is fine as shown in the question, with no need to refactor it to use regex? – SOF User May 21 '13 at 23:19
  • No, it is fixed and generated by the system, so it will be exactly what it is in the string, with no other possibilities. Even the case will not change; only the page ID inside changes. – SOF User May 21 '13 at 23:20
  • 2
    Pretending that you aren't trying to parse HTML using regex will not help you. Quote from the link I provided, `Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML.` There are just better ways to do what you want. – Abe Miessler May 21 '13 at 23:30
  • @Abe: Agreed, using an HTML library or XML library would be a much better approach. Regex is not really meant for such a task. – galford13x May 21 '13 at 23:57
2

As Abe Miessler said, you should not be parsing HTML with regexes.
However, if you only want the exact string you specified and you are absolutely sure it cannot be generated in any other way, your regex is:

<widget:ContentPageView id="(?:[^"]+)" PageId="(?:[^"]+)" runat="server" />

Note that this will find all occurrences, even if those are commented out.
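
If the end goal is to swap each tag for its page content, as in the loop from the question, the same fixed-format pattern can be given a capture group and passed to `Regex.Replace`. This is a minimal sketch, assuming the tags always have exactly this shape; `WidgetExpander`, `Expand`, and the `getContent` delegate (standing in for the question's `GetContentFromaDatabase`) are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WidgetExpander
{
    // Same fixed-format pattern, with a named group around the PageId value.
    private static readonly Regex WidgetTag = new Regex(
        "<widget:ContentPageView id=\"[^\"]+\" PageId=\"(?<pageId>[^\"]+)\" runat=\"server\" />");

    // Replaces every widget tag with whatever getContent returns for its PageId.
    // getContent is a stand-in for the database lookup in the question.
    public static string Expand(string aspx, Func<string, string> getContent)
    {
        return WidgetTag.Replace(aspx, m => getContent(m.Groups["pageId"].Value));
    }
}
```

Because the replacement happens in a single pass, content returned by `getContent` is not re-scanned for widget tags, unlike the loop in the question, which searches the string again after every replacement.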

Andrey Shchekin
1
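using System.Collections.Generic;
using System.Text.RegularExpressions;

List<string> widgets = new List<string>();

// [^>]* consumes everything up to the closing "/>" of the same tag.
// (The earlier pattern "([^/][^>])*" consumed characters two at a time,
// so it could miss a tag or run on into the next one.)
MatchCollection matches = Regex.Matches(yourHTMLCode, "<widget:[^>]*/>");
foreach (Match match in matches)
{
    // match.Value is the full text of the matched tag.
    widgets.Add(match.Value);
}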

Source: http://www.dotnetperls.com/regex-matches

Isaac