-5

I have the following String "</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>"

I need to get the id attribute value from the div tag. How can I retrieve this using C#?

Abishek

4 Answers

1

Avoid parsing HTML with regex

Regex is not a good choice for parsing HTML.

HTML is neither strict nor regular in its format.

Use HtmlAgilityPack

You can do it like this with HtmlAgilityPack:

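// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // use doc.LoadHtml(yourString) when the markup is already in a string
// Select every div that has an id attribute; SelectNodes returns null when nothing matches
var nodes = doc.DocumentNode.SelectNodes("//div[@id]");
List<string> itemList = nodes == null
    ? new List<string>()
    : nodes.Select(x => x.Attributes["id"].Value).ToList();
// itemList now contains the id attribute value of every such div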
Anirudha
0

Strictly solving the question asked, one of a myriad ways of solving it would be to isolate the div element, parse it as an XElement and then pull the attribute's value that way.

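        // Requires: using System.Xml.Linq;
        string bobo = "</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>";
        // Drop everything before the div so that the div becomes the root element
        string justDiv = bobo.Substring(bobo.IndexOf("<div"));
        XElement xelem = XElement.Parse(justDiv);
        var id = xelem.Attribute("id");
        var value = id.Value; // "PO_1WTXxKUTU98xDU1"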

There are certainly lots of ways to solve this but this one answers the mail.
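
One caveat worth spelling out: XElement.Parse expects well-formed XML, so it handles the string in the question but throws an XmlException on typical HTML such as an unclosed <br>. Below is a minimal sketch of a defensive variant. The class name DivIdDemo and the helper TryGetDivId are hypothetical names made up for this illustration, and returning null on failure is just one possible design choice.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class DivIdDemo
{
    // Hypothetical helper: returns the div's id, or null when the input has
    // no "<div", is not well-formed XML, or carries no id attribute.
    public static string TryGetDivId(string html)
    {
        int start = html.IndexOf("<div", StringComparison.Ordinal);
        if (start < 0) return null;
        try
        {
            XElement div = XElement.Parse(html.Substring(start));
            XAttribute id = div.Attribute("id");
            return id == null ? null : id.Value;
        }
        catch (XmlException)
        {
            // For example an unclosed <br>, or more markup after the closing </div>
            return null;
        }
    }

    static void Main()
    {
        // The string from the question parses fine.
        Console.WriteLine(TryGetDivId("</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>"));
        // Ordinary HTML with an unclosed tag is not well-formed XML.
        Console.WriteLine(TryGetDivId("<div id='x'><br></div>") ?? "(not well-formed XML)");
    }
}
```

If the input is likely to be real-world HTML rather than a known, XML-clean fragment, an HTML parser is the safer route.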

itsmatt
  • I don't think using LINQ to XML is a good way to solve this problem, especially if it's HTML. – Anirudha Nov 01 '12 at 17:15
  • Well, you're certainly allowed your opinion. It *does* answer the question though and doesn't take a dependency on another library to do it. – itsmatt Nov 01 '12 at 17:25
0

If you're a masochist you can do this old school VB3 style:

        string input = @"</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>";
        string startString = "div id='";
        string subString = null; // stays null when no id is found

        int startIndex = input.IndexOf(startString);

        if (startIndex != -1)
        {
            // Move past the marker to the first character of the value
            startIndex += startString.Length;
            int endIndex = input.IndexOf("'", startIndex);
            if (endIndex != -1) // guard against a missing closing quote
            {
                subString = input.Substring(startIndex, endIndex - startIndex);
            }
        }
C.M.
-1

A .NET Regex that looks something like this will do the trick:

^</script><div id='(?<attrValue>[^']+)'.*$

You can then get hold of the value like this:

MatchCollection matches = Regex.Matches(input, @"^</script><div id='(?<attrValue>[^']+)'.*$");
if (matches.Count > 0)
{
    // Groups[...] returns a Group object; .Value gives the captured string
    var attrValue = matches[0].Groups["attrValue"].Value;
}
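
As a rough sketch of the same idea in a self-contained form: the variant below uses Regex.Match and drops the anchoring on the leading </script>, so it finds the first div wherever it sits in the input. It still assumes the id is single-quoted and is the first attribute of the div. The names RegexIdDemo and ExtractDivId are hypothetical, introduced only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexIdDemo
{
    // Hypothetical helper: returns the id of the first <div id='...'> in the
    // input, or null when no such div is found.
    public static string ExtractDivId(string input)
    {
        Match m = Regex.Match(input, @"<div\s+id='(?<attrValue>[^']+)'");
        return m.Success ? m.Groups["attrValue"].Value : null;
    }

    static void Main()
    {
        string input = "</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>";
        Console.WriteLine(ExtractDivId(input)); // prints PO_1WTXxKUTU98xDU1
    }
}
```

This only holds up while the markup keeps exactly this shape; double quotes, extra attributes before id, or different casing would all need a looser pattern or a real parser.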
Rich Amos