I have the following String "</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>"
I need to get the id attribute value from the div tag. How can I retrieve it using C#?
Avoid parsing HTML with regex
Regex is not a good choice for parsing HTML files. HTML is neither strict nor regular in its format.
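Use HtmlAgilityPack
You can do it like this with HtmlAgilityPack:
HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // for a string, use doc.LoadHtml(yourString) instead
// SelectNodes returns null when nothing matches, so guard against that
HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@id]"); // all divs that have an id attribute
List<string> itemList = divs == null
    ? new List<string>()
    : divs.Select(x => x.Attributes["id"].Value) // take each id attribute's value
          .ToList();
// itemList now contains the id attribute value of every div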
Strictly solving the question asked, one of myriad ways would be to isolate the div element, parse it as an XElement, and then pull the attribute's value that way.
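// requires: using System.Xml.Linq;
string bobo = "</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>";
string justDiv = bobo.Substring(bobo.IndexOf("<div")); // drop everything before the div
XElement xelem = XElement.Parse(justDiv); // works only if the fragment is well-formed XML
var id = xelem.Attribute("id");
var value = id.Value; // "PO_1WTXxKUTU98xDU1"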
There are certainly lots of ways to solve this, but this one answers the mail.
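The XElement snippet throws if the input has no div, or if the text from the div onward is not a single well-formed XML element. Here is a minimal sketch of a guarded variant. The names DivIdReader and TryGetDivId are made up for illustration, and it still assumes the div is the last thing in the string.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class DivIdReader
{
    // Hypothetical helper: returns true and sets id when the first <div>
    // parses as XML and carries an id attribute; returns false otherwise.
    public static bool TryGetDivId(string html, out string id)
    {
        id = null;
        if (html == null) return false;

        int start = html.IndexOf("<div", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return false; // no div in the input

        try
        {
            // Assumption: everything from "<div" to the end is one XML element.
            XElement div = XElement.Parse(html.Substring(start));
            XAttribute attr = div.Attribute("id");
            if (attr == null) return false; // div has no id attribute
            id = attr.Value;
            return true;
        }
        catch (XmlException)
        {
            // Typical HTML (unclosed <br>, trailing markup, ...) ends up here.
            return false;
        }
    }
}
```

Calling DivIdReader.TryGetDivId(bobo, out string value) on the string from the question should succeed; on real-world HTML it will often return false, which is where an HTML parser becomes the better tool.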
If you're a masochist, you can do this old-school VB3 style:
string input = @"</script><div id='PO_1WTXxKUTU98xDU1'><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>";
string startString = "div id='";
int startIndex = input.IndexOf(startString);
if (startIndex != -1)
{
    startIndex += startString.Length; // move past the opening quote
    int endIndex = input.IndexOf("'", startIndex);
    if (endIndex != -1) // guard against a missing closing quote
    {
        string subString = input.Substring(startIndex, endIndex - startIndex); // the id value
    }
}
A .NET Regex that looks something like this will do the trick for this exact input:
^</script><div id='(?<attrValue>[^']+)'.*$
You can then get hold of the value like this:
MatchCollection matches = Regex.Matches(input, @"^</script><div id='(?<attrValue>[^']+)'.*$");
if (matches.Count > 0)
{
    string attrValue = matches[0].Groups["attrValue"].Value; // .Value gives the captured text
}
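That pattern is tied to this exact string: it needs the leading </script>, single quotes, and id as the first attribute. A slightly more tolerant sketch follows, with the caveat that it is still a regex and not an HTML parser. The names DivIdRegex and FindFirstDivId are made up for illustration, and it assumes the id value is quoted and that no attribute value before it contains ">" or text that looks like an id attribute.

```csharp
using System.Text.RegularExpressions;

public static class DivIdRegex
{
    // <div, then any other attributes, then id = 'value' or id = "value".
    // The q group remembers which quote opened the value so the same one closes it.
    private static readonly Regex DivId = new Regex(
        @"<div\b[^>]*?\sid\s*=\s*(?<q>['""])(?<attrValue>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the id of the first matching div, or null.
    public static string FindFirstDivId(string input)
    {
        Match match = DivId.Match(input);
        return match.Success ? match.Groups["attrValue"].Value : null;
    }
}
```

Because the pattern is not anchored, it also finds a div in the middle of a longer document, but anything beyond simple, predictable markup is better handled with a parser such as HtmlAgilityPack.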