I have a string in the form of html code like
<head><p> this is the header</p></head> <body>..... </body>
I want to split this string such that I only get <head><p>
and the tags. Is there a way to do this in C# using regex?
I assume that by "only get <head><p> and the tags" you mean you want to identify all of the tag elements in the entire string, including the closing tags, the <body> tags, and so on.
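In any case, the answer is yes, you can do this in C#. There are many good XML/HTML parsers that you can look into, but if you specifically want an array or list of all the tag elements and are determined to use regex, you can use Regex.Matches(input, pattern). Note that Regex.Split(input, pattern) (https://msdn.microsoft.com/en-us/library/8yttk7sy(v=vs.110).aspx) returns the text between the matches and, unless the pattern contains capturing groups, discards the matches themselves, so on its own it would give you everything except the tags.
Set up your pattern as a verbatim string. The characters < and > do not need escaping here, and "\<" in a regular C# string literal is a compile error (unrecognized escape sequence):
string pattern = @"<[^>]+>";
(Note that this may not be the exact regex pattern you want. It matches from each < to the next >, so it will trip over a > inside an attribute value, as well as comments and script content.)
Then just do something like this:
MatchCollection tags = Regex.Matches(htmlString, pattern);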
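For reference, here is a minimal sketch of collecting the tags themselves with Regex.Matches (Regex.Split would instead return the text between the matches). The class name TagSketch and the method ExtractTags are hypothetical names chosen for illustration, and the pattern is an assumption that only holds for simple markup where no > appears inside an attribute value, comment, or script.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagSketch
{
    // Hypothetical helper: returns every substring that looks like a tag,
    // i.e. '<', then one or more characters that are not '>', then '>'.
    public static string[] ExtractTags(string html)
    {
        // Verbatim string (@"...") so no C# escape sequences are involved.
        // [^>]+ stops at the first '>', so each tag is matched separately
        // instead of one greedy match spanning the whole string.
        const string pattern = @"<[^>]+>";

        return Regex.Matches(html, pattern)
                    .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Value)   // keep only the matched text
                    .ToArray();
    }
}
```

Calling TagSketch.ExtractTags on the string from the question should give the six tags in document order: <head>, <p>, </p>, </head>, <body>, </body>.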
--UPDATE--
Because of how strongly many people feel about using regex to parse XML/HTML, I thought I should update with an additional comment about an alternative.
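If you truly want a list or array of the tag elements in the string, you could use something like the XElement class. Create an XElement from the string with XElement.Parse, and then you can do all kinds of neat stuff, including iterating over the elements, and nested elements if needed, to build a list or array. Keep in mind that XElement is an XML parser: it expects well-formed input with a single root element, so it will throw on typical real-world HTML, and on the example above, which has two top-level elements, unless you wrap it in a root element first. XElement may not be exactly what you want for an HTML string, but it gives you an idea of the possibilities without needing regex. Cheers!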
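
As a rough sketch of the XElement approach, assuming the input happens to be well-formed XML: the class XmlTagSketch, the method ElementNames, and the wrapping <root> element are all hypothetical additions for illustration. The wrapper is there because XElement.Parse needs a single root element, and the string in the question has two top-level elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagSketch
{
    // Hypothetical helper: returns the names of all elements in the markup,
    // in document order. Throws System.Xml.XmlException if the markup is not
    // well-formed XML (for example, an unclosed <br>).
    public static List<string> ElementNames(string markup)
    {
        // Wrap the fragment in an artificial <root> so that inputs with
        // several top-level elements still parse.
        XElement root = XElement.Parse("<root>" + markup + "</root>");

        // Descendants() walks every nested element but excludes <root> itself.
        return root.Descendants()
                   .Select(e => e.Name.LocalName)
                   .ToList();
    }
}
```

For the string in the question this should yield head, p, body. Unlike the regex version it gives element names rather than the literal opening and closing tags, and for real-world HTML a dedicated HTML parser is likely a better fit than an XML one.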