I am trying to find the article tag and all its content in an HTML string using Regex.
I can successfully match the opening tag with its attributes: <article[^>]*>
I have trouble matching the contents: (.*?) is not working for me.
Please help.
You cannot use regular expressions to parse HTML in general. However, in constrained scenarios (i.e. when the input follows a rigid structure), you might be able to get away with it. In your case, you can use the following regex, provided that:
- <article> tags are not self-closing
- <article> elements do not contain other <article> descendants
- <article and </article> do not appear as literals in your HTML.

Code:
var matches = Regex.Matches(html, @"<article.*?</article>", RegexOptions.Singleline);
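A likely reason the plain (.*?) attempt fails is that, by default, `.` in .NET does not match newline characters, so the match stops at the first line break inside the element. RegexOptions.Singleline changes that. Below is a minimal sketch of how the pieces could fit together if you only want the inner content. The class name `ArticleExtractor`, the method `ExtractContents`, the group name `content`, and the sample HTML are made up for illustration, and the same three assumptions listed above still apply.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArticleExtractor
{
    // Opening tag with optional attributes, then a lazy capture up to the
    // first closing tag. Singleline lets '.' match newlines as well.
    // "content" is a hypothetical group name chosen for this sketch.
    private static readonly Regex ArticlePattern = new Regex(
        @"<article\b[^>]*>(?<content>.*?)</article>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the inner content of every non-nested <article> element.
    public static string[] ExtractContents(string html)
    {
        MatchCollection matches = ArticlePattern.Matches(html);
        var contents = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            contents[i] = matches[i].Groups["content"].Value;
        }
        return contents;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input, invented for this example.
        string html =
            "<div><article class=\"a\">\n<p>First</p>\n</article>" +
            "<article>Second</article></div>";

        foreach (string content in ArticleExtractor.ExtractContents(html))
        {
            Console.WriteLine("[" + content.Trim() + "]");
        }
    }
}
```

If the input might contain nested <article> elements or otherwise break the assumptions above, an HTML parser is the safer choice.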