Extracting HTML contents inside a div using regex

Question

I'm using the following code to extract the contents of a div with this format: <div id="post-contents"></div>

string findtext2 = @"<div[^>]*\\id=\post-contents\[^>]*>(.*?)</div>";
string myregex2 = txt;
MatchCollection doregex2 = Regex.Matches(myregex2, findtext2);
string matches2 = "";
foreach (Match match2 in doregex2)
{
    matches2 = (matches2 + (match2.ToString()));
}
return matches2;

But I got some errors with HTML tags. The div actually contains other HTML tags, as follows:

<div id="post-contents"><p dir="ltr">HI HI HI</p></div>

Could you please help me get just <p dir="ltr">HI HI HI</p>?

Thank you

Use HtmlAgilityPack. See [here](https://stackoverflow.com/a/1732454/3181933). — ProgrammingLlama, May 16 '18 at 07:58
Here, check this: [How to use HTML Agility pack](https://stackoverflow.com/q/846994/4934172). — 41686d6564 stands w. Palestine, May 16 '18 at 08:05
See [here](https://stackoverflow.com/questions/15448772/htmlagilitypack-get-innertext-of-a-td-tag-with-an-id-attribute) for close to what you want to do. It's with a td, not a div, but the concept is exactly the same. — ProgrammingLlama, May 16 '18 at 08:09

score 0 · Answer 1 · answered May 16 '18 at 08:21

Your regex works well in the described case: https://regex101.com/r/jbDN1U/1. But you can't handle cases like this with a regex:

<div id="post-contents"><div dir="ltr">HI HI HI</div></div>
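
To see the difference between the two inputs, here is a minimal sketch using only System.Text.RegularExpressions. The pattern is an assumed cleaned-up version of the one in the question (the quoting there is not entirely clear), and the class and method names are made up for illustration. It reads the first capture group via Groups[1].Value rather than the whole match, so the surrounding div tags are not included.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivRegexDemo
{
    // Assumed form of the question's pattern: a div whose id is "post-contents",
    // followed by a lazy capture up to the first closing </div>.
    private static readonly Regex PostContents = new Regex(
        @"<div[^>]*\sid=""post-contents""[^>]*>(.*?)</div>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the captured inner HTML, or an empty string when nothing matches.
    public static string ExtractInner(string html)
    {
        Match m = PostContents.Match(html);
        // Groups[1] is the (.*?) capture; m.Value would include the outer div tags.
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Flat case from the question: the capture is the complete inner markup.
        Console.WriteLine(ExtractInner(
            "<div id=\"post-contents\"><p dir=\"ltr\">HI HI HI</p></div>"));
        // prints: <p dir="ltr">HI HI HI</p>

        // Nested case: the lazy capture ends at the inner </div>, so it is cut short.
        Console.WriteLine(ExtractInner(
            "<div id=\"post-contents\"><div dir=\"ltr\">HI HI HI</div></div>"));
        // prints: <div dir="ltr">HI HI HI
    }
}
```

In the nested case the result is missing the inner closing tag, which is the failure described here.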

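A regex can't determine which closing div to choose in this case: the lazy (.*?) stops at the first </div>, which closes the inner div rather than the outer one. As mentioned in the comments, consider using an HTML parser such as HtmlAgilityPack instead.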
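
As a rough sketch of the parser route, assuming the HtmlAgilityPack NuGet package is installed (the class and method names below are hypothetical), the div can be selected by its id with an XPath query and its InnerHtml read directly. Because the parser builds a tree, nested divs are matched to their own closing tags.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class DivParserDemo
{
    // Returns the inner HTML of the div with the given id,
    // or an empty string when no such div exists.
    public static string ExtractInner(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: any div element whose id attribute equals the requested value.
        // The id is concatenated as-is, so it must not contain a single quote.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='" + id + "']");
        return node == null ? string.Empty : node.InnerHtml;
    }

    public static void Main()
    {
        string html = "<div id=\"post-contents\"><div dir=\"ltr\">HI HI HI</div></div>";
        // Should print the whole inner div, including its closing tag.
        Console.WriteLine(ExtractInner(html, "post-contents"));
    }
}
```

SelectSingleNode returns null when nothing matches, hence the null check before reading InnerHtml.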
