How to loop and extract only the data between two markers?

Question

I have a large volume of HTML text and I want to extract all of the data between every occurrence of `<p>` and `</p>`. I have code that can locate and extract the first occurrence, but I can't seem to loop it.

I have tried a for loop that runs once for every time `<p>` comes up in the entire text.

I have also tried looping and deleting one occurrence at a time, together with the text between `<p>` and `</p>`, but that did not seem to work either.

var startTag = $"<p>";
var endTag = $"</p>";
int count = 0;
string ImpureCText = "<p>hello this is the first part</p>fgbtfhsgs <p> this is the second part</p> <p> this is the third part</p>";

int index1 = ImpureCText.IndexOf(startTag);
int index2 = ImpureCText.IndexOf(endTag);
foreach (Match match in Regex.Matches(ImpureCText, startTag))
{
    count++;
}
Console.WriteLine("'{0}'" + " Found " + "{1}" + " Times", startTag, count);

for (int i = 0; i < count; i++)
{
    //Do code stuff
    string delete = ImpureCText.Remove(ImpureCText.IndexOf("<p>"), ImpureCText.IndexOf("</p>"));
    Console.WriteLine(delete);
}

Console.ReadKey();

If you want to parse HTML, google for `HTML Agility Pack`. Don't use regexes. — mjwills, Oct 19 '19 at 11:50

tymtam · Answer 1 · 2019-10-19T12:15:21.060

Try a regular expression like `<p>(.*?)</p>`.

Having said that, parsing HTML with regex could be considered bad style.

Example

string ImpureCText = "<p>hello this is the first part</p>fgbtfhsgs <p> this is the second part</p> <p> this is the third part</p>";

var matches = Regex.Matches(ImpureCText, "<p>(.*?)</p>");

foreach (Match m in matches)
{
    // m.Value is the whole match, including the <p> and </p> tags
    Console.WriteLine(m.Value);
}

prints

<p>hello this is the first part</p>
<p> this is the second part</p>
<p> this is the third part</p>
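
The output above still includes the tags. If only the text between them is wanted, the capture group can be read instead of the whole match. The following is a minimal sketch, not a definitive implementation: the class name `ParagraphRegex` and the method name `InnerTexts` are made up for illustration, and `RegexOptions.Singleline` is an added assumption so that a paragraph spanning several lines is still matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the question.
public static class ParagraphRegex
{
    // Returns the text captured between each <p>...</p> pair, without the tags.
    public static List<string> InnerTexts(string html)
    {
        var results = new List<string>();

        // (.*?) is lazy, so each match stops at the nearest </p>.
        // Singleline (an assumption here) lets '.' match newlines as well.
        foreach (Match m in Regex.Matches(html, "<p>(.*?)</p>", RegexOptions.Singleline))
        {
            // Groups[0] is the whole match; Groups[1] is the first capture group.
            results.Add(m.Groups[1].Value);
        }

        return results;
    }

    public static void Main()
    {
        string impureCText = "<p>hello this is the first part</p>fgbtfhsgs <p> this is the second part</p> <p> this is the third part</p>";

        foreach (string text in InnerTexts(impureCText))
        {
            Console.WriteLine(text);
        }
    }
}
```

Leading spaces inside a paragraph are kept as they are; call `Trim()` on each value if they are not wanted.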

Edit

The 'bad style' refers to "RegEx match open tags except XHTML self-contained tags" (thanks @mjwills for finding it). Despite the funny accepted answer there, regex and HTML can successfully work together, especially when the HTML being parsed is restricted to a simple, known structure.
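
For input that simple, the `IndexOf` loop from the question can also be made to work without any regex. Two things stop the original loop from advancing: the second argument of `string.Remove` is a character count rather than an end index, and the result is never assigned back to `ImpureCText`, so every iteration sees the same text. A rough sketch of a scanning loop follows, assuming the markers are plain literal strings with no attributes and no nesting; `MarkerScanner` and `ExtractBetween` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; assumes literal, non-nested markers such as "<p>" and "</p>".
public static class MarkerScanner
{
    public static List<string> ExtractBetween(string text, string startTag, string endTag)
    {
        var results = new List<string>();
        int pos = 0; // where the next search begins

        while (true)
        {
            // Find the next opening marker at or after pos.
            int start = text.IndexOf(startTag, pos, StringComparison.Ordinal);
            if (start < 0) break;

            // Find the closing marker that follows it.
            int contentStart = start + startTag.Length;
            int end = text.IndexOf(endTag, contentStart, StringComparison.Ordinal);
            if (end < 0) break; // unmatched opening marker: stop

            results.Add(text.Substring(contentStart, end - contentStart));

            // Continue after the closing marker instead of modifying the string.
            pos = end + endTag.Length;
        }

        return results;
    }

    public static void Main()
    {
        string impureCText = "<p>hello this is the first part</p>fgbtfhsgs <p> this is the second part</p> <p> this is the third part</p>";

        foreach (string part in ExtractBetween(impureCText, "<p>", "</p>"))
        {
            Console.WriteLine(part);
        }
    }
}
```

Tracking a position instead of deleting text means the input string is never changed and no separate count of occurrences is needed.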

https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — mjwills, Oct 19 '19 at 12:01
