I have a large chunk of HTML that consists entirely of entries like this:
<li id="entry-c7" data-user="ThisIsSomeonesUsername">
<img width="28" height="28" class="avatar" src="http://very_long_url.png">
<span class="time">6:07</span>
<span class="username">ThisIsSomeonesUsername</span>
<span class="message">This is my message. It is nice, no?</span>
</li>
This is repeated about a hundred thousand times (with different content, of course). I get it from an HtmlDocument by retrieving the element that holds all of these entries. The document itself comes from a WebBrowser control on a Windows Form. The code looks like this:
HtmlDocument document = webBrowser1.Document;
HtmlElement element = document.GetElementById(chatElementId);
Assume "chatElementId" is just some known ID. For each entry, I would like to retrieve the content of "time" (6:07 in this example), "username" (ThisIsSomeonesUsername), and "message" (This is my message... etc.). The message portion can contain almost anything, including further HTML (links, images, etc.), and I want to keep all of that intact. I was going to use a regular expression to parse the InnerHtml of the element retrieved above, but apparently this will bring about the destruction of the universe. How should I go about doing this instead?
Edit: People keep suggesting Html Agility Pack. Is there an easy way to do this with Html Agility Pack without giving it the full HTML source? I'm not sure the rest of the HTML outside this element is in very good shape... or should I just pass in the whole document anyway?