Parse full string in Html using C#

Question

I have the following two examples of HTML:

<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word"></a> blue elephant  &middot;

<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word">@<b>word</b></a> blue elephant  &middot;

I am trying to parse this with C# and write the result to a CSV file. It works to an extent; however, when the HTML contains the '@' symbol, the CSV cell is either left blank or the word preceded by '@' is missing. The main part I am trying to get is "@word blue elephant", but this comes back as a blank cell, whereas the first HTML example returns "blue elephant" as desired.

I am using the following technique to do this:

string[] comm = System.Text.RegularExpressions.Regex.Split(content[1], "<a");

How can I alter this to work for the second html example?

http://htmlagilitypack.codeplex.com/ – SLaks Oct 24 '11 at 21:52

score 6 · Accepted Answer · edited Nov 25 '17 at 15:17


You want to use a proper HTML parser, such as the one in the HTML Agility Pack, in this situation (and save yourself from invoking the wrath of Cthulhu).

Some examples of how to use it
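As a rough sketch of what this could look like for the markup in the question, the helper below loads the fragment with the HTML Agility Pack, finds the first `<a>` (assumed to be the user link) and concatenates the text of everything after it. `CommentExtractor` and `ExtractComment` are hypothetical names, and the code assumes the HtmlAgilityPack NuGet package is referenced and that every row has the same "user link, colon, comment" shape.

```csharp
using System.Text;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class CommentExtractor
{
    // Hypothetical helper: returns the text after the "User:" link,
    // e.g. "@word blue elephant" for the second example in the question.
    public static string ExtractComment(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the first <a> is always the user link and the
        // comment is everything that follows it at the same level.
        HtmlNode userLink = doc.DocumentNode.SelectSingleNode("//a");
        if (userLink == null)
        {
            return string.Empty;
        }

        var text = new StringBuilder();
        for (HtmlNode node = userLink.NextSibling; node != null; node = node.NextSibling)
        {
            // InnerText flattens nested markup, so <a>@<b>word</b></a> becomes "@word".
            text.Append(node.InnerText);
        }

        // Decode entities such as &middot; (U+00B7), then strip the leading ':'
        // separator, the trailing middle dot and the surrounding whitespace.
        string decoded = HtmlEntity.DeEntitize(text.ToString());
        return decoded.Trim().TrimStart(':').TrimEnd('\u00B7').Trim();
    }
}
```

Calling `CommentExtractor.ExtractComment(html)` on each row before writing it to the CSV should yield "@word blue elephant" for the second example and "blue elephant" for the first. Unlike splitting on `"<a"`, the parser does not care whether the second link is empty or contains nested tags such as `<b>`.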

answered Oct 24 '11 at 21:53 by Russ Cam (edited by carla)

OK, thanks for the input. I presume my question would not be overly complex when using a tool like this? – Ebikeneser Oct 24 '11 at 22:05
No, it's pretty easy to use and understand if you're familiar with the structure of HTML documents. If you're not, you soon will be :) – Russ Cam Oct 24 '11 at 22:32
I have marked your answer as useful; however, I will give full credit once I get my head around the Agility Pack. Thank you. – Ebikeneser Oct 24 '11 at 22:34
