Parsing HTML with C#.NET

Question

I'm trying to parse the following HTML file; I'd like to get the value of key. This is being done in Silverlight for Windows Phone.

<HTML>
<link ref="shortcut icon" href="favicon.ico">
<BODY>
<script Language="JavaScript">
location.href="login.html?key=UEFu1EIsgGTgAV7guTRhsgrTQU28TImSZkYhPMLj7BChpBkvlCO11aJU2Alj4jc5"
</script>
<CENTER><a href="login.html?key=UEFu1EIsgGTgAV7guTRhsgrTQU28TImSZkYhPMLj7BChpBkvlCO11aJU2Alj4jc5">Welcome</a></CENTER></BODY></HTML>

Any ideas on where to go from here?

Thanks

I just added a question to the [Software Recommendations](http://softwarerecs.stackexchange.com/) Stack Exchange site for this – [C# library for parsing HTML? - Software Recommendations Stack Exchange](http://softwarerecs.stackexchange.com/questions/10773/c-library-for-parsing-html/10774#10774). — Kenny Evitt, Aug 15 '14 at 23:30
The question this duplicates has been closed... So this one should probably be reopened. — Andrew, Jun 26 '19 at 19:47
@Andrew the other question wasn't on-topic either. By inference it would make sense to close this one. — sehe, Jun 26 '19 at 21:51
@Andrew The dup question isn't much better than this one but it already has a long list of answers with a high number of votes. — Ted Lyngmo, Jun 26 '19 at 21:52

score 77 · Accepted Answer · edited May 21 '20 at 20:48

Take a look at the HTML Agility Pack. It's a pretty decent HTML parser:

http://html-agility-pack.net/?z=codeplex

Here's some code to get you started (it still needs proper error checking):

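using HtmlAgilityPack;

HtmlDocument document = new HtmlDocument();
string htmlString = "<html>blabla</html>";
document.LoadHtml(htmlString);
// SelectNodes returns null, not an empty collection, when nothing matches
HtmlNodeCollection collection = document.DocumentNode.SelectNodes("//a[@href]");
if (collection != null)
{
    foreach (HtmlNode link in collection)
    {
        // GetAttributeValue returns the fallback instead of throwing if the attribute is missing
        string target = link.GetAttributeValue("href", string.Empty);
    }
}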
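
The loop above only yields the href string (for the sample page, login.html?key=...), while the question asks for the value of key itself. Below is a minimal sketch of pulling one query-string parameter out of such an href with plain string handling, assuming the href has the login.html?key=... shape shown in the question. QueryStringHelper and GetQueryValue are hypothetical names, not part of the HTML Agility Pack or the .NET class library; the helper deliberately avoids System.Web's HttpUtility in case that isn't available on the target platform.

```csharp
using System;

// Hypothetical helper (not part of HtmlAgilityPack or the BCL):
// extracts a single query-string parameter from a URL such as "login.html?key=...".
public static class QueryStringHelper
{
    public static string GetQueryValue(string href, string name)
    {
        if (string.IsNullOrEmpty(href)) return null;

        int queryStart = href.IndexOf('?');
        if (queryStart < 0) return null; // no query string at all

        // Take everything after '?' and drop a trailing "#fragment", if any.
        string query = href.Substring(queryStart + 1);
        int hash = query.IndexOf('#');
        if (hash >= 0) query = query.Substring(0, hash);

        foreach (string pair in query.Split('&'))
        {
            int eq = pair.IndexOf('=');
            string key = eq >= 0 ? pair.Substring(0, eq) : pair;
            if (key == name)
            {
                // Decodes %XX escapes; note that '+' is NOT turned into a space here.
                return eq >= 0 ? Uri.UnescapeDataString(pair.Substring(eq + 1)) : string.Empty;
            }
        }
        return null; // parameter not present
    }
}
```

Inside the foreach, something like string key = QueryStringHelper.GetQueryValue(target, "key"); would then give UEFu1EIsgGTgAV7guTRhsgrTQU28TImSZkYhPMLj7BChpBkvlCO11aJU2Alj4jc5 for the sample page, and null for links without a key parameter.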

answered May 19 '11 at 18:30 by Kurru · edited May 21 '20 at 20:48 by John Smith

+1 I've used this tool before and it's great. – pixelbobby May 19 '11 at 18:32
We do a lot of scraping using Agility pack and it rocks. Definitely try this. – Pat May 19 '11 at 18:37
I don't think you can use the Agility Pack for Windows Phone. – Nathan May 19 '11 at 19:03
The Agility Pack works with Windows Phone. I'm developing an app with it now, and it works great. – William Melani May 20 '11 at 05:47

score -3 · Answer 2 · answered May 19 '11 at 18:30

You can use a regular expression (the Regex class) for it. The expression could be something like this: login\.html\?key=[^"]*
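
A minimal sketch of the regex approach using System.Text.RegularExpressions.Regex. It escapes the dot (an unescaped . matches any character) and wraps [^"]* in a capture group so that only the key value is returned. KeyExtractor and FindKey are hypothetical names. As the comments on this answer point out, this only keeps working while the page keeps this exact shape.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: finds the first login.html?key=... occurrence in raw HTML.
public static class KeyExtractor
{
    // Pattern text is: login\.html\?key=([^"]*)
    // [^"]* stops at the closing double quote of the attribute or JavaScript string.
    private static readonly Regex KeyPattern = new Regex(@"login\.html\?key=([^""]*)");

    public static string FindKey(string html)
    {
        Match match = KeyPattern.Match(html);
        // Group 1 holds just the value after "key=".
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For the page in the question, the first match is the one in the location.href assignment, which carries the same key as the Welcome link.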

answered May 19 '11 at 18:30 by Rafal Spacjer

I won't downvote because I'm nice, but regex isn't a surefire way to do this anymore; the HTML Agility Pack is pretty much the gold standard these days. – pixelbobby May 19 '11 at 18:32
-1 (unfortunately I'm being fair, nothing to do with *nice*, and this info will also help you avoid trying to parse HTML with regex) http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Robert Koritnik May 19 '11 at 18:33
Regex may work but I highly suggest otherwise, for the future. – Pat May 19 '11 at 18:37
Though it's generally not right to _parse_ HTML with regex, for the given scenario (where you only need to extract a single small piece), it might be a simple, lightweight, and straightforward solution. It depends on how quickly and how deeply you expect the HTML to change. – Dercsár May 19 '11 at 18:42
Yes, I agree that regex isn't for parsing HTML, but for a simple solution it can be OK. If all you need is one value from a file, I'm not sure it's wise to add an extra assembly to your program for that (it makes your app bigger). For me at least, there is no single truth; everything depends on the context. – Rafal Spacjer May 19 '11 at 18:56
Please don't use regex for XML or similar documents... – Artfaith Oct 17 '20 at 16:11