
I'm trying to parse the following HTML file, and I'd like to get the value of key. This is being done in Silverlight for Windows Phone.

<HTML>
<link ref="shortcut icon" href="favicon.ico">
<BODY>
<script Language="JavaScript">
location.href="login.html?key=UEFu1EIsgGTgAV7guTRhsgrTQU28TImSZkYhPMLj7BChpBkvlCO11aJU2Alj4jc5"
</script>
<CENTER><a href="login.html?key=UEFu1EIsgGTgAV7guTRhsgrTQU28TImSZkYhPMLj7BChpBkvlCO11aJU2Alj4jc5">Welcome</a></CENTER></BODY></HTML>

Any ideas on where to go from here?

Thanks

Teoman Soygul
Nathan
  • I just added a question to the [Software Recommendations](http://softwarerecs.stackexchange.com/) Stack Exchange site for this – [C# library for parsing HTML? - Software Recommendations Stack Exchange](http://softwarerecs.stackexchange.com/questions/10773/c-library-for-parsing-html/10774#10774). – Kenny Evitt Aug 15 '14 at 23:30
  • The question this duplicates has been closed, so this one should probably be reopened. – Andrew Jun 26 '19 at 19:47
  • @Andrew the other question wasn't on-topic either. By inference it would make sense to close this one. – sehe Jun 26 '19 at 21:51
  • @Andrew The dup question isn't much better than this one but it already has a long list of answers with a high number of votes. – Ted Lyngmo Jun 26 '19 at 21:52

2 Answers


Take a look at the HTMLAgilityPack. It's a pretty decent HTML parser.

http://html-agility-pack.net/?z=codeplex

Here's some code to get you started (it still needs more error checking before real use):

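using HtmlAgilityPack;

HtmlDocument document = new HtmlDocument();
string htmlString = "<html>blabla</html>";
document.LoadHtml(htmlString);
// SelectNodes returns null, not an empty collection, when nothing matches.
// (Silverlight/Windows Phone builds of the library may lack XPath; DocumentNode.Descendants("a") is the LINQ alternative.)
HtmlNodeCollection collection = document.DocumentNode.SelectNodes("//a[@href]");
if (collection != null)
{
    foreach (HtmlNode link in collection)
    {
        string target = link.GetAttributeValue("href", string.Empty); // e.g. "login.html?key=..."
    }
}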
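The loop above yields the whole href (for the sample page, login.html?key=UEFu1EIs...), so one more step is needed to pull out key itself. Below is a minimal sketch using only string operations. QueryStringHelper and GetQueryValue are hypothetical names, not part of HtmlAgilityPack, and the code assumes a simple name=value&name=value query string.

```csharp
using System;

// Hypothetical helper, not part of HtmlAgilityPack or the .NET class library.
public static class QueryStringHelper
{
    // Returns the value of `name` from the query part of a URL such as
    // "login.html?key=abc", or null if the parameter is absent.
    // Values are percent-decoded; '+' is NOT turned into a space.
    public static string GetQueryValue(string url, string name)
    {
        int q = url.IndexOf('?');
        if (q < 0) return null;

        // Keep only the query string, dropping any "#fragment" after it.
        string query = url.Substring(q + 1);
        int hash = query.IndexOf('#');
        if (hash >= 0) query = query.Substring(0, hash);

        foreach (string pair in query.Split('&'))
        {
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            if (key == name)
            {
                return eq < 0 ? string.Empty : Uri.UnescapeDataString(pair.Substring(eq + 1));
            }
        }
        return null;
    }
}
```

Inside the loop you would call QueryStringHelper.GetQueryValue(target, "key"). Hand-rolled parsing is used here because System.Web's HttpUtility.ParseQueryString is not available on Silverlight.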
John Smith
Kurru

You can use a regular expression (the Regex class) for this. The expression could be something like: login.html\?key=[^"]*
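A minimal sketch of that approach, assuming the key always appears as login.html?key=... inside a double-quoted string, exactly as in the sample page. LoginKeyRegex and ExtractKey are hypothetical names. The pattern differs slightly from the one above: it escapes the dot, captures the value in a named group, and also stops at '&'. It is not a general HTML parser.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the regex from the answer above.
public static class LoginKeyRegex
{
    // Named group "key" captures everything after "key=" up to a quote or '&'.
    private static readonly Regex KeyPattern =
        new Regex("login\\.html\\?key=(?<key>[^\"&]*)");

    // Returns the first key found in the page, or null when there is none.
    public static string ExtractKey(string html)
    {
        Match m = KeyPattern.Match(html);
        return m.Success ? m.Groups["key"].Value : null;
    }
}
```

On the sample page the first match comes from the location.href line in the script block; the anchor contains the same key.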

Rafal Spacjer
  • I won't downvote because I'm nice, but RegEx isn't a surefire way to do this anymore; HTMLAgilityPack is pretty much the gold standard these days. – pixelbobby May 19 '11 at 18:32
  • -1 (unfortunately I'm being fair, which has nothing to do with *nice*, and this info will also help you avoid trying to parse HTML using RegEx) http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Robert Koritnik May 19 '11 at 18:33
  • Regex may work, but I strongly suggest otherwise for the future. – Pat May 19 '11 at 18:37
  • Though it's generally not right to _parse_ HTML with regex, for the given scenario (where you only need to extract a single small piece) it might be a simple, lightweight and straightforward solution. It depends on how quickly and how deeply you expect the HTML to change. – Dercsár May 19 '11 at 18:42
  • Yes, I agree that regex isn't for parsing HTML, but for a simple case it can be OK. If all you need is one value from a file, and to get it you add an extra assembly to your program (making your app bigger), I'm not sure that's wise. For me at least there is no one truth; everything depends on the context. – Rafal Spacjer May 19 '11 at 18:56
  • Please don't use RegEx for XML or similar documents... – Artfaith Oct 17 '20 at 16:11