Using VB.NET and regex, how would I extract the href and the cost?

I have tried various options and have only just learned that regex syntax can differ between languages, which means I have wasted two days trying to figure it out.

<div class="single-album" id="m-1_1184">
<span class="album-time link-text">
<a class="album-link  tag-b b-ltxt-album b-sec-b b-tab-toy"
href="/cx/1.1184"
title="album | 5 cost">13£50</a>
</span>
<span class="separator">|</span>
</div>
HaveNoDisplayName
  • See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 and use `HtmlAgilityPack` instead. – Tim Schmelter May 04 '15 at 10:30

1 Answer

I would really advise against using regex to parse HTML. Instead use HtmlAgilityPack.

Then it's simple and safe:

Imports System.IO   ' File.ReadAllText
Imports System.Linq ' Last()

Dim html As String = File.ReadAllText("C:\Temp\html.txt") ' text file holding the HTML from the question
Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
' Select the <a> whose class attribute matches exactly (note the double space after "album-link")
Dim aHref As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//a[@class='album-link  tag-b b-ltxt-album b-sec-b b-tab-toy']")
If aHref IsNot Nothing Then
    Dim href As String = aHref.GetAttributeValue("href", "") ' /cx/1.1184
    Dim title As String = aHref.GetAttributeValue("title", "") ' album | 5 cost
    ' The cost is the part of the title after the last "|"
    Dim costs As String = title.Split("|"c).Last().Trim()    ' 5 cost
End If
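
The exact-match XPath only works while the class attribute stays character-for-character identical, including the double space after `album-link`. A minimal sketch of a more tolerant variant follows. It assumes HtmlAgilityPack is referenced, that the `album-link` class token alone is enough to identify the anchor (the real page may contain several such links), and that a compiler new enough for implicit line continuation (VB 2010 or later) is used. The module name `AlbumLinkSketch` is hypothetical. It also reads the link text (`13£50` in the sample) in case that is the value wanted rather than the title suffix.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module AlbumLinkSketch ' hypothetical name, for illustration only
    Sub Main()
        ' Inline copy of the markup from the question, so no input file is needed.
        Dim html As String =
            "<div class=""single-album"" id=""m-1_1184"">" &
            "<span class=""album-time link-text"">" &
            "<a class=""album-link  tag-b b-ltxt-album b-sec-b b-tab-toy"" " &
            "href=""/cx/1.1184"" title=""album | 5 cost"">13£50</a>" &
            "</span><span class=""separator"">|</span></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Match any <a> whose class list contains the token "album-link",
        ' regardless of the order or spacing of the other classes.
        Dim xpath As String =
            "//a[contains(concat(' ', normalize-space(@class), ' '), ' album-link ')]"
        Dim link As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)

        If link IsNot Nothing Then
            Dim href As String = link.GetAttributeValue("href", "")
            Dim title As String = link.GetAttributeValue("title", "")
            ' Text after the last "|" in the title attribute
            Dim cost As String = title.Split("|"c).Last().Trim()
            ' Visible text of the link itself
            Dim linkText As String = link.InnerText.Trim()

            Console.WriteLine("href:  " & href)
            Console.WriteLine("cost:  " & cost)
            Console.WriteLine("text:  " & linkText)
        End If
    End Sub
End Module
```

If the page lists many albums, `SelectNodes` with the same XPath returns every matching anchor (or `Nothing` when there are none), which can then be looped over in the same way.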
Tim Schmelter
  • That looks like great code. Unfortunately it is no use to me: I tried to install HtmlAgilityPack before trying the regex and just couldn't figure out how to install it. – user4858969 May 04 '15 at 10:54
  • You don't need to "install" it, just add the reference to the downloaded dll in your project's references. Maybe you need to make it visible by selecting your project and then at the top click "show all files". http://stackoverflow.com/questions/4958483/how-to-install-html-agility-pack-in-my-c-sharp-project – Tim Schmelter May 04 '15 at 10:59
  • @user4858969 You can manually add a reference *or* you can follow the very simple instructions at http://www.nuget.org/packages/HtmlAgilityPack. – Andrew Morton May 04 '15 at 13:37