I have spent a few hours trying several different methods to solve a problem, and none of them work for me.

I have tried

  1. Get Between Functions
  2. Regex
  3. HTML Agility Pack

I am working in VB.NET and want to grab the title of a film while ignoring the HTML link that precedes it. I can't do this with a fixed string, because the link changes for every title, and I do not understand regex well enough to write a pattern for it.

Here is the HTML; the part that says Movie Link 1 is what I want to grab.

<a href="/download/fast-and-furious-7-2015-hd-ts-xvid-ac3-hq-hive-cm8-t10472303.html" class="cellMainLink">**Movie Link 1**</a>

Of course, there are other titles I need to grab too. This is the code I have so far, and it is not working.

Dim r As New System.Text.RegularExpressions.Regex("class=""cellMainLink"">(?<name>.*)</a>")
Dim matches As MatchCollection = r.Matches(rssourcecode)

For Each itemcode As Match In matches
    ListBox1.Items.Add(itemcode.Groups(2).Value)
Next

If anyone can help, please get back to me as soon as possible.

Thank you.

Jens
  • [You can't parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – The Blue Dog Apr 27 '15 at 18:35
  • I removed the tags from your title: http://meta.stackexchange.com/questions/19190/should-questions-include-tags-in-their-titles – Jens Apr 27 '15 at 18:38

1 Answer

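With HTML Agility Pack, you can use this code:

Dim links As New List(Of String)()
Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()
htmlDoc.LoadHtml(WebSource)
' SelectNodes returns Nothing when no node matches, so check before looping.
Dim nodes As HtmlAgilityPack.HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//a[@class]")
If nodes IsNot Nothing Then
    For Each link As HtmlAgilityPack.HtmlNode In nodes
        Dim att As HtmlAgilityPack.HtmlAttribute = link.Attributes("class")
        If att.Value = "cellMainLink" Then
            ' InnerText holds the link text; HtmlNode has no Value property.
            links.Add(link.InnerText)
        End If
    Next
End If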

I do not believe you need a regex solution here. However, just for educational purposes:

' Requires: Imports System.Text.RegularExpressions
Dim ptrn As String = "<a\b[^>]*?class=[""']?cellMainLink[""']?[^>]*?>(.*?)</a>"
Dim input As String = "<a href=""/download/fast-and-furious-7-2015-hd-ts-xvid-ac3-hq-hive-cm8-t10472303.html"" class=""cellMainLink"">**Movie Link 1**</a>"
Dim rx As Regex = New Regex(ptrn)
' Group 1 is the lazily captured text between the opening tag and </a>.
Dim result As String = rx.Match(input).Groups(1).Value

Result: **Movie Link 1**
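
The snippet above returns only the first match, while the question needs every title on the page. Here is a minimal sketch of how the same pattern could be applied to all matches. The module name `TitleExtractor`, the function `ExtractTitles`, and the sample HTML are made up for illustration. Two things in the question's attempt likely explain why it returned nothing useful: its pattern defines only one group (named `name`), so `Groups(2)` is always empty, and the greedy `.*` runs to the last `</a>` on the line instead of the nearest one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TitleExtractor
    ' Hypothetical helper: returns the text of every <a> whose class is cellMainLink.
    Function ExtractTitles(html As String) As List(Of String)
        Dim titles As New List(Of String)()
        ' (?<name>.*?) is lazy, so each capture stops at the nearest </a>.
        Dim rx As New Regex(
            "<a\b[^>]*?class=[""']?cellMainLink[""']?[^>]*?>(?<name>.*?)</a>",
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        For Each m As Match In rx.Matches(html)
            ' Read the capture by its name rather than by a numeric index.
            titles.Add(m.Groups("name").Value.Trim())
        Next
        Return titles
    End Function

    Sub Main()
        ' Made-up sample input with two links on one line.
        Dim sample As String =
            "<a href=""/download/a.html"" class=""cellMainLink"">Movie Link 1</a>" &
            "<a href=""/download/b.html"" class=""cellMainLink"">Movie Link 2</a>"
        For Each title As String In ExtractTitles(sample)
            Console.WriteLine(title)
        Next
    End Sub
End Module
```

In a WinForms project, the `Console.WriteLine` call could be swapped for `ListBox1.Items.Add(title)`, as in the question. This remains a sketch: the pattern would also accept a class value that merely starts with `cellMainLink`, and nested markup inside the link would end up in the captured text, which is one reason the HTML Agility Pack approach is the more robust choice.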

Wiktor Stribiżew