<div class="gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long" dir="ltr" style="word-break:break-all;">pastebin.com/N8VKGxR9</div>

Given this markup, how can I extract only the pastebin URL portion in VB.NET using regex? I've downloaded the entire web page using WC.DownloadString().

Zach Z

2 Answers

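 ' Requires: Imports System.Text.RegularExpressions
 Dim text As String = "<div class=""gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long"" dir=""ltr"" style=""word-break:break-all;"">pastebin.com/N8VKGxR9</div>"
 Dim pattern As String = "<div[\w\W]+gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long.*>(.*)<\/div>"
 Dim r As New Regex(pattern) ' the original snippet used r without declaring it
 Dim m As Match = r.Match(text)
 Dim g As Group = m.Groups(1) ' g.Value holds the text captured between the tags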

g.Value will then be pastebin.com/N8VKGxR9

BTW: the topic mentioned in the comments is about matching the tags themselves, not the text between them, so this is perfectly possible.

Edited to keep only divs with these classes
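
Because the question mentions a whole page fetched with WC.DownloadString(), here is a minimal sketch of the same idea applied to a string containing several divs. It is not the pattern from the answer above: it swaps in [^>]* and a lazy (.*?) so each match stays inside one div, and it assumes the class attribute is written exactly as in the question. ExtractVisibleUrls and the sample page string are hypothetical names introduced for illustration.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RegexSketch

    ' Hypothetical helper (not from the answer): returns the inner text of every
    ' div whose class attribute is exactly the three classes from the question.
    Function ExtractVisibleUrls(html As String) As List(Of String)
        ' [^>]* keeps the match inside a single opening tag; (.*?) is lazy, so each
        ' match stops at the first closing </div> rather than the last one on the page.
        Dim pattern As String = "<div[^>]*class=""gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long""[^>]*>(.*?)</div>"
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.Singleline)
            results.Add(m.Groups(1).Value.Trim())
        Next
        Return results
    End Function

    Sub Main()
        ' Stand-in for the string returned by WC.DownloadString().
        Dim page As String = "<div class=""gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long"" dir=""ltr"">pastebin.com/N8VKGxR9</div>" &
                             "<div class=""other"">ignore me</div>" &
                             "<div class=""gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long"" dir=""ltr"">pastebin.com/ABC</div>"
        For Each url As String In ExtractVisibleUrls(page)
            Console.WriteLine(url)
        Next
    End Sub

End Module
```

With the sample string this prints pastebin.com/N8VKGxR9 and pastebin.com/ABC. The pattern still breaks if the attribute order, quoting, or nesting on the real page differs from the sample, which is the usual weakness of regex on HTML.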

SouXin

If you use an HTML parser like HtmlAgilityPack (Getting Started With HTML Agility Pack), you can do something like this:

Option Infer On
Option Strict On

Imports HtmlAgilityPack

Module Module1

    Sub Main()
        ' some test data...
        Dim s = "<div class=""gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long"" dir=""ltr"" style=""word-break:break-all;"">pastebin.com/N8VKGxR9</div>"
        s &= "<div class=""gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long"" dir=""ltr"" style=""word-break:break-all;"">pastebin.com/ABC</div>"
        s &= "<div class=""WRONGCLASS gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long"" dir=""ltr"" style=""word-break:break-all;"">pastebin.com/N8VKGxR9</div>"

        Dim doc As New HtmlDocument
        doc.LoadHtml(s)

        ' match the classes string /exactly/ (the comparison is case-sensitive, so use the same casing as the page):
        Dim wantedNodes = doc.DocumentNode.SelectNodes("//div[@class='gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long']")

        ' An alternative for if you want the divs with /at least/ those classes:
        'Dim wantedNodes = doc.DocumentNode.SelectNodes("//div[contains(@class, 'gs-bidi-start-align') and contains(@class, 'gs-visibleUrl') and contains(@class, 'gs-visibleUrl-long')]")

        ' show the resultant data:
        If wantedNodes IsNot Nothing Then
            For Each n In wantedNodes
                Console.WriteLine(n.InnerHtml)
            Next
        End If

        Console.ReadLine()

    End Sub

End Module

Outputs:

pastebin.com/N8VKGxR9
pastebin.com/ABC

HTML parsers have the advantage that they will generally tolerate malformed HTML - for example, the test data shown above is not a valid HTML document and yet the desired data is parsed from it successfully.
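
To make the point about malformed input concrete, here is a minimal sketch that feeds HtmlAgilityPack a fragment whose div, body and html elements are never closed. It assumes the HtmlAgilityPack NuGet package is referenced; ReadVisibleUrls and the broken sample string are hypothetical names introduced for illustration, and the class names use the casing from the question.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module MalformedHtmlSketch

    ' Hypothetical helper: collects the text of divs whose class attribute
    ' is exactly the three classes from the question.
    Function ReadVisibleUrls(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim results As New List(Of String)
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long']")
        ' SelectNodes returns Nothing when nothing matches, so guard before looping.
        If nodes IsNot Nothing Then
            For Each n As HtmlNode In nodes
                results.Add(n.InnerText.Trim())
            Next
        End If
        Return results
    End Function

    Sub Main()
        ' Deliberately broken input: none of the opened elements is closed.
        Dim broken As String = "<html><body><div class=""gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long"">pastebin.com/N8VKGxR9"
        For Each url As String In ReadVisibleUrls(broken)
            Console.WriteLine(url)
        Next
    End Sub

End Module
```

The parser should still build a tree for this input and return the div's text, whereas a regex that insists on a closing </div> would find nothing here.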

Andrew Morton