I want to download the HTML of a web page, parse it, and retrieve the http links from its <a> elements. I can already download the HTML into a string, and when I have the markup as XML I can use a For Each loop to retrieve the data. What I can't figure out is how to turn the HTML string into something LINQ can query. I think I'm missing some simple step here.

Sub GetTest()
    Dim source As String = "http://gd2.mlb.com/components/game/mlb/year_2018/month_03/day_29/"
    Dim Client As New WebClient
    Dim html As String = Client.DownloadString(source)

    Dim xml = XElement.Parse(html)

    Dim links = From link In xml...<a>

    For Each link In links
        MessageBox.Show(link.@href)
    Next
End Sub
Michael T

1 Answer

This page can be parsed as XML once the first, unclosed tag is removed:

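' Skip everything up to and including the first ">" so the unclosed leading tag is dropped.
Dim xml = XElement.Parse(html.Substring(html.IndexOf(">") + 1))
For Each link In xml.Descendants("a")
    ' Print the attribute's value; printing the XAttribute itself would give href="...".
    Console.WriteLine(link.Attribute("href")?.Value)
Next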

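In general, HTML is not guaranteed to be well-formed XML: tags may be left unclosed, attributes may be unquoted, and entities such as &nbsp; are not defined in plain XML. Any of these makes XElement.Parse throw, so it is better to use a dedicated HTML parser such as HtmlAgilityPack.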
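
As a rough illustration of the HtmlAgilityPack route, the sketch below loads an HTML string and collects the href of every <a> element. It assumes the HtmlAgilityPack NuGet package is referenced; the module name LinkExtractor, the function ExtractLinks, and the sample markup are made up for this example, and the real page's markup may differ.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module LinkExtractor
    ' Hypothetical helper: returns the href value of every <a> element in the HTML string.
    Function ExtractLinks(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        ' LoadHtml tolerates unclosed tags and other markup that is not valid XML.
        doc.LoadHtml(html)

        Dim result As New List(Of String)
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim anchors = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return result

        For Each anchor As HtmlNode In anchors
            result.Add(anchor.GetAttributeValue("href", String.Empty))
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample markup standing in for the downloaded page (note the unclosed <li> tags).
        Dim sample = "<html><body><ul><li><a href=""gid_1/"">one</a><li><a href=""gid_2/"">two</a></ul>"
        For Each href In ExtractLinks(sample)
            Console.WriteLine(href)
        Next
    End Sub
End Module
```

In the question's code, the string returned by Client.DownloadString(source) could be passed to ExtractLinks in place of the sample. In a Windows Forms project, HtmlDocument may need to be qualified as HtmlAgilityPack.HtmlDocument to avoid a clash with System.Windows.Forms.HtmlDocument.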

derloopkat