
This is what the rendered HTML from the webpage looks like:

<div class="mygallery_entry">
<div class="mygallery_inner">
<a title="img1" class="gallery_image" href="http://image.com/29.html"><img src="/mini/1.jpg" alt="" height="208" width="333" border="0"></a>
</div>
<div class="mygallery_inner">
<a title="img2" class="gallery_image" href="http://image.com/12.html"><img src="/mini/2.jpg" alt="" height="208" width="333" border="0"></a>
</div>
<div class="mygallery_inner">
<a title="img3" class="gallery_image" href="http://image.com/59.html"><img src="/mini/3.jpg" alt="" height="208" width="333" border="0"></a>
</div>
</div>

My output goes into a listbox and it should look like this:

http://image.com/29.html
http://image.com/12.html
http://image.com/59.html

1 Answer


There are several ways to extract information from XML or HTML. If the HTML is valid XML, you can use LINQ to XML with an XPath query or LINQ query syntax to get particular information. Otherwise, if the HTML is not valid XML and cannot be loaded into an XDocument, you should look into Html Agility Pack. Note that the markup in the question leaves its <img> tags unclosed, so it would have to be made well-formed (or handled with Html Agility Pack) before XDocument can parse it. Below is an example using an XPath query to get those three image links (the HTML page needs to be downloaded first and stored either as a file or as a string).

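Imports System.Collections.Generic
Imports System.Xml.Linq
Imports System.Xml.XPath
....
' Parse the downloaded page; this throws an XmlException if the markup is not well-formed XML.
Dim doc = XDocument.Parse(htmlString)
' To load from an HTML file instead of a string, use XDocument.Load as follows:
' Dim doc = XDocument.Load(pathToHtmlFile)
Dim list = New List(Of String)()
' Select every <a> that has an href and sits directly inside a mygallery_inner <div>.
For Each a As XElement In doc.XPathSelectElements("//div[@class='mygallery_inner']/a[@href]")
    list.Add(a.Attribute("href").Value)
Next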
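
For markup that is not well-formed XML, the Html Agility Pack route mentioned earlier might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the module name HapGalleryLinks, the function name ExtractLinks, and the sample string are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module HapGalleryLinks
    ' Hypothetical helper: returns the href of every <a> that sits directly
    ' inside a <div class="mygallery_inner">.
    Function ExtractLinks(htmlString As String) As List(Of String)
        ' Fully qualified to avoid clashing with System.Windows.Forms.HtmlDocument.
        Dim doc = New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(htmlString)

        Dim list = New List(Of String)()
        Dim anchors = doc.DocumentNode.SelectNodes("//div[@class='mygallery_inner']/a[@href]")
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If anchors IsNot Nothing Then
            For Each a As HtmlNode In anchors
                list.Add(a.GetAttributeValue("href", String.Empty))
            Next
        End If
        Return list
    End Function

    Sub Main()
        ' Assumed sample: one gallery entry with an unclosed <img>, as in the question.
        Dim html = "<div class=""mygallery_inner""><a href=""http://image.com/29.html""><img src=""/mini/1.jpg""></a></div>"
        For Each link In ExtractLinks(html)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

The XPath expression is the same one used with XDocument; only the parser changes, which is what lets the unclosed <img> tags through.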

In the end, the list variable holds every link from the HTML page, ready to be displayed any way you want. The XPath expression above reads, from right to left, as:

  1. /a[@href] : select an <a> element that has an href attribute and is a direct child of...
  2. //div[@class='mygallery_inner'] : a <div> element whose class attribute equals mygallery_inner, located anywhere in the document (not necessarily a direct child of the root element)
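
The same two steps can also be written in LINQ query syntax, which may make the structure of the XPath expression easier to see. This is only a sketch: the module name LinqGalleryLinks, the function name ExtractLinks, and the sample string (the question's markup with the <img> tags self-closed so that it parses as XML) are assumptions for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module LinqGalleryLinks
    ' Hypothetical helper mirroring //div[@class='mygallery_inner']/a[@href].
    Function ExtractLinks(htmlString As String) As List(Of String)
        Dim doc = XDocument.Parse(htmlString)
        Return (From div In doc.Descendants("div")
                Where div.Attribute("class") IsNot Nothing AndAlso
                      div.Attribute("class").Value = "mygallery_inner"
                From a In div.Elements("a")
                Where a.Attribute("href") IsNot Nothing
                Select a.Attribute("href").Value).ToList()
    End Function

    Sub Main()
        ' Assumed sample: the question's markup, trimmed, with <img /> self-closed.
        Dim html As String =
            "<div class=""mygallery_entry"">" &
            "<div class=""mygallery_inner""><a title=""img1"" class=""gallery_image"" href=""http://image.com/29.html""><img src=""/mini/1.jpg"" alt="""" /></a></div>" &
            "<div class=""mygallery_inner""><a title=""img2"" class=""gallery_image"" href=""http://image.com/12.html""><img src=""/mini/2.jpg"" alt="""" /></a></div>" &
            "<div class=""mygallery_inner""><a title=""img3"" class=""gallery_image"" href=""http://image.com/59.html""><img src=""/mini/3.jpg"" alt="""" /></a></div>" &
            "</div>"

        For Each link In ExtractLinks(html)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

Descendants("div") plays the role of //div (any depth), and Elements("a") plays the role of /a (direct children only). Run against the sample, this prints the three links in document order.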
har07