I currently have a program that accesses a website and brings back the HTML source code after removing some unnecessary tags. I am trying to bring back specific data points from the website, but because the data is in a table, every value has the same tag and I can't pick out a specific one. This is my current code:
```vbnet
Imports System.Text
Imports System.Net
Imports System.IO
Imports System.Text.RegularExpressions

Public Class PJMain1
    Private Sub Scrape()
        Try
            Dim strURL As String = "http://www.bom.gov.au/cgi-bin/wrap_fwo.pl?IDV60154.html"
            Dim strOutput As String = ""
            Dim wrRequest As WebRequest = HttpWebRequest.Create(strURL)
            TxtOutput.Text = "Extracting..." & Environment.NewLine
            ' Using disposes the response and the reader, so no explicit Close() is needed
            Using wrResponse As WebResponse = wrRequest.GetResponse()
                Using sr As New StreamReader(wrResponse.GetResponseStream())
                    strOutput = sr.ReadToEnd()
                End Using
            End Using
            TxtOutput.Text = strOutput
            ' The formatting techniques. Scripts, stylesheets and comments are removed
            ' first; the generic tag pattern would otherwise strip only the <script> and
            ' <style> tags themselves and leave their contents behind.
            strOutput = Regex.Replace(strOutput, "<script.*?</script>", "",
                RegexOptions.Singleline Or RegexOptions.IgnoreCase) ' Remove script tags
            strOutput = Regex.Replace(strOutput, "<style.*?</style>", "",
                RegexOptions.Singleline Or RegexOptions.IgnoreCase) ' Remove stylesheets
            strOutput = Regex.Replace(strOutput, "<!--(.|\s)*?-->", "") ' Remove HTML comments
            strOutput = Regex.Replace(strOutput, "<!(.|\s)*?>", "") ' Remove doctype (HTML5)
            strOutput = Regex.Replace(strOutput, "</?[a-z][a-z0-9]*[^<>]*>", "",
                RegexOptions.IgnoreCase) ' Remove HTML tags
            TxtOutput.Text = strOutput ' Write formatted output to the text box
        Catch ex As Exception
            ' Console output is not shown to the user in a Windows Forms app
            MessageBox.Show(ex.Message, "Error")
        End Try
    End Sub

    Private Sub BtnExtract_Click(sender As Object, e As EventArgs) Handles BtnExtract.Click
        Scrape() ' Scrape text from URL
    End Sub
End Class
```
It brings back the HTML and removes some unnecessary tags, but I don't know how to pinpoint the data I want/need.
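One possible approach, sketched here under assumptions rather than tested against the live page: do the selection before stripping tags. While the HTML is still intact, each `<tr>` is a row and each `<td>` is a column, so a value can be picked by row and column position. `TableCells` and `ExtractRows` are hypothetical names, and regular expressions are only adequate for simple, non-nested tables like the sample row quoted below.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module TableCells
    ' Hypothetical helper: returns one list of cleaned cell texts per <tr> row.
    ' Assumes plain <tr>...</tr> and <td>...</td> pairs with no nested tables;
    ' a regex is not a full HTML parser.
    Public Function ExtractRows(html As String) As List(Of List(Of String))
        Dim rows As New List(Of List(Of String))
        Dim opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
        For Each rowMatch As Match In Regex.Matches(html, "<tr[^>]*>(.*?)</tr>", opts)
            Dim cells As New List(Of String)
            For Each cellMatch As Match In Regex.Matches(rowMatch.Groups(1).Value, "<td[^>]*>(.*?)</td>", opts)
                ' Strip tags inside the cell (e.g. <a>), decode entities, trim spaces
                Dim text As String = Regex.Replace(cellMatch.Groups(1).Value, "<[^>]+>", "")
                cells.Add(WebUtility.HtmlDecode(text).Trim())
            Next
            ' Skip rows without <td> cells, such as header rows built from <th>
            If cells.Count > 0 Then rows.Add(cells)
        Next
        Return rows
    End Function

    Sub Main()
        Dim sample As String = "<table><tr><td>Yarra R at Warrandyte </td><td> 6.26am Tue</td>" &
            "<td> 0.59 </td><td>steady </td><td>below minor</td></tr></table>"
        For Each row As List(Of String) In ExtractRows(sample)
            ' Column positions are read off the sample row: 0 = name, 1 = time, 2 = value
            Console.WriteLine(row(0) & " | " & row(1) & " | " & row(2))
            ' prints: Yarra R at Warrandyte | 6.26am Tue | 0.59
        Next
    End Sub
End Module
```

In the form, this would be called on the raw `strOutput` right after `ReadToEnd()`, before any of the `Regex.Replace` calls, e.g. `ExtractRows(strOutput)(i)(2)` for the third column of row `i`. The column meanings above are inferred from the sample row, not confirmed.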
This is what I believe is one whole data set from the website when I bring it back:
```html
<a href="/fwo/IDV67204/IDV67204.586220.tbl.shtml">Table</a></td></tr>
<tr><!-- METADATA,086347,1,3.00,4.50,6.50,0.00,0.00,IDV67204,,4,Yarra R at Warrandyte, -->
<td>Yarra R at Warrandyte </td>
<td> 6.26am Tue</td>
<td> 0.59 </td>
<td>steady </td>
<td>below minor</td>
<td> <a href="/fwo/IDV67204/IDV67204.086347.plt.shtml">Plot</a>
```
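The sample row also carries a `<!-- METADATA,... -->` comment whose second comma-separated field, `086347`, matches the number in that row's Plot link. Keying on that value is likely more robust than a row index if the order of rows on the page changes. Because `Scrape()` deletes comments, this would have to run on the unmodified HTML. `StationLookup` and `FindStationRow` are hypothetical names, treating `086347` as a station identifier is an assumption, and the other METADATA fields are deliberately left uninterpreted.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module StationLookup
    ' Hypothetical helper: finds the row whose METADATA comment starts with the
    ' given ID and returns that row's cell texts, or Nothing if no row matches.
    ' Assumes rows shaped like the sample: <tr><!-- METADATA,id,... --><td>...</td>...</tr>
    Public Function FindStationRow(html As String, stationId As String) As List(Of String)
        Dim opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
        Dim rowPattern As String = "<tr[^>]*>\s*<!--\s*METADATA,(.*?)\s*-->(.*?)</tr>"
        For Each rowMatch As Match In Regex.Matches(html, rowPattern, opts)
            ' First field after "METADATA", e.g. "086347" in the sample
            Dim fields As String() = rowMatch.Groups(1).Value.Split(","c)
            If fields(0).Trim() <> stationId Then Continue For
            Dim cells As New List(Of String)
            For Each cellMatch As Match In Regex.Matches(rowMatch.Groups(2).Value, "<td[^>]*>(.*?)</td>", opts)
                ' Strip tags inside the cell, decode entities, trim spaces
                Dim text As String = Regex.Replace(cellMatch.Groups(1).Value, "<[^>]+>", "")
                cells.Add(WebUtility.HtmlDecode(text).Trim())
            Next
            Return cells
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim sample As String = "<tr><!-- METADATA,086347,1,3.00,4.50,6.50,0.00,0.00,IDV67204,,4,Yarra R at Warrandyte, -->" &
            "<td>Yarra R at Warrandyte </td><td> 6.26am Tue</td><td> 0.59 </td>" &
            "<td>steady </td><td>below minor</td></tr>"
        Dim cells As List(Of String) = FindStationRow(sample, "086347")
        If cells IsNot Nothing Then
            Console.WriteLine(cells(1) & " -> " & cells(2))
            ' prints: 6.26am Tue -> 0.59
        End If
    End Sub
End Module
```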
Anything would help. Cheers