
I currently have a program that accesses a website and brings back the HTML source code after removing some unnecessary tags. I am trying to extract specific data points from the website. However, because the data is in a table, every value has the same tag and I can't pick out a specific one. This is my current code:

Imports System.Text
Imports System.Net
Imports System.IO
Imports System.Text.RegularExpressions

Public Class PJMain1

    Private Sub Scrape()
        Try
            ' The URL must be one string literal; splitting it across lines does not compile.
            Dim strURL As String = "http://www.bom.gov.au/cgi-bin/wrap_fwo.pl?IDV60154.html"
            Dim strOutput As String = ""

            TxtOutput.Text = "Extracting..." & Environment.NewLine

            Dim wrRequest As WebRequest = WebRequest.Create(strURL)

            ' Dispose the response and the reader once the page has been read.
            Using wrResponse As WebResponse = wrRequest.GetResponse()
                Using sr As New StreamReader(wrResponse.GetResponseStream())
                    strOutput = sr.ReadToEnd()
                End Using
            End Using
            TxtOutput.Text = strOutput

            'The Formatting Techniques
            ' Remove comments, scripts and styles first; if the generic tag regex ran first,
            ' it would delete the <script>/<style> tags and leave their contents behind.
            strOutput = Regex.Replace(strOutput, "<!--(.|\s)*?-->", "") ' Remove HTML comments
            strOutput = Regex.Replace(strOutput, "<script.*?</script>", "",
                RegexOptions.Singleline Or RegexOptions.IgnoreCase) ' Remove script blocks
            strOutput = Regex.Replace(strOutput, "<style.*?</style>", "",
                RegexOptions.Singleline Or RegexOptions.IgnoreCase) ' Remove stylesheets
            strOutput = Regex.Replace(strOutput, "<!(.|\s)*?>", "") ' Remove doctype (HTML5)
            strOutput = Regex.Replace(strOutput, "</?[a-z][a-z0-9]*[^<>]*>", "",
                RegexOptions.IgnoreCase) ' Remove remaining HTML tags
            TxtOutput.Text = strOutput ' Write formatted output to the text box

        Catch ex As Exception
            ' Console output is not visible in a WinForms app, so show the error instead.
            MessageBox.Show(ex.Message, "Error")
        End Try
    End Sub

    Private Sub BtnExtract_Click(sender As Object, e As EventArgs) Handles BtnExtract.Click
        Scrape() ' Scrape text from URL
    End Sub

End Class

It brings back the HTML and removes some unnecessary tags, but I don't know how to pinpoint the data I want/need.
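One likely reason the values can't be told apart is that the `</?[a-z][a-z0-9]*[^<>]*>` replacement deletes every `<tr>` and `<td>` boundary, so the table structure is gone before you search for the data. A minimal sketch, assuming you keep the raw HTML returned by `sr.ReadToEnd()` and that table rows are not nested: split the page into `<tr>` rows first and keep only the rows that mention the station you want. `GetRowsContaining` and the two-row table are hypothetical, for illustration only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RowFinder
    ' Hypothetical helper: returns the inner HTML of every <tr>...</tr>
    ' whose content contains the given station name (case-insensitive).
    ' Assumes rows are not nested, which holds for simple data tables.
    Public Function GetRowsContaining(html As String, stationName As String) As List(Of String)
        Dim rows As New List(Of String)
        ' Singleline lets "." cross newlines; IgnoreCase also matches <TR>.
        Dim rowPattern As New Regex("<tr\b[^>]*>(.*?)</tr>",
            RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        For Each m As Match In rowPattern.Matches(html)
            Dim inner As String = m.Groups(1).Value
            If inner.IndexOf(stationName, StringComparison.OrdinalIgnoreCase) >= 0 Then
                rows.Add(inner)
            End If
        Next
        Return rows
    End Function

    Sub Main()
        ' A made-up two-row table standing in for the downloaded page.
        Dim html As String =
            "<table><tr><td>Yarra R at Warrandyte </td><td> 0.59 </td></tr>" &
            "<tr><td>Other Station</td><td> 1.20 </td></tr></table>"
        For Each row As String In GetRowsContaining(html, "Yarra R at Warrandyte")
            Console.WriteLine(row)
        Next
    End Sub
End Module
```

Regex is adequate for a small, regular table like this one, but it is not a general HTML parser; if the page layout is less predictable, a dedicated parser library is the more robust choice.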

This is what I believe is one complete record (one table row) from the website when I bring it back:

<a href="/fwo/IDV67204/IDV67204.586220.tbl.shtml">Table</a></td></tr><tr><!-- METADATA,086347,1,3.00,4.50,6.50,0.00,0.00,IDV67204,,4,Yarra R at Warrandyte, --><td>Yarra R at Warrandyte </td><td> 6.26am Tue</td><td> 0.59 </td><td>steady </td><td>below minor</td> <td> <a href="/fwo/IDV67204/IDV67204.086347.plt.shtml">Plot</a>
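In a row like the one above, each value sits in its own `<td>` cell, so the values can be picked out by position in the row rather than by tag. A minimal sketch: `GetCellTexts` is a hypothetical helper, and the column order (station, time, level, tendency, class) is inferred from this single sample row, not from any documented format of the page, so check it against the live table.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Net
Imports System.Text.RegularExpressions

Module CellParser
    ' Hypothetical helper: returns the trimmed plain text of each <td> cell
    ' in one table row, in document order.
    Public Function GetCellTexts(rowHtml As String) As List(Of String)
        Dim cells As New List(Of String)
        Dim cellPattern As New Regex("<td\b[^>]*>(.*?)</td>",
            RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        For Each m As Match In cellPattern.Matches(rowHtml)
            ' Strip tags inside the cell (e.g. the <a> links), then decode entities.
            Dim cellText As String = Regex.Replace(m.Groups(1).Value, "<[^>]+>", "")
            cells.Add(WebUtility.HtmlDecode(cellText).Trim())
        Next
        Return cells
    End Function

    Sub Main()
        ' The sample row from the question, trimmed to its data cells.
        Dim row As String =
            "<tr><!-- METADATA,086347,1,3.00,4.50,6.50,0.00,0.00,IDV67204,,4,Yarra R at Warrandyte, -->" &
            "<td>Yarra R at Warrandyte </td><td> 6.26am Tue</td><td> 0.59 </td>" &
            "<td>steady </td><td>below minor</td></tr>"
        Dim cells As List(Of String) = GetCellTexts(row)

        ' Column positions are assumed from the sample row.
        Console.WriteLine("Station: " & cells(0))
        Console.WriteLine("Time:    " & cells(1))
        Console.WriteLine("Level:   " & cells(2))
        Console.WriteLine("Trend:   " & cells(3))
        Console.WriteLine("Class:   " & cells(4))

        ' Parse the level with the invariant culture so "0.59" is read the same
        ' regardless of the machine's regional settings.
        Dim level As Double = Double.Parse(cells(2), CultureInfo.InvariantCulture)
        Console.WriteLine("Level as number: " & level.ToString(CultureInfo.InvariantCulture))
    End Sub
End Module
```

Combined with a row filter on the raw HTML, this gives one value per variable instead of one undifferentiated block of text.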

Anything would help. Cheers!

Josh Wyss