
I am quite new to programming. I need to extract the data from this HTML page: http://www.bmreports.com/servlet/com.logica.neta.bwp_MarketIndexServlet?displayCsv=false

I need the data refreshed every 30 minutes or so. Since this page already has a link to export the current data as CSV, I was hoping it would be possible to capture that CSV using C#, VB.NET, or VBScript.

I was able to pull the data into Excel, so I think VBScript might be an option.

I would appreciate any guidance on how to pull this information in CSV format using any of the three: C#, VB.NET, or VBScript.

Thanks, J

2 Answers


This activity is known as "web scraping". Here's a way in C# to either download the file or save the string contents in a variable:

using System.Net;

using (WebClient client = new WebClient())
{
    // save web page source directly to disk
    client.DownloadFile("http://example.com/page.html", @"C:\page.html");
    // or save only to memory
    string html = client.DownloadString("http://example.com/page.html");

    // do post-processing here
}

The harder part is parsing the HTML itself, since real-world pages mix many opening and closing tags and are often not well formed. You may be in luck, though: the query string in your URL contains the option displayCsv=false. I would try setting it to displayCsv=true instead. The data should then be returned in CSV format, which is much easier to parse.
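
A minimal sketch of that idea follows. It assumes, without having confirmed it, that changing the query string to displayCsv=true makes the servlet return plain CSV text. The class name MarketIndexCsv and the helper ParseCsv are hypothetical names chosen for illustration, and the parser is deliberately naive: it splits on commas and does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class MarketIndexCsv
{
    // Assumption: displayCsv=true makes the servlet return CSV instead of HTML.
    const string CsvUrl =
        "http://www.bmreports.com/servlet/com.logica.neta.bwp_MarketIndexServlet?displayCsv=true";

    // Splits CSV text into rows of fields.
    // Naive on purpose: assumes no field contains a quoted comma or line break.
    public static List<string[]> ParseCsv(string csvText)
    {
        var rows = new List<string[]>();
        string[] lines = csvText.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            rows.Add(line.Split(','));
        }
        return rows;
    }

    static void Main()
    {
        using (WebClient client = new WebClient())
        {
            // Download the CSV into memory, then print each row's fields.
            string csvText = client.DownloadString(CsvUrl);
            foreach (string[] fields in ParseCsv(csvText))
            {
                Console.WriteLine(string.Join(" | ", fields));
            }
        }
    }
}
```

If the response turns out to contain quoted fields, a proper CSV parser would be a safer choice than Split.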

If parsing the HTML turns out to be too difficult, take a look at this answer for possible C# libraries or open-source projects for web scraping, but check their licenses for any restrictions.
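
For the 30-minute refresh mentioned in the question, one possible approach is a small console program that downloads the file, sleeps, and repeats. This is only a sketch: the displayCsv=true URL is the same unconfirmed assumption as above, the folder C:\data is a placeholder, and MarketIndexPoller and FileNameFor are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Net;
using System.Threading;

static class MarketIndexPoller
{
    // Builds a timestamped file name so each download is kept separately.
    public static string FileNameFor(DateTime when)
    {
        return "marketindex_"
            + when.ToString("yyyyMMdd_HHmm", CultureInfo.InvariantCulture)
            + ".csv";
    }

    static void Main()
    {
        // Assumption: displayCsv=true returns the data as CSV.
        string url =
            "http://www.bmreports.com/servlet/com.logica.neta.bwp_MarketIndexServlet?displayCsv=true";
        string folder = @"C:\data"; // placeholder output folder
        Directory.CreateDirectory(folder);

        while (true)
        {
            try
            {
                using (WebClient client = new WebClient())
                {
                    string path = Path.Combine(folder, FileNameFor(DateTime.Now));
                    client.DownloadFile(url, path);
                    Console.WriteLine("Saved " + path);
                }
            }
            catch (WebException ex)
            {
                // Keep running if one download fails; try again next cycle.
                Console.WriteLine("Download failed: " + ex.Message);
            }

            Thread.Sleep(TimeSpan.FromMinutes(30));
        }
    }
}
```

An alternative design is to drop the loop, download once per run, and let Windows Task Scheduler start the program every 30 minutes.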

Special Sauce

Here is a working example in VB.NET that reads and parses the table from your HTML page:

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

    Dim web As New WebBrowser
    AddHandler web.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf webtocsv)
    web.Navigate(New System.Uri("http://www.bmreports.com/servlet/com.logica.neta.bwp_MarketIndexServlet?displayCsv=false#"))

End Sub


Private Sub webtocsv(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)

    Dim webcsv As WebBrowser = CType(sender, WebBrowser)

    Dim tblrows As HtmlElementCollection
    Dim tblcols As HtmlElementCollection
    Dim column As String = ""
    Dim csv As String = ""

    ' The data is read from the second table on the page (index 1)
    tblrows = webcsv.Document.GetElementsByTagName("TABLE").Item(1).GetElementsByTagName("TR")

    For r As Integer = 0 To tblrows.Count - 1
        tblcols = tblrows.Item(r).GetElementsByTagName("TD")
        ' Skip rows that do not have the five expected cells
        If tblcols.Count < 5 Then Continue For
        For x As Integer = 0 To 4
            column = tblcols.Item(x).InnerHtml
            csv &= column
            If x < 4 Then csv &= ";"   ' separator between columns, none after the last
        Next
        csv &= vbCrLf
    Next
    TextBox1.Text = csv     'show csv in textbox

End Sub


The csv variable now holds the table data, one row per line, with columns separated by ;. If you don't want the header row, start the loop at 1 instead of 0 (For r As Integer = 1 To tblrows.Count - 1).

c4pricorn