I am relatively new to programming and have written a web scraper in VBA that I am trying to recreate in VB.NET in Visual Studio. I am using the same object (mshtml.HTMLDocument) that I was using in VBA, but for some reason in Visual Studio it seems to be missing the .getElementsByClassName method, which is essential for my program. I don't understand why it would be missing in VB.NET if I am using the same reference library and the same object that I used in VBA.

Is there something I am doing wrong?

VBA Intellisense & Reference Library

Visual Studio VB.Net Intellisense, Reference Library, & Error

Devan382
  • @TnTinMn I don’t see IHTMLDocument6 or 7 as an option in Visual Studio. It only appears to go up to IHTMLDocument5. However, when I look at it in VBA it does show IHTMLDocument6 and IHTMLDocument7. – Devan382 Jul 29 '17 at 23:49
  • Did you read, **understand** and implement the answer I pointed to? You need to create an interop assembly based on the type library for the current version of MsHtml installed on your computer and then reference that interop library; not the one in the GAC. – TnTinMn Jul 30 '17 at 01:23
  • Sorry, I misunderstood what you said at first. I just did what it said in the link and it is working now. Thanks! – Devan382 Jul 30 '17 at 01:54

1 Answer

A System.Windows.Forms.HtmlDocument (in VB.NET) is not an mshtml.HTMLDocument (as used in VBA). Without seeing the relevant code, I can't be sure that you haven't ended up with the former.

Rather than going through extra steps to get the latter, you can write your own method for getting elements with a particular class name, e.g.

Public Class Form1

    Dim wb As WebBrowser

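    ' Returns every element in doc whose class attribute contains className as a whole, space-separated token.
    Function GetElementsHavingClassName(doc As HtmlDocument, className As String) As List(Of HtmlElement)
        Dim elems As New List(Of HtmlElement)

        For Each elem As HtmlElement In doc.All
            ' HtmlElement exposes the class attribute under the DOM property name "className".
            Dim classes = elem.GetAttribute("className")
            ' Compare whole tokens so that "not-a-flintstone" does not match "flintstone".
            If classes.Split(" "c).Any(Function(c) c = className) Then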
                elems.Add(elem)
            End If
        Next

        Return elems

    End Function

    Sub ExtractElements(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        Dim wb = DirectCast(sender, WebBrowser)
        Dim flintstones = GetElementsHavingClassName(wb.Document, "flintstone")

        If flintstones.Count > 0 Then
            For Each fs In flintstones
                ' do something with the element
                TextBox1.AppendText(fs.InnerText & vbCrLf)
            Next
        Else
            TextBox1.Text = "Not found."
        End If

    End Sub

    Sub DoStuff()
        If wb Is Nothing Then
            wb = New WebBrowser
        End If

        RemoveHandler wb.DocumentCompleted, AddressOf ExtractElements ' don't leave any old ones lying around
        AddHandler wb.DocumentCompleted, AddressOf ExtractElements

        Dim loc = "file:///c:\temp\somehtml.html"

        Try
            wb.Navigate(loc)
        Catch ex As Exception
            'TODO: handle the problem gracefully.
            MsgBox(ex.Message)
        End Try

    End Sub

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        DoStuff()

    End Sub

    Private Sub Form1_FormClosing(sender As Object, e As FormClosingEventArgs) Handles MyBase.FormClosing
        If wb IsNot Nothing Then
            RemoveHandler wb.DocumentCompleted, AddressOf ExtractElements
            wb.Dispose()
        End If

    End Sub

End Class

Which, given the HTML

<!DOCTYPE html>
<html>
<head><title></title></head>
<body>
<div class="fred flintstone">Fred</div>
<div class="wilma flintstone">Wilma</div>
<div class="not-a-flintstone">Barney</div>
</body>
</html>

outputs

Fred
Wilma

Andrew Morton
  • I declared it with Dim HtmlDoc As New mshtml.HTMLDocument which is the same as how I declared it in the VBA code and they are both using the same library. Also, wouldn't the custom function end up being slower? – Devan382 Jul 29 '17 at 22:26
  • @Devan382 The duplicate suggested by TnTinMn appears to explain what to do if you have no choice, but note that it will be a fragile solution. The custom method will not necessarily be slower - and there are likely to be other parts of the program that can be sped up to more than make up for it. For example, you might not even need a browser to render the page with all the time that takes and could instead use [HtmlAgilityPack](https://www.nuget.org/packages/HtmlAgilityPack). However, I suggest that you first get it working and then see if any improvements are actually needed ;) – Andrew Morton Jul 29 '17 at 22:50
  • I just looked into what TnTinMn suggested and I don’t see IHTMLDocument6 or 7 as an option in Visual Studio. It only appears to go up to IHTMLDocument5. However, when I look at it in VBA it does show IHTMLDocument6 and IHTMLDocument7. – Devan382 Jul 29 '17 at 23:51
  • If I have to, I will use the custom function you gave, but I want to understand why MSHTML shows different objects/methods in VBA than in Visual Studio. – Devan382 Jul 29 '17 at 23:51
  • I am not using a browser or rendering the page. I am using MSXML2.ServerXMLHTTP60 to make requests. I am not familiar with the HtmlAgilityPack, would that be even faster than what I am currently using? – Devan382 Jul 29 '17 at 23:51
  • @Devan382 Rendering the page includes running any JavaScript that creates elements; it is not limited to rendering it to something that is displayed. Only you can determine which scenario is fastest for your purposes - by *testing*. – Andrew Morton Jul 30 '17 at 17:13
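
For readers curious about the HtmlAgilityPack route mentioned in the comments, the following is a rough sketch rather than a tested drop-in. It assumes the HtmlAgilityPack NuGet package is referenced and that the markup has already been fetched as a string (the `html` literal here is a stand-in, and `AgilityPackSketch` is a hypothetical module name). HtmlAgilityPack only parses the markup it is given; it does not run JavaScript, so elements created by script will not be present.

```vbnet
Imports System

Module AgilityPackSketch

    Sub Main()
        ' Stand-in for markup that was downloaded elsewhere (assumption).
        Dim html = "<div class=""fred flintstone"">Fred</div>" &
                   "<div class=""wilma flintstone"">Wilma</div>" &
                   "<div class=""not-a-flintstone"">Barney</div>"

        ' Fully qualified to avoid confusion with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' Common XPath idiom for matching one whole token in the class attribute.
        Dim xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' flintstone ')]"
        Dim nodes = doc.DocumentNode.SelectNodes(xpath)

        ' SelectNodes returns Nothing, not an empty collection, when nothing matches.
        If nodes Is Nothing Then
            Console.WriteLine("Not found.")
        Else
            For Each node As HtmlAgilityPack.HtmlNode In nodes
                Console.WriteLine(node.InnerText)
            Next
        End If
    End Sub

End Module
```

Whether this is faster than the MSHTML approach for a given page is something only measurement can settle, as noted above.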