I'd like to use the MSHTML library to parse some HTML that I have in a string variable. However, I can't figure out how to do this. I can easily parse the contents of a webpage given a known URL, but not the source HTML directly. Is this possible? If so, how?

Public Sub ParseHTML(sHTML As String)
    Dim oHTML As New HTMLDocument, oDoc As HTMLDocument

    'This works:
    Set oDoc = oHTML.createDocumentFromUrl("http://www.google.com", "")

    'I would like to do the following but no such method actually exists:
    Set oDoc = oHTML.createDocumentFromString(sHTML)

    '....
    'Parse the HTML using the oDoc variable
    '....
End Sub
mwolfe02

3 Answers

You can, by writing the string into an empty document:

Dim odoc As Object

Set odoc = CreateObject("htmlfile") '// late binding

'// or:
'// Set odoc = New HTMLDocument 
'// for early binding

odoc.open
odoc.write "<p> In his house at R'lyeh, dead <b>Cthulhu</b> waits dreaming</p>"
odoc.Close
MsgBox odoc.body.outerHTML
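
Once the string has been written into the document, the usual DOM methods can be used to parse it. The following is only a minimal sketch, assuming the same late-bound `htmlfile` object as above; the helper name `GetTagText` and its parameters are hypothetical and not part of MSHTML.

```vba
' Hypothetical helper (not part of MSHTML): returns the inner text of
' every element in sHTML whose tag name matches sTag.
Public Function GetTagText(sHTML As String, sTag As String) As Collection
    Dim oDoc As Object          ' late-bound, as in the snippet above
    Dim oElem As Object
    Dim cResult As New Collection

    ' Load the HTML string into an empty document.
    Set oDoc = CreateObject("htmlfile")
    oDoc.Open
    oDoc.Write sHTML
    oDoc.Close

    ' Walk the matching elements and collect their text.
    For Each oElem In oDoc.getElementsByTagName(sTag)
        cResult.Add oElem.innerText
    Next oElem

    Set GetTagText = cResult
End Function
```

It could be called as `Set c = GetTagText(sHTML, "b")`, after which `c.Count` gives the number of matching elements and `c(1)` the text of the first one.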
Alex K.
  • Nice! Note to others: I received a compile error in VBA when I tried to declare `odoc As HTMLDocument`: *Compile error: Function or interface marked as restricted, or the function uses an Automation type not supported in Visual Basic*. Changing the declaration to `odoc As Object` (as this answer clearly shows) fixed the problem. – mwolfe02 Apr 03 '12 at 15:14
  • Yep, I agree, nice is the word. – Fionnuala Apr 03 '12 at 17:09
  • @Alex: Hope you don't mind, but I edited your answer to include a way to ref the library late-bound. It's non-obvious and took me some time to find via the web. – mwolfe02 Apr 03 '12 at 19:25
  • Early vs late binding has nothing to do with the way the class is instantiated. It's the `Dim`-ensioning part that is important. `Dim ... As Object` is late binding, `Dim ... As ClassOrInterface` is early binding. – wqw Apr 04 '12 at 15:54
  • Worked, thanks! Note that early binding caused an error for me: [Function or interface marked as restricted.](https://v5vb.wordpress.com/2010/07/29/restricted-interfaces/) – Praesagus Feb 22 '16 at 22:21
  • If you want to keep the early binding, you can call the restricted `Write` method using `CallByName`, like so: `CallByName odoc, "Write", vbMethod, "<p> In his house at R'lyeh, dead <b>Cthulhu</b> waits dreaming</p>"` – Kirill Tkachenko Nov 01 '21 at 14:00
For straight HTML code such as Access-Rich-Text this does it:

Dim HTMLDoc As New HTMLDocument

HTMLDoc.Body.innerHTML = strHTMLText
user3305711
This is a better example for VB.NET. You will not get a null reference exception, and the parsed document is early bound.

(And if you use WPF, just add a reference to System.Windows.Forms.)

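' Use an Object reference for the open/writeln/close calls.
Dim a As Object
a = New mshtml.HTMLDocument

a.open()
a.writeln(code) ' code is assumed to hold the HTML source string
a.close()

' Pump messages until the document reports that it has finished loading.
Do Until a.readyState = "complete"
    System.Windows.Forms.Application.DoEvents()
Loop

' Switch to the early-bound type for parsing.
Dim doc As mshtml.HTMLDocument = a

Dim b As mshtml.HTMLSelectElement = doc.getElementsByTagName("Select").item("lang", 0)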
Don Cruickshank
bboyse