2

I want to open a local HTML file and store it as an HTMLDocument so I can scrape it into excel. However all the available information is for html pages on the web. So for instance this code works great for www.bbc.co.uk but doesn't work for a local file:

Sub queryXMLlocal()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument

Debug.Print Application.ActiveWorkbook.Path

XMLPage.Open "GET", "<filepath>\KOND.html", False
XMLPage.send

If XMLPage.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLPage.Status & " - " & XMLPage.statusText
Exit Sub
End If

End Sub

Alternatively using

Sub GetHTMLDocument()

Dim IE As New SHDocVw.internetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument


IE.Visible = True
IE.navigate "https://www.bbc.co.uk/"

Do While IE.readyState <> READYSTATE_COMPLETE
Loop

    ' Wait while IE loading...

Set HTMLDoc = IE.Document
end sub

works but when I use a local file I get the error:

"object invoked has disconnected from its client"

Can I just use HTMLdocument.open? Although I cannot get this to work either.

Relisora
  • 1,163
  • 1
  • 15
  • 33
tom s
  • 21
  • 2
  • 2
    If it's not being served by an web server, what's going to respond to an HTTP request? Along those same lines, why would you need to "scrape" it? Are you storing data as HTML? – Comintern Oct 09 '18 at 14:19
  • 3
    You will need to load the local file into a string then you can write that into a new `HTMLDocument`. See [MSHTML: CreateDocumentFromString instead of CreateDocumentFromUrl](https://stackoverflow.com/questions/9995257/mshtml-createdocumentfromstring-instead-of-createdocumentfromurl) – Alex K. Oct 09 '18 at 14:25
  • 1
    Alex K. Thanks so much. This is my final code and it seems to work: Sub loadLocalfileAsString() Dim myFile As String Dim text As String Dim textline As String Dim odoc As Object myFile = "filepath\KOND.html" Open myFile For Input As #1 Do Until EOF(1) Line Input #1, textline text = text & textline Loop Close #1 Set odoc = New HTMLDocument odoc.Open odoc.Write text odoc.Close End Sub – tom s Oct 09 '18 at 15:31
  • Comintern, Yes bizarrely I am getting an HTML file in from some ftp process and I need to get data out of if and put into excel. – tom s Oct 09 '18 at 15:32

1 Answers1

2

This is the function I usually use:

Public Function GetHTMLFileContent(ByVal filePath As String) As HTMLDocument
    Dim fso As Object, hFile As Object, hString As String, html As New HTMLDocument
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set hFile = fso.OpenTextFile(filePath)

    Do Until hFile.AtEndOfStream
        hString = hFile.ReadAll()
    Loop

    html.body.innerHTML = hString
    Set GetHTMLFileContent = html
End Function
QHarr
  • 83,427
  • 12
  • 54
  • 101