0

I got this code from a competent user, not sure if he wants to be named. The code searches the HTML content for innerText of certain tags and transfers them to an Excel table, well sorted under the headers, structured as pivot.

Public Sub GetDataFromURL()
    Const URL = "URL"
    Dim html As MSHTML.HTMLDocument, xhr As Object

    Set html = New MSHTML.HTMLDocument
    Set xhr = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    With xhr
        .Open "POST", URL, False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send "...parameters..."
        html.body.innerHTML = .responseText
    End With

    Dim table As MSHTML.HTMLTable, r As Long, c As Long, headers(), row As MSHTML.HTMLTableRow
    Dim results() As Variant, html2 As MSHTML.HTMLDocument

    headers = Array("HDR01", "HDR02", "HDR03", "HDR04")

    ReDim results(1 To 100, 1 To UBound(headers) + 1)

    Set table = html.querySelector("table")
    Set html2 = New MSHTML.HTMLDocument

    Dim lastRow As Boolean

    For Each row In table.Rows
        lastRow = False
        Dim header As String

        html2.body.innerHTML = row.innerHTML
        header = Trim$(row.Children(0).innerText)

        If header = "HDR01" Then
            r = r + 1
            Dim dict As Scripting.Dictionary: Set dict = GetBlankDictionary(headers)
            On Error Resume Next
            dict("HDR02") = Replace$(html2.querySelector("a").href, "about:", "https://URL")
            On Error GoTo 0
        End If

        If dict.Exists(header) Then dict(header) = Trim$(row.Children(1).innerText)

        If (header = vbNullString And html2.querySelectorAll("a").Length > 0) Then
            dict("HDR03") = Replace$(html2.querySelector("a").href, "about:blank", "URL")
            lastRow = True
        ElseIf header = "HDR04" Then
            If row.NextSibling.NodeType = 1 Then lastRow = True
        End If

        If lastRow Then
            populateArrayFromDict dict, results, r
        End If
    Next

    results = Application.Transpose(results)
    ReDim Preserve results(1 To UBound(headers) + 1, 1 To r)
    results = Application.Transpose(results)
    
    Dim re As Object
    
    Set re = CreateObject("VBScript.RegExp")
    
    With re
        .Global = False
        .MultiLine = False
        .IgnoreCase = True
        .Pattern = "\s([0-9.]+)\sm²"
    End With

    Dim ie As SHDocVw.InternetExplorer
    
    Set ie = New SHDocVw.InternetExplorer
    
    With ie
        .Visible = True
        
        For r = LBound(results, 1) To UBound(results, 1)
            
            If results(r, 7) <> vbNullString Then
                
                .Navigate2 results(r, 7), headers:="Referer: " & URL
                
                While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
 
                'On Error Resume Next
                results(r, 8) = re.Execute(.document.querySelector("#anz").innerHTML)(0).Submatches(0)
                'On Error GoTo 0
   
            End If
            
        Next
        
        .Quit
        
    End With
    
    With ActiveSheet
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With

End Sub

It works perfectly in Excel, but I need it for an Access-table. My Aceess-table named tblTab01 contains all the fields that are present in the code in the headers = array("..."), and I have disabled the following lines in the code:

results = Application.Transpose(results)

and

ActiveSheet.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers

Instead, I added the following lines:

Dim db As DAO.Database
Dim strInsert
Set db = CurrentDb
strInsert = "INSERT INTO tblTab01 VALUES (results);"
db.Execute strInsert

But I only get all possible errors!

How would the code need to be modified for use with the Access table? THX

Jasco
  • 225
  • 3
  • 8
  • 2
    They have completely different object models, it requires a substantial rewrite. Also unless you know exactly what you are receiving from the html, you [cannot parse](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) it using regex. – Warcupine Apr 19 '21 at 18:14
  • This code is my only rock in the surf! "Substantial rewrite" sounds evil and depresses me ;-( – Jasco Apr 19 '21 at 18:16
  • I just found this link: https://stackoverflow.com/questions/11275356/vba-procedure-to-import-csv-file-into-access. Is perhaps a detour possible: To save the results as a .csv file and later transfer it to Access and the values to the table? If yes, how to save it as a .csv? – Jasco Apr 19 '21 at 18:29
  • 1
    Whilst saving as CSV and importing into Access might work, you are just making a huge mess for the future. Far better to step back, start in Access (if that is where the data will reside) and split it into small steps. Start by getting the web page, then parsing out the data that you need, and finally adding it to an Access table. If you get stuck, please post the code & data that you have, the expected outcome, and explain the issue properly. – Applecore Apr 19 '21 at 18:46
  • Thanks a lot for the kind replies. Accessing this particular website is quite tricky, it needs referers, values need to be rewritten, queryselector does not work easily etc! The above code includes 2 extra functions that accomplish all this. I was hoping I could leave it like this and in the end instead of `ActiveSheet.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results` use a detour to use it with Access. Since it seems that a rewrite would be necessary I think that saving the results as csv and import to Access is probably the only solution, just dont't know how?! – Jasco Apr 19 '21 at 19:03
  • 1
    Depending on structure, Access can import/link to HTML document. If you use VALUES clause then need to specify fields to insert into. However, AFAIK, can't use an array object like that in INSERT action. – June7 Apr 19 '21 at 19:10
  • The code content and structure is beyond my pay grade, so the question is: Is this VBA code entirely Excel based or can the results, because they are an array, be saved as .csv, .txt or any other format applicable to Access, to import into an Access table afterwards, without changing the code itself? – Jasco Apr 19 '21 at 19:17
  • 1
    Reading HTML and extracting data should be basically same as in Excel VBA (RegEx is available in Access VBA, assuming RegEx is appropriate). VBA writing data to tables can be done couple ways: 1) open recordset and add records to recordset object or 2) INSERT action SQL. Loop through array object to write its elements to Access table. – June7 Apr 19 '21 at 19:20
  • **INSERT** is what I tried first: `Set db = CurrentDb` `strInsert = "INSERT INTO tblTab01 VALUES (results);"` `db.Execute strInsert`. It did not work, so I have opened this thread! Is there an error in my syntax? @Warcupine wrote:Acces and Excel have completely different object models, it requires a substantial rewrite...! – Jasco Apr 19 '21 at 19:25
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/231338/discussion-between-june7-and-jasco). – June7 Apr 19 '21 at 19:25
  • 1
    if you write to csv from Excel you can just import from Access (or even just from Excel). Rather than worrying about any re-writes. Pretty sure it is a few button clicks in Access. – QHarr Apr 19 '21 at 19:56
  • @Qharr: Hello Mr genius :-) I can't write to csv from Exce first, because on the computer where this code is used there is no Excel/Office, only an Access Runtime – Jasco Apr 19 '21 at 19:59
  • 1
    Well that code should be easily transferable to Access and then write the array to table perhaps as here: https://www.tek-tips.com/viewthread.cfm?qid=1425555 – QHarr Apr 19 '21 at 20:06
  • Code-rewrite was not necessary, The addition `Set rs = db.OpenRecordset("TABLE01", dbOpenDynaset)` `For r = ... To ...` `With rs` `.AddNew` `.Fields("HDR01") = results(r, 1)` etc works perfectly, but I can't figure out how to set the actual number of rows. Can't get any further with value of "To" = `For r = LBound(results, 1) To ...???` – Jasco Apr 20 '21 at 06:22

1 Answers1

1

This produces same output as the Excel code. I attempted a solution that eliminated looping array but this version is actually faster.

Had to use Excel WorksheetFunction to make the Transpose method work. Make sure Excel library is selected in References.

    results = Excel.WorksheetFunction.Transpose(results)
    ReDim Preserve results(1 To UBound(headers) + 1, 1 To r)
    results = Excel.WorksheetFunction.Transpose(results)

Uncomment the On Error lines:

On Error Resume Next
results(r, 8) = re.Execute(.document.querySelector("#anz").innerHTML)(0).Submatches(0)
On Error GoTo 0

Then instead of the With ActiveSheet block, loop through array.

    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    CurrentDb.Execute "DELETE * FROM tblNetzPortDwnLd"
    Set db = CurrentDb
    Set rs = db.OpenRecordset("tblNetzPortDwnLd", dbOpenDynaset)
    For r = LBound(results, 1) To UBound(results, 1)
        With rs
            .AddNew
            .Fields("zpID") = r
            .Fields("zpAktenzeichen") = results(r, 1)
            .Fields("zpAmtsgericht") = results(r, 2)
            .Fields("zpObjekt") = results(r, 3)
            .Fields("zpVerkehrswert") = results(r, 4)
            .Fields("zpTermin") = results(r, 5)
            .Fields("zpPdfLink") = results(r, 6)
            .Fields("zpAdditLink") = results(r, 7)
            .Fields("zpm2") = results(r, 8)
            .Update
        End With
    Next

All fields in table are text type, per our chat discussion.

June7
  • 19,874
  • 8
  • 24
  • 34
  • Another **brilliant user & and dear friend**. Thank you for the effort. There remains only one small problem: The Access-table has an additional field (primary key) called zpID. Therefore the code generates an error at `rs.update`, how can I include the entry of a running number for this field in the code? – Jasco Apr 20 '21 at 08:08
  • 1
    I used autonumber type for zpID field. This will automatically create a 'running number'. No need to set value with code. This is why zpID is not in the headers array and not in dictionary. The procedure works. – June7 Apr 20 '21 at 08:30
  • 1
    And also why it is not in the populateTableFromDict procedure. – June7 Apr 20 '21 at 08:38
  • Doesn't work, Error 3314: **You must enter a value in the field tblNetzPortDwnLd.zpID** and table remains empty". `rs.update` ist is marked yellow... – Jasco Apr 20 '21 at 08:43
  • 1
    It works for me. Did you make the field an autonumber type? – June7 Apr 20 '21 at 08:44
  • Your Access is kind, mine is evil :-) Fieldname is zpID = z.p.I.D. Field is Text-type, it must be so (long story) – Jasco Apr 20 '21 at 08:45
  • 1
    Having a 'running number' in a text field makes no sense. Why can't you just use the autonumber type? – June7 Apr 20 '21 at 08:53
  • 1
    Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/231362/discussion-between-june7-and-jasco). – June7 Apr 20 '21 at 08:55
  • PERFECT. THANKS! – Jasco Apr 20 '21 at 09:18