0

I am having trouble using VBA to get data from a website... This method generate bugs coz the html object does not support getelementbyclassname. May I know is there any alternative way to get the data I want?

enter image description here

Option Explicit

Public Sub getlist()
[a:b].ClearContents 'clear all contents

   Dim strurl As String
   strurl = "https://www.bloomberg.com/quote/HSB22A2:LX"
   Dim html As Object: Set html = CreateObject("htmlfile")
   With CreateObject("msxml2.xmlhttp")
       .Open "get", strurl, False
       .send
       Do While .readyState <> 4
           DoEvents
       Loop
       'html.body.innerHtml = .responseText
   End With

   With html.getElementsByClassName("fieldLabel__9f45bef7")(0) '-->generate bugs
        'MsgBox (.innerText)

   End With

End Sub

braX
  • 11,506
  • 5
  • 20
  • 33
Mark
  • 1
  • Have you tried inserting the web page into an excel sheet (or anywhere where there is space). Then you can access parts by `sheet1!A1` syntax. So new sheet **Alt + D, D, W** and follow the prompts. –  Dec 10 '19 at 02:42
  • Yeah I tried this way it does crawl data from the website but seems not applicable for my context :) Many thanks. – Mark Dec 10 '19 at 03:17

2 Answers2

0

I also struggled to scrape webpages with .getElementsByClassName, and all but gave up on it until I recently discovered that it won't work if you dimension the container 'HTML' object as an object. It does, however, work if you:

  1. add a reference to the "Microsoft HTML Object Library" (Menu "Tools" -> "References"); and
  2. declare the container object as an HTMLDocument, and then instantiate it.

I have rewritten your code as below:

Option Explicit

Public Sub getlist()

[a:b].ClearContents 'clear all contents

   Dim strurl As String
   strurl = "https://www.bloomberg.com/quote/HSB22A2:LX"
   Dim html As HTMLDocument: Set html = New HTMLDocument
   With CreateObject("msxml2.xmlhttp")
       .Open "get", strurl, False
       .send
       Do While .readyState <> 4
           DoEvents
       Loop
       html.body.innerHtml = .responseText
   End With

   With html.getElementsByClassName("fieldLabel__9f45bef7")(0) '-->generate bugs
       MsgBox .innerText
   End With

End Sub

I would just add that I uncommented two lines - namely, where the code is transferred to the HTML container object from .responseText (I think you had commented the line out by mistake?). Anyway, I tested it and it appears to work. Hopefully that solves the problem for you.

  • Many thanks for the answer and it does solve my problem :) A further clarification for other watcher is that if wanna get the exact data field the number in `With html.getElementsByClassName("fieldLabel__9f45bef7")(0)` should be 7 instead of 0. More info on: http://automatetheweb.net/common-vba-methods-properties-web-automation/ – Mark Dec 10 '19 at 03:21
0

Not sure why you are using late bound HTMLFile. Microsoft paid a lot of money in an anti-trust settlement in order for everyone to have MSHTML/IE. Most IT departments, IMO, would allow the addition of Microsoft HTML Object Library if you can't add it in yourself. You then have access to the expanded set of methods. The limitations that come with the late bound interfaces make it rarely worth using. However, I will show you one interesting use of HTMLFile in relation to your question: that is, as a way to execute a native javascript method.

It is possible to get all the stats on the page by using regex on the .responseText to grab the EncodedURIComponent which the page is using to store that data. The page itself decodes this and then handles the JSON string exposed with a JSON parser. We can mimic these steps. Use HTMLFile to access a decodeURIComponent method on regex returned EncodedURIComponent; and then use a json parser to parse out the info we want. I use jsonconverter.bas Download raw code from there and add to standard module called JsonConverter. You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code (this is for if you import the .bas direct).


VBA:

Option Explicit

Public Sub GetData()
    Dim http As Object, s As String, ws As Worksheet, re As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set re = CreateObject("VBScript.RegExp")

    With re
        .Global = True
        .MultiLine = True
    End With

    Dim json As Object, uriComponent As String, decodedComponent As String

    With http
        .Open "GET", "https://www.bloomberg.com/quote/HSB22A2:LX", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" 'mitigate for being served cached results
        .send
        s = .responseText
        uriComponent = GetString(re, s, "decodeURIComponent\(""(.*?\));")
        decodedComponent = GetDecodedString(uriComponent)
        Set json = JsonConverter.ParseJson(decodedComponent) 'https://jsoneditoronline.org/?id=9da2917ba22a4e65a9202f73f6165eb5
    End With

    Dim quoteInfo As Object, r As Long, key As Variant, results()

    Set quoteInfo = json("quote")
    quoteInfo("pressReleases") = vbNullString 'get rid of unwanted collection

    ReDim results(1 To quoteInfo.Count, 1 To 2)

    For Each key In quoteInfo
        r = r + 1
        If IsNull(quoteInfo(key)) Then quoteInfo(key) = vbNullString
        results(r, 1) = key: results(r, 2) = quoteInfo(key)
    Next

    With ws
        .Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub

Public Function GetString(ByVal re As Object, ByVal s As String, ByVal p As String) As String
    With re
        .Pattern = p
        GetString = .Execute(s)(0).SubMatches(0)
    End With
End Function

Public Function GetDecodedString(ByVal encodedString As String) As String 'Adapted from  @konahn https://stackoverflow.com/questions/4998715/does-vba-have-any-built-in-url-decoding
     With CreateObject("htmlfile")
         .parentWindow.execScript "function decode(s) {return decodeURIComponent(s)}" 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent
         GetDecodedString = .parentWindow.decode(encodedString)
    End With
End Function

Sample of output:

enter image description here


Json extracted

You can explore here: https://jsoneditoronline.org/?id=9da2917ba22a4e65a9202f73f6165eb5


Explanation of regex: explore here

enter image description here

QHarr
  • 83,427
  • 12
  • 54
  • 101