1

I am writing a macro in VBA for Excel that examines a page and I need to extract a variable that is set within a script in the HTML. In the macro, I create an IE object thusly:

Set objIE = CreateObject("InternetExplorer.Application")

Pages in the target website all have the following script which defines a variable called digitalData.page.pageName. Here is a snippet of HTML:

<script>
var digitalData = '';
function init() {
digitalData = {
  "user": {
    "userLoginState": "guest",
    "userCountry": "",
    "userEmployeeName": "",
    "userBirthday": "",
    "userGender": "",
    "userState": "",
    "userID": "",
    "LRUserID": "",
    "userEmployeeID": "",
    "userDWID": "",
    "userSessionId": "BYTEzHFAdLrPoPPOlTPGWvlBjCx54jjEyB8="
  },
  "page": {
    "pageName": "en_us:plp:men:clothing:Casual Shirts",
    "pageType": "plp",
    "pageGender": "men",
    "pageLocale": "us",
    "pageRedirected": "no",
    "pageJSErrorCount": "3",
    "pageLevel1": "men",
    "pageLevel2": "men/clothing",
    "pageLevel3": "men/clothing/Casual Shirts",
    "pageLevel4": "men/clothing/Casual Shirts",
    "pageHierarchy": "men/clothing/Casual Shirts"
  },

If I open a page from this site in Chrome and inspect it, I can type the variable name in the console and it will return the value but I can't seem to access the variable from IE using VBA like this:

inspectLink(i, 1) = objIE.digitalData.page.pageName

In this case, I would like to find en_us:plp:men:clothing:Casual Shirts in inspectLink(i, 1) but instead I get Runtime error '438' Object doesn't support the property or method.

Dim inspectCat(4) As String
inspectCat(0) = "webcat=men"
inspectCat(1) = "webcat=women"
inspectCat(2) = "webcat=kids"
inspectCat(3) = "webcat=baby"
inspectCat(4) = "webcat=home"

Dim targetSearchCount as Integer
Dim failedSearchCount as Integer
targetSearchCount=0
failedSearchCount=0

REM New Code - DOES NOT WORK cannot access pageName this way
REM if digitalData.page.pageName has en_us: in it, then it's our target
REM if it has failedSearchResult in it, then report to web dev team
REM syntax might be objIE.Document.digitalData.page.pageName

REM inspectLink(i, 1) = objIE.digitalData.page.pageName
REM MsgBox inspectLink(i, 1)
REM if inStr(objIE.digitalData.page.pageName, "en_us:") then targetSearchCount=targetSearchCount+1 endif
REM if inStr(objIE.digitalData.page.pageName, "failed_Search_Result") then failedSearchCount=failedSearchCount+1 endif
REM End New Code

REM Begin Old Code - WORKS BUT "dublicate" MAY NOT BE RELIABLE OVER TIME
REM 
Set pageNameDubs = objIE.Document.GetElementsByClassName("page-Name-dublicate")
'MsgBox pageNameDubs(0).Value
For Each pageName In pageNameDubs
'    If InStr(pageName.innertext, "en_us:") > 0 Then   
        inspectLink(i, 1) = pageName.Value
'    End If
Next
REM End Old Code
cxw
  • 16,685
  • 2
  • 45
  • 81
jeromekjerome
  • 501
  • 1
  • 8
  • 26
  • What's `inspectLink`? Are the objects declared? How? Can't troubleshooting without seeing the what you're seeing. Please add your VBA to the question.. See [mcve] – ashleedawg Mar 15 '18 at 01:47
  • Thanks for the tip. Done. – jeromekjerome Mar 15 '18 at 15:31
  • Jerome, have you had a chance to try out my answer? Thanks! – cxw Mar 17 '18 at 21:00
  • Sorry, I've been swamped with other problems. The "dublicate" query works for now so improving it is not at the top of the priority list. I'll certainly let you know when I try to make it work. – jeromekjerome Mar 19 '18 at 13:11

1 Answers1

1

This was an interesting one. This answer is specifically about accessing digitalData.page.pageName. In the code below, everything between VVV and ^^^ does that, based on a document that's already loaded. You can integrate that part into your existing code.

In the Developer Console, this is as simple as digitalData.page.pageName, or, equivalently, document.defaultView.digitalData.page.pageName (source). You can get document.defaultView in Excel VBA, but I can't figure out how to access JavaScript globals from that object. Instead, I did it through the DOM. The below sample works for me.

Code

First, make sure you have added references to Microsoft Internet Controls and to Microsoft HTML Object Library.

Option Explicit
Option Base 0

Public Sub GetResult()
    Dim objIE As SHDocVw.InternetExplorer
    Set objIE = New SHDocVw.InternetExplorer

    ' Load the page with the target data
    With objIE
        .navigate "http://cxw42.github.io/49290039.html?buster=1"
        ' Cache buster thanks to https://stackoverflow.com/questions/24851824/how-long-does-it-take-for-github-page-to-show-changes-after-changing-index-html#comment69647442_24871850
        ' by https://stackoverflow.com/users/185973/joel-glovier
        .Visible = True
    End With

    Do While objIE.Busy
        DoEvents
    Loop

    Dim doc As MSHTML.HTMLDocument
    Set doc = objIE.document

    ' VVVVVVVVVVVVVV

    Dim win As MSHTML.HTMLWindow2
    Set win = doc.defaultView

    ' Should be able to directly access win.digitalData, but I can't get that to work.
    ' Instead, access the data indirectly

    Dim uniqueid As String
    uniqueid = "id_" & Format(Now, "%yyyy%mm%dd%hh%nn%ss")
    Dim code As String
    code = "(function(){var x = document.createElement('p'); x.id='" & uniqueid & "'; x.innerText=digitalData.page.pageName; document.body.appendChild(x); })()"
        ' Copy digitalData.page.pageName into the DOM
    win.execScript code, "JavaScript"

    Dim pageName As String
    Dim node
    Set node = doc.getElementById(uniqueid)     ' Get the new DOM node
    pageName = node.innerText

    ' Clean up
    doc.getElementsByTagName("body").Item(0).RemoveChild node

    ' ^^^^^^^^^^^^^^        

    ' Now do whatever you want with pageName.
    Debug.Print pageName
End Sub

Explanation

The magic is in code and win.execScript. code is a JavaScript oneliner that creates a new <p> element and copies digitalData.page.pageName into it. win.execScript runs that JavaScript in the context of the page, thereby creating the new node. The node has a uniqueid (well, probably unique) that we can use to find it once it's been created. Then we can pull the result out of the innerText of that paragraph.

Thanks to

cxw
  • 16,685
  • 2
  • 45
  • 81