0

I hate that I have to ask this question again but the website I had been scraping data from updated, not just aesthetically, the underlying code has changed too. Before the update, the program would find the "Key Data" table and use a counter to find specific data. The problem is I'm not getting into the values anymore and when I try to use a Class Name closer to the value, it doesn't find it at all and drops out of the program.

I've cut out some of the code below to share, would appreciate any help. Similar to last time, I feel so close but am coming up short.

Set ieObj = New InternetExplorer
ieObj.Visible = True
ieObj.navigate "https://web.tmxmoney.com/quote.php?qm_symbol=" & Cells(c, 2) & ":US"

Do While ieObj.readyState <> 4
    Application.StatusBar = "Getting to 'Key Data' Table"
    DoEvents
Loop

d = 0
For Each htmlELE In ieObj.document.getElementsByClassName("sc-kfYqjs jpcbJh")
    
    'scrape EPS amount
    If d = 9 Then
        EPS = htmlELE.innerText
        Range("H2").value = EPS
    End If

    'scrape dividend
    If d = 14 Then
        div = htmlELE.innerText
        Range("I2").value = div
    End If

d = d + 1
Next


enter image description here

SIM
  • 21,997
  • 5
  • 37
  • 109
Matt
  • 15
  • 6

2 Answers2

1

It's a dynamic page. The content will be load by scrolling down. So you must wait for load the "first part" of the page. Then scroll down to the needed table by 1500. After scrolling down wait to load the table.

You don't need a loop to scrape the wanted values. You can use querySelector() to get the specific element by it's attribute name and atttribute value.

Look here for information about querySelector()
And here for examples you can use also with querySelector(): querySelectorAll()

This code works for me. If not for you play with the breaks and perhaps the amount of scrolling:

Sub test()

Dim ieObj As Object
Dim nodeEps As Object
Dim nodeDividend As Object
Dim eps As String
Dim dividend As String

  Set ieObj = CreateObject("InternetExplorer.Application")
  ieObj.Visible = True
  'ieObj.navigate "https://web.tmxmoney.com/quote.php?qm_symbol=" & Cells(c, 2) & ":US"
  ieObj.navigate "https://web.tmxmoney.com/quote.php?qm_symbol=GE:US"
  
  Do While ieObj.readyState <> 4
    Application.StatusBar = "Getting to 'Key Data' Table"
    DoEvents
  Loop
  Application.Wait (Now + TimeSerial(0, 0, 3))
  ieObj.document.parentWindow.Scroll 0, 1500
  Application.Wait (Now + TimeSerial(0, 0, 2))
  
  'scrape EPS amount
  Set nodeEps = ieObj.document.querySelector("div[data-testid='eps-value']")
  eps = Trim(nodeEps.innerText)
  'Range("H2").Value = eps
  
  'scrape dividend
  Set nodeDividend = ieObj.document.querySelector("div[data-testid='dividendAmount-value']")
  dividend = Trim(nodeDividend.innerText)
  'Range("I2").Value = dividend
  
  'Clean up
  ieObj.Quit
  Set ieObj = Nothing
  Set nodeEps = Nothing
  Set nodeDividend = Nothing
  Application.StatusBar = False
  
  MsgBox eps & Chr(13) & dividend
End Sub
Zwenn
  • 2,147
  • 2
  • 8
  • 14
  • thanks for your time. I'm getting a "Run-time error '424': Object Required" notification on the eps scrape, the line "Set nodeEps = ie.obj.document..." which I think is a bit odd with the object defined above. I have the Microsoft HTML Object Library checked but is there something else I'm missing? – Matt Sep 23 '20 at 17:46
  • Success! You were right about adjusting the break time and scroll. Thanks for opening my eyes to "queryselector"- life changer. – Matt Sep 23 '20 at 23:44
  • @Matt I am pleased that you are now getting the data you need again. In my code no library references in Excel are needed. I work with late binding, because the code runs on every computer without further preparation. The references are needed for early binding. Infos: https://riptutorial.com/excel-vba/example/10855/early-binding-vs-late-binding Did you have a look at the solution of SIM? It has the advantage that you can save the whole handling of IE and the answer comes much faster. But you have to spend more time on it to understand it if you have never worked with xhr and JSON before. – Zwenn Sep 24 '20 at 09:25
1

You can get the value of eps using xmlhttp requests if you send a post requests to the correct url along with appropriate parameters. When you run the script, you should get json response containing the required result. I used regex to parse the specific portion you are interested in.

The payload is bigger than usual. However, you can change the ticker name within the variable accordingly.

This is how you can go:

Sub GetContent()
    Const link = "https://app-money.tmx.com/graphql"
    Dim elem As Object, payload As Variant, S$, tickerName$
    
    tickerName = "AAPL:US"      'use ticker name here

    payload = "{""operationName"":""getQuoteBySymbol"",""variables"":{""symbol"":""" & tickerName & """,""locale"":""en""},""query"":""query getQuoteBySymbol($symbol: String, $locale: String) {\n  getQuoteBySymbol(symbol: $symbol, locale: $locale) {\n    symbol\n    name\n    price\n    priceChange\n    percentChange\n    exchangeName\n    exShortName\n    exchangeCode\n    marketPlace\n    sector\n    industry\n    volume\n    openPrice\n    dayHigh\n    dayLow\n    MarketCap\n" & _
            "MarketCapAllClasses\n    peRatio\n    prevClose\n    dividendFrequency\n    dividendYield\n    dividendAmount\n    dividendCurrency\n    beta\n    eps\n    exDividendDate\n    shortDescription\n    longDescription\n    website\n    email\n    phoneNumber\n    fullAddress\n    employees\n    shareOutStanding\n    totalDebtToEquity\n    totalSharesOutStanding\n    sharesESCROW\n    vwap\n    dividendPayDate\n    weeks52high\n    weeks52low\n    alpha\n    averageVolume10D\n    averageVolume30D\n    averageVolume50D\n    priceToBook\n    priceToCashFlow\n    returnOnEquity\n" & _
            "returnOnAssets\n    day21MovingAvg\n    day50MovingAvg\n    day200MovingAvg\n    dividend3Years\n    dividend5Years\n    datatype\n    __typename\n  }\n}\n""}"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", link, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
        .setRequestHeader "Content-Type", "application/json"
        .send payload
        S = .responseText
    End With
    
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True

        .Pattern = """eps"":(.*?),"
        Set elem = .Execute(S)
        MsgBox elem(0).SubMatches(0)
    End With
End Sub
SIM
  • 21,997
  • 5
  • 37
  • 109
  • This is one of the most awesome answers I read on SO till now. For me, this is a new approach that has great advantages over using IE. Thank you for the insight into the solution approach via xhr and JSON. I understand the answer from the server, but I don't know where do you know from, how to build the JSON for the payload? Can you give me a tip or a research approach? I will also look for your responses on SO on this issue so far. – Zwenn Sep 24 '20 at 09:30
  • Okay, I'll paste here some links that will help understand the thorough process. Thanks. – SIM Sep 24 '20 at 10:16
  • Okay, I found the JSONs in the network analysis. Probably it is necessary each time to check which one is the right one for the payload. Links are very welcome. Thank you. – Zwenn Sep 24 '20 at 10:19
  • 1
    For inspecting [network activity](https://developers.google.com/web/tools/chrome-devtools/network/reference) and for creating [json params](https://stackoverflow.com/questions/21021540/post-json-to-web-in-excel-vba). – SIM Sep 24 '20 at 11:33
  • Sorry, with my limited knowledge about VBA, this example went over my head but I'll have to try it when I have some extra time. Thanks for providing. – Matt Sep 25 '20 at 04:30