2

I am trying to use MSXML2 and IHTMLDocument to deal with iframe part of HTML web page.

I want to use MSXML2 and save it to better "capture" the data, thinking it's faster than just using InternetExplorer or VBA selenium reference supported by VBA menu. (I don't want to avoid using IE or selenium as much as possible)

But I couldn't find out how to save document as XML format(to take advantage of its speed) and at the same time click on the element in the document without the help of browser(ie or selenium). And even after clicking some tab(id="cns_Tab21") on this web page, I have difficulty retrieving data.
So my question is.. 1> Is it possible to minimize the use of browser for clicking?

2> Even after clicking(using Selenium), it throws an xpath related error in VBA editor.

Thank you for your answer in advance and the URL used for this is http://bitly.kr/finance and the iframe inside the link is http://bitly.kr/LT0aCb

    'I declared objects
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim iframeDoc As IHTMLDocument

    'and saved XML data to HTML format
     HTMLDoc.body.innerHTML = XMLReq.responseText

    'and trying to save this HTML to iframe...
    Set iframeDoc = HTMLDoc.getElementById("coinfo_cp")
    'I tried .contentDocument but it maybe HTMLdoc doesn't have this property. 

     and I don't know how to access information I saved to iframeDoc above. 



      'And after I use Selenium I can't figure out why it throw an error
       For Each ele In selenium.FindElementsByTag("th")
        If ele.Attribute("innerText") = "CAPEX" Then
        Debug.Print ele.FindElementsByXPath("./../td").Attribute("innerText")

This post isn't a duplicate since I am trying to use XML to handle iframe element and without InternetExplorer reference in VBA Excel.(ie.document)

BangolPhoenix
  • 383
  • 1
  • 4
  • 16

1 Answers1

3

You can make replicate the xhr request the page makes when that tab (not iframe) is selected. I use clipboard to copy table to Excel. Note: url I am using is from our discussions. This info should be reflected in question.

Option Explicit
Public Sub GetTable()
'VBE > Tools > References > Microsoft HTML Object Library
    Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") ' New DataObject
    clipboard.SetText hTable.outerHTML
    clipboard.PutInClipboard
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
End Sub

You can find the params of the ajax url for the tab content update in the scripts of the page

enter image description here

Along with the target for the update:

enter image description here


This needs tidying up:

Option Explicit
Public Sub GetTable()
'https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930
'VBE > Tools > References > Microsoft HTML Object Library
    Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText

    End With

    Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01") '2nd tab. CAPEX row

    Dim html2 As HTMLDocument, i As Long

    Set html2 = New HTMLDocument
    html2.body.innerHTML = hTable.outerHTML

    Dim tableBodyRows As Object, tableBodyRowLength As Long, tableHeaderRowLength As Long, tableHeaderRows As Object, targetRow As Long

    Set tableBodyRows = html2.querySelectorAll("tbody tr .bg")
    tableBodyRowLength = tableBodyRows.Length
    tableHeaderRowLength = html2.querySelectorAll("thead tr").Length + 2

    For i = 0 To tableBodyRowLength - 1
        If Trim$(tableBodyRows.item(i).innerText) = "CAPEX" Then
            targetRow = i + tableHeaderRowLength + 1
            Exit For
        End If
    Next

    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") ' New DataObject
    clipboard.SetText hTable.outerHTML
    clipboard.PutInClipboard
    ws.Cells(1, 1).PasteSpecial

    Dim unionRng As Range

    For i = (tableHeaderRowLength + 1) To (tableBodyRowLength + tableHeaderRowLength)
        If i <> targetRow Then
            If Not unionRng Is Nothing Then
                Set unionRng = Union(ws.rows(i), unionRng)
            Else
                Set unionRng = ws.rows(i)
            End If
        End If
    Next
    If Not unionRng Is Nothing Then unionRng.Delete
End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Thanks @QHarr I really appreciate your help! but I think I omitted the essence of my problem: I didn't want to use ie object since it's too slow and cumbersome but my task revolves around clicking an element to retrieve the dynamically updated data. I need to click the element with id='cns_tab21' ; Will it be possible to do this with XML and without ie? Or should I use selenium? Thank you so much! – BangolPhoenix Apr 16 '19 at 07:20
  • 1
    Hi, I do not see that id in the html for that link Where is it? Please update your question to exactly what you want. – QHarr Apr 16 '19 at 07:23
  • thank you for your patience @QHarr, I tried to fix the problem by myself but still can't figure out how to proceed. So I edited my question accordingly but it got too big since my code took different turns. I hope what I found later still fits the purpose of this question post :) thank you for your help! – BangolPhoenix Apr 18 '19 at 11:55
  • id='cns_tab21' doesn't exist for https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930" What values are you after? Can you give the url and a couple of the expected values? – QHarr Apr 18 '19 at 18:52
  • I'm sorry for the typo. It's id = 'cns_Tab21' – BangolPhoenix Apr 20 '19 at 06:17
  • That is no an iframe. It is a tab. – QHarr Apr 20 '19 at 06:30
  • 1
    I have updated the answer to handle the tab situation. Please try it. – QHarr Apr 20 '19 at 06:41
  • I submitted the incomplete comment and made a new comment ; I'm sorry for the typo and confusion @QHarr. It's id = 'cns_Tab21', and the letter T is capitalized. I want the values of all "CAPEX" row. So when you click the tag with the attribute id="cns_Tab21", the table renders and shows another table for the annual financial data. the tag structure for this newly updated one is like this: CAPEX Thank you!
    – BangolPhoenix Apr 20 '19 at 06:45
  • yeah it's not iframe anymore since I found it maybe better to go to the website URL of what iframe refers to; so i think my post now is irrelevant and deviated from the purpose of my first theme and that's why I took some time for editing ;anyway I will keep the original post and go on to other question in different posts ; thanks @QHarr for your updated click event advice I will try that! – BangolPhoenix Apr 20 '19 at 06:47
  • 1
    Has the above answered the question?> – QHarr Apr 20 '19 at 06:57
  • Could you tell me how you got the specific URL based on my URL? and queryselector and getobject argument values? thank you – BangolPhoenix Apr 20 '19 at 07:01
  • 1
    I think my answer needs refining. I will continue to look at it today. – QHarr Apr 20 '19 at 07:04
  • 1
    The url I got from the dev tools network tab in Chrome when clicking on the tab in question. The xhr appears in the network list. The querySelector css selector I wrote myself by looking at the html, https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors, the getObject that is the reference for clipboard object - you can google it – QHarr Apr 20 '19 at 07:06
  • thank you for your explanation. Now I have got a lot to study since you opened a new door for this:)) ; BTW is there a way to capture this seemingly dynamically rendered URLs? Or is it something that only website sever knows? I just checked a different URL of another company info and I was wondering if I could use it for different company info tables – BangolPhoenix Apr 20 '19 at 07:27
  • Not that I know of in VBA. There may be a way in VBA to find log info but it might only be at URL level. Python has some facility for recording/retrieving network activity for sure but again not sure how detailed that is beyond URI. The url appears to be static for the tabs btw. – QHarr Apr 20 '19 at 07:33
  • The construction of this url and the parameters are in one of the script tags of the page. I will add an edit. – QHarr Apr 20 '19 at 07:38
  • Means you could extract that info on the fly, if needed, to construct the url. – QHarr Apr 20 '19 at 07:43