To the point:

I have successfully used VBA to do the following:

  • Log in to a website using getElementsByName

  • Select parameters for the report that will be generated (using getElementsBy...)

  • Generate the report after selecting parameters, which renders the resulting dataset into an iframe on the same page

Important to note - The website is client-side

The above was the simple part; the difficult part is as below:

Clicking on a gif image within the iframe that exports the dataset to a CSV

I have tried the following:

Dim idoc As HTMLDocument
Dim iframe As HTMLFrameElement
Dim iframe2 As HTMLDocument

Set idoc = objIE.document
Set iframe = idoc.all("iframename")
Set iframe2 = iframe.contentDocument

    Do Until InStr(1, objIE.document.all("iframename").contentDocument.innerHTML, "img.gif", vbTextCompare) = 0
        DoEvents
    Loop

To give some context to the logic above -

  • I accessed the main frame
  • I accessed the iframe by its name attribute
  • I accessed the content within the iframe
  • I attempted to find the gif image that needs to be clicked to export to CSV

It is at this line that it trips up saying "Object doesn't support this property or method"

I also tried accessing the iframe gif via the a element and its href attribute, but this failed completely. I also tried grabbing the image from its source URL, but all this does is take me to the page the image is from.

Note: the iframe does not have an ID and, strangely, the gif image does not have an "onclick" attribute/event

Final consideration - I attempted scraping the iframe using R

Accessing the HTML node of the iframe was simple; however, trying to access the attributes of the iframe and subsequently the nodes of the table proved unsuccessful. All it returned was "character(0)"

library(rvest)
library(magrittr)

Blah <-read_html("web address redacted") %>%
  html_nodes("#iframe")%>%
  html_nodes("#img")%>%
  html_attr("#src")%>%
  #read_html()%>%
  head()
Blah

As soon as I include read_html, the script returns the following error:

Error in if (grepl("<|>", x)) { : argument is of length zero

I suspect this is referring to the character(0)

Appreciate any guidance here!

Many Thanks,

HTML

<div align="center"> 
    <table id="table1" style="border-collapse: collapse" width="700" cellspacing="0" cellpadding="0" border="0"> 
        <tbody>
            <tr>
                <td colspan="6"> &nbsp;</td>
            </tr> 
            <tr> 
                <td colspan="6"> 
                    <a href="href redacted">
                        <img src="img.gif" width="38" height="38" border="0" align="right">
                    </a>
                    <strong>x - </strong>
                </td>
            </tr> 
        </tbody>
    </table>
</div>
QHarr
mojo3340

3 Answers

Iframes can be tricky. Based on the HTML you provided I created this example, which works locally; would it work for you as well?

To get to the iframe, the frames collection can be used. I hope you know the name of the iframe.

Dim iframeDoc As MSHTML.HTMLDocument
Set iframeDoc = doc.frames("iframename").document

Then, to get to the image, we can use the querySelector method, e.g. like this:

Dim img As MSHTML.HTMLImg
Set img = iframeDoc.querySelector("div table[id='table1'] tbody tr td a[href^='https://stackoverflow.com'] img")

The selector a[href^='https://stackoverflow.com'] selects an anchor whose href attribute starts with the given text. The ^ denotes the beginning.

Then, once we have the image, a simple call clicks its parent, which is the desired anchor. HTH


Complete example:

Option Explicit

' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library

Sub Demo()

    Dim ie As SHDocVw.InternetExplorer
    Dim doc As MSHTML.HTMLDocument
    Dim url As String
    
    url = "file:///C:/Users/dusek/Documents/My Web Sites/mainpage.html"
    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = True
    ie.navigate url

    While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend
    
    Set doc = ie.document
    
    Dim iframeDoc As MSHTML.HTMLDocument
    Set iframeDoc = doc.frames("iframename").document
    If iframeDoc Is Nothing Then
        MsgBox "IFrame with name 'iframename' was not found."
        ie.Quit
        Exit Sub
    End If
    
    Dim img As MSHTML.HTMLImg
    Set img = iframeDoc.querySelector("div table[id='table1'] tbody tr td a[href^='https://stackoverflow.com'] img")
    If img Is Nothing Then
        MsgBox "Image element within iframe was not found."
        ie.Quit
        Exit Sub
    Else
        img.parentElement.Click
    End If
    
    ie.Quit
End Sub

Main page HTML used

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<!-- saved from url=(0016)http://localhost -->
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>x -</title>
</head>

<body>
<iframe name="iframename" src="iframe1.html">
</iframe>
</body>

</html>

IFrame HTML used (saved as file iframe1.html)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<!-- saved from url=(0016)http://localhost -->
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>Untitled 2</title>
</head>

<body>
<div align="center"> 
    <table id="table1" style="border-collapse: collapse" width="700" cellspacing="0" cellpadding="0" border="0"> 
        <tbody>
            <tr>
                <td colspan="6"> &nbsp;</td>
            </tr> 
            <tr> 
                <td colspan="6"> 
                    <a href="https://stackoverflow.com/questions/44902558/accessing-object-in-iframe-using-vba">
                        <img src="img.gif" width="38" height="38" border="0" align="right">
                    </a>
                    <strong>x - </strong>
                </td>
            </tr> 
        </tbody>
    </table>
</div>

</body>

</html>

BTW, the frame may also be referenced by its index: doc.frames(0).document. Thanks to Paulo Bueno.
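
If the report inside the iframe takes a moment to render, the image may not exist yet when querySelector is called. A minimal sketch of a polling helper follows. WaitForElement is a hypothetical name, not part of MSHTML, and the 10 second default timeout is an assumption to adjust for your site. It only relies on the querySelector call already used above.

```vba
' Requires a reference to Microsoft HTML Object Library (MSHTML).
' WaitForElement is a hypothetical helper, not part of any library.
' It polls the given document until the CSS selector matches an element
' or the timeout elapses, and returns Nothing on timeout.
Public Function WaitForElement(ByVal frameDoc As MSHTML.HTMLDocument, _
                               ByVal cssSelector As String, _
                               Optional ByVal timeoutSeconds As Single = 10) As Object
    Dim startTime As Single
    Dim elapsed As Single
    Dim found As Object

    startTime = Timer
    Do
        Set found = frameDoc.querySelector(cssSelector)
        If Not found Is Nothing Then Exit Do

        DoEvents ' keep the host application and the browser responsive

        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400 ' Timer resets at midnight
    Loop While elapsed < timeoutSeconds

    Set WaitForElement = found
End Function
```

In the Demo routine above, the direct querySelector call could then be replaced with Set img = WaitForElement(iframeDoc, "a img"), keeping the existing Is Nothing check. If the iframe reloads while you wait, the document reference may go stale, so it may be necessary to fetch doc.frames("iframename").document again before polling.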

Daniel Dušek
  • Okay so successfully managed to get it to click on the gif! thanks! however... the gif exports a csv file that contains the data within the iframe. The export is empty... nothing in the csv but there is in the iframe i can see – mojo3340 Jul 06 '17 at 16:10
  • So the difficult part is not done yet :). After `click` on the `anchor` the data which are loaded inside of that `iframe` shoud be exported to `csv-file`. And you see the data in the `iframe` but after `click` on `anchor` the `csv-file` is empty. Do I understood it correctly? – Daniel Dušek Jul 06 '17 at 18:47
  • Yes that's correct. I get a web browser prompt to open or save as the file. I open it manually but file is empty – mojo3340 Jul 06 '17 at 18:55
  • You have to examine the request e.g. via [IE Developer Tools - F12](https://www.youtube.com/watch?v=GbbjL_Uir24). Compare request in the case when it works with case when you click the anchor programmatically. Have a look on `query string` etc and try to find the differences. But without seeing it directly it is hard to say. – Daniel Dušek Jul 06 '17 at 19:06
  • what exactly do you need to see to solve this? there are lots of areas in the DOM explorer/F12 Tools – mojo3340 Jul 07 '17 at 08:32
  • You should examine the `Request` the browser sends to server, when the `CSV-file` is requested. It is on the `Network-Tab`. So you can maybe found why it works when clicked _manually_ and why it not working when clicked _programatically_. – Daniel Dušek Jul 07 '17 at 08:45
  • Okay i inspected the difference between manual and programmatic. Very informative way of debugging! When manually clicked, the "Name/Path" is the FULL href link whereas when programmatically clicked the "Name/Path" is only the anchor element of the href (not the full path). Second difference is "content type" - manual is application/x-csv and programmatic is text/html – mojo3340 Jul 07 '17 at 08:56
  • when i say "anchor element" i mean similar to the example above where you have "selector a[href^='https://stackoverflow.com']". – mojo3340 Jul 07 '17 at 09:01
  • Could you post these informations here e.g. like screen shots? In some censored form so the clients informations stay hidden? I don't know what is the cause. Maybe some javascript runs when clicked manually because it reacts on e.g. `mouse-up` event (which is not present when programatically calling `click` only)? – Daniel Dušek Jul 07 '17 at 10:01
  • do you want screenshot of what is displayed on the "Network" tab? for when it is programmatically click vs manually clicked? – mojo3340 Jul 07 '17 at 15:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/148627/discussion-between-dee-and-mo-h). – Daniel Dušek Jul 07 '17 at 17:52
  • So we moved the discussion to chat, but now we have to write comments again to remind, that in chat is new post, well thats _funny_ :). Or do I miss something? – Daniel Dušek Jul 10 '17 at 15:12
  • BTW, The frame may be referenced by it's index also `doc.frames(0).document`. Cheers – Paulo Bueno Jun 27 '20 at 17:48

I thought I would expand on the answer already given.

In the case of Internet Explorer you may have one of two common situations to handle regarding iframes.

  1. src of iframe is subject to same origin policy restrictions:

The iframe src has a different origin from the landing page, in which case, due to the same origin policy, attempts to access it will yield "access denied".

Resolution:

Consider using SeleniumBasic to automate a different browser such as Chrome, where you can switch to the iframe and continue working with the iframe document.

Example:

Option Explicit
'download selenium https://github.com/florentbr/SeleniumBasic/releases/tag/v2.0.9.0
'Ensure latest applicable driver e.g. ChromeDriver.exe in Selenium folder
'VBE > Tools > References > Add reference to selenium type library
Public Sub Example()
    Dim d As WebDriver
    Const URL As String = "https://www.rosterresource.com/mlb-roster-grid/"
    Set d = New ChromeDriver
    With d
        .Start "Chrome"
        .get URL
        .SwitchToFrame .FindElementByCss("iframe") '< pass the iframe element as the identifier argument
        ' .SwitchToDefaultContent ''to go back to parent document.
        Stop '<== delete me later
        .Quit
    End With
End Sub

  2. src of iframe is not subject to same origin policy restrictions:

Resolution:

The methods detailed in the answer already given apply. Additionally, you can extract the src of the iframe and .Navigate2 to it to access the content directly:

.Navigate2 .document.querySelector("iframe").src

If you only want to work with the contents of the iframe, then simply make your initial .Navigate2 go to the iframe src and don't even visit the initial landing page.

Example:

Option Explicit
Public Sub NavigateUsingSrcOfIframe()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .Navigate2 "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5978065"

        While .Busy Or .readyState < 4: DoEvents: Wend
        
        .Navigate2 .document.querySelector("iframe").src
        
        While .Busy Or .readyState < 4: DoEvents: Wend

        Stop '<== delete me later
        .Quit
    End With
End Sub

  3. iframe in ShadowRoot

An unlikely case might be an iframe in shadowroot. You should really have one or the other and not one within the other.

Resolution:

In that case you need an additional accessor of

Element.shadowRoot.querySelector("iframe").contentDocument

where Element is your parent element with a shadowRoot attached. This method will only work if the shadowRoot mode is set to open.

Side note:

A nice selenium based example, using ExecuteScript to return shadowRoot is given here: How Do I Access Elements in the Shadow DOM using Selenium in VBA?

QHarr

Adding to the answers given:

If you're OK with using a DLL and rewriting your code, you can run Microsoft's Edge browser (a Chromium-based browser) with VBA. With that you can do almost anything you want. Note, however, that access to the DOM is performed by JavaScript, not by an object like Dim IE As New InternetExplorer. Look at the VBA sample and you'll get the idea.

https://github.com/peakpeak-github/libEdge

Sidenote: Samples for C# and C++ are also included.

StureS