1

In the following code we are using IE Automation to get from here

Location 1

"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & Ticker & "&type=10-Q&dateb=&owner=exclude&count=20"

to a location like this

Location 2

https://www.sec.gov/Archives/edgar/data/10795/000119312514042815/bdx-20131231.xml

Is there a way to go from location 1 to location 2 without using IE Automation and finding something more reliable, secure and faster?

For reasons of completeness here is the full code we have by now; by running you will see heavy use of IE:

Option Explicit

Sub MadMule2()
    Dim IE As InternetExplorer
    Dim el
    Dim els
    Dim colDocLinks As New Collection
    Dim Ticker As String
    Dim lnk
    Dim intCounter as Integer    

    Set IE = New InternetExplorer

    IE.Visible = False

    Ticker = Worksheets("Sheet1").Range("A1").Value

    LoadPage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
                  "action=getcompany&CIK=" & Ticker & "&type=10-Q" & _
                  "&dateb=&owner=exclude&count=20"

    Set els = IE.document.getElementsByTagName("a")
    For Each el In els
        If Trim(el.innerText) = "Documents" Then
            colDocLinks.Add el.href
        End If
    Next el

    intCounter = 1

     For Each lnk In colDocLinks
        LoadPage IE, CStr(lnk)
        For Each el In IE.document.getElementsByTagName("a")
            If el.href Like "*[0-9].xml" Then
                ActiveWorkbook.XmlMaps.Add(el, "xbrl").Name = "xbrl Map"
            End If
        Next el
    Next lnk
End Sub

Sub LoadPage(IE As InternetExplorer, URL As String)
    IE.navigate URL
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

ADDITIONAL

Q: Is there a way to go from location 1 to location 2 without using IE Automation and finding something more reliable, secure and faster?

Can you expand on this?

By mehow


A: Here is a comment for the code block we have received by user2140261 here:

You should look into MSXML it is much faster, secure, and reliable then IE automation.

Since the code opens Internet Explorer, parses the source page to find the href and get to the Web location needed; we wondered if there is a way to go into location 2 without the use of IE. Can it be done with MSXML as user2140261 states?

QHarr
  • 83,427
  • 12
  • 54
  • 101
Codo
  • 271
  • 3
  • 10
  • 24
  • *`Is there a way to go from location 1 to location 2 without using IE Automation and finding something more reliable, secure and faster?`* Can you please expand on this? Do you have anything specific in mind? What do you mean by *`going from location 1 to location 2`*? –  Feb 27 '14 at 11:58
  • I have modified my post. Please inform me if explanation is sufficient – Codo Feb 27 '14 at 12:05
  • Use Ajax: http://stackoverflow.com/questions/8044423/json-import-to-excel and my answer here: http://stackoverflow.com/questions/18835640/updating-oracle-table-from-excel-vba-macro-using-odbc-connection/18892448#18892448 – html_programmer Feb 27 '14 at 13:37
  • I just saw. I will look into detail – Codo Feb 27 '14 at 13:47
  • Dear @KimGysen i have to be honest with you. I barely can make a class module in VBA and i don't know Java. What do i have to read especially from Java? How will it take me? And most importantly of all can it go immediately to the URL in location 2 that i have in the question without parsing the IE document? – Codo Feb 27 '14 at 14:29
  • I felt totally overwhelmed in that post – Codo Feb 27 '14 at 14:30
  • Will AJAX send a request at the SEC EDGAR database by passing all URL related procedures? – Codo Feb 27 '14 at 14:32
  • 1
    The url is a location on the server where the xml file is stored. The Ajax call is an HTTP request to the url that runs in the background and that connects to the server without passing via the client (crawling the DOM), like you do in your example. In the examples of the links that I posted, JSON objects are returned, which is why json parsers are used as example. If you're working with XML, you may need to look for an XML parser in VBA. – html_programmer Feb 27 '14 at 14:38
  • possible duplicate of [VBA Apache Server Interaction](http://stackoverflow.com/questions/22145132/vba-apache-server-interaction) –  Mar 03 '14 at 11:28

1 Answers1

1

Here is an example using XHR

Option Explicit
Public Sub GetLinks()
    Dim ticker As String, html As New HTMLDocument, links As Object, i As Long
    ticker = [A1]                                'example is 81251
    Set html = GetHTMLDocument("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & ticker)
    If html Is Nothing Then Exit Sub
    Set links = html.querySelectorAll("#documentsbutton")
    If links Is Nothing Then Exit Sub
    For i = 0 To links.Length - 1
        Debug.Print GetAbsoluteURL(links(i).getAttribute("href"))
    Next i
End Sub

Public Function GetAbsoluteURL(ByVal relativeURL As String) As String
    If InStr(relativeURL, "about:/") > 0 Then
        GetAbsoluteURL = Replace$(relativeURL, "about:/", "https://www.sec.gov/")
    Else
        GetAbsoluteURL = relativeURL
    End If
End Function

Public Function GetHTMLDocument(ByVal URL As String) As HTMLDocument
    Dim sResponse As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        If .Status <> 200 Then
            Exit Function
        Else
            sResponse = StrConv(.responseBody, vbUnicode)
        End If
    End With
    Set GetHTMLDocument = New HTMLDocument
    GetHTMLDocument.body.innerHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function

Sample ticker is 81251


Sample output:

Sample output


CSS selectors and .querySelectorAll

  You can target the documents buttons with a CSS selector of "#documentsbutton". Now, whilst # means id, and id is usually unique, for this particular website it seems that the same id is used to identify all elements within a table.

The CSS selector query returns the following (sample) from the page:

CSS query


Applying the CSS selector in VBA:

  As more than one element is matched, the .querySelectorAll method of document is used to return a nodeList of items matched by the css selector:

html.querySelectorAll("#documentsbutton")

  We can traverse the nodeList, along its .Length, and access individual document button elements by index (position along nodeList; starting at 0):

For i = 0 To links.Length - 1

  As you want the hyperlinked info you can simply pull the href attribute from each matched element:

links(i).getAttribute("href")

  This returns a relative path so we use a tiny helper function, GetAbsoluteURL, to get the absolute path.


Closing notes:

  Whilst not completely adhering to the Single Responsibility Principle this does show some of the benefits of re-factoring your code. You can then re-use GetHTMLDocument function to handle the new URLs returned by GetAbsoluteURL

QHarr
  • 83,427
  • 12
  • 54
  • 101