
In my previous question (everything needed is in this question; the link is there only for completeness), I asked for a way to pull XML data into Excel from a web location. The code I received as an answer (courtesy of user2140261) is here:

' Requires a reference to the Microsoft XML library (Tools > References)
Sub GetNode()
    Dim strXMLSite As String
    Dim objXMLHTTP As MSXML2.XMLHTTP
    Dim objXMLDoc As MSXML2.DOMDocument
    Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
    Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

    Set objXMLHTTP = New MSXML2.XMLHTTP
    Set objXMLDoc = New MSXML2.DOMDocument

    ' the XBRL Instance Document for one specific BDX filing
    strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

    ' request the document and load the response into the DOM
    objXMLHTTP.Open "POST", strXMLSite, False
    objXMLHTTP.send
    objXMLDoc.LoadXML objXMLHTTP.responseText

    ' drill down to the element we want and write its value to the sheet
    Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")
    Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

    Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub

But every company has its own XML Instance Document, and a company publishes a new one for each reporting period (e.g. quarterly, annually), so these documents live at different web locations.

Now in the previous procedure we can see that we only need to change the statement

strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

...but that only works when we know beforehand that we want data from one specific web location.

What if we want to pull data from the four different locations marked with an asterisk (*) in the image below?

[Image: Dynamically changing the data pull target]

How could we enter our "coordinates" in Excel, say in a userform or in worksheet cells, and then have VBA "navigate/crawl" to the right location using those coordinates alone, just as we would navigate there with a browser?

The coordinates that we input can be:

  • A Stock Ticker (e.g. TSLA for Tesla Motors)
  • A filing type (for example, 10-Q)

You can see the available filing types at these links for BDX and ANN respectively:

BDX LINK

ANN LINK

Below we have two web locations for BDX's Instance Documents and two for ANN's.

How could we pull an XML element that exists in all four instance documents, for example us-gaap:CommonStockValue, by simply giving VBA the

  1. Stock Ticker
  2. The document type (10-K, 10-Q)

Can it be done with Microsoft XML Core Services (MSXML) alone, or do we need some other library too?

You can see how impractical it is to run this code thousands of times and each time copy the URL from the web browser into strXMLSite as a string value...
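
Roughly, the flow I imagine is something like the sketch below. BuildInstanceDocUrl and GetNodeFromUrl are hypothetical helpers I do not know how to write yet; working out the first one is exactly what this question is about, and the cell addresses are only for illustration:

Sub PullCommonStockValue()
    ' Rough sketch of the flow I am after. BuildInstanceDocUrl and
    ' GetNodeFromUrl are hypothetical helpers -- working out how to
    ' write the first one is exactly what this question is about.
    Dim strTicker As String
    Dim strFormType As String
    Dim strInstanceUrl As String

    ' the "coordinates" typed into the worksheet (or a userform)
    strTicker = Worksheets("Sheet1").Range("B1").Value    ' e.g. "BDX" or "ANN"
    strFormType = Worksheets("Sheet1").Range("B2").Value  ' e.g. "10-Q" or "10-K"

    ' the missing piece: turn ticker + filing type into the Instance Document URL
    strInstanceUrl = BuildInstanceDocUrl(strTicker, strFormType)

    ' then reuse the GetNode logic with the URL passed in as a parameter
    Worksheets("Sheet1").Range("A1").Value = _
        GetNodeFromUrl(strInstanceUrl, "us-gaap:CommonStockValue")
End Sub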


1 Answer


[edit1]

In response to the comment:

the only thing that remains for us is to understand how URLs actually change, so that they can be predicted and manipulated by string concatenation. In what code language is the URL written?

The short answer is: open a browser, right-click on a blank spot in the web page you're interested in, and select View Source from the popup menu.

To repeat the example provided in the other post, VBA href Crawl on Browser's Source Code, do this:

Open the EDGAR Company Search page in a browser: https://www.sec.gov/edgar/searchedgar/companysearch.html

Use the Fast Search function to search for ticker CRR and it gives me this URL: https://www.sec.gov/cgi-bin/browse-edgar?CIK=CRR&Find=Search&owner=exclude&action=getcompany which contains the list of public filings for Carbo Ceramics, Inc.
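
So the URL itself is nothing exotic: it is just a query string you can assemble from the ticker by concatenation. Here is a minimal sketch of that; the CIK / Find / owner / action parameters are copied straight from the URL above, while the optional &type= filter is my assumption about how EDGAR narrows the list to a single form type:

Function BuildFilingsListUrl(ByVal strTicker As String, _
                             Optional ByVal strFormType As String = "") As String
    ' Assemble the filings-list URL by plain string concatenation.
    ' The &type= filter is an assumption; the rest comes from the URL above.
    Dim strUrl As String
    strUrl = "https://www.sec.gov/cgi-bin/browse-edgar?CIK=" & strTicker & _
             "&Find=Search&owner=exclude&action=getcompany"
    If Len(strFormType) > 0 Then strUrl = strUrl & "&type=" & strFormType
    BuildFilingsListUrl = strUrl
End Function

Calling BuildFilingsListUrl("CRR", "10-Q") should then land you on the same page as the manual Fast Search, narrowed to 10-Q filings.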

Now, right-click on the page to get the source and scroll down to line 91. You'll see this block of code:

      <table class="tableFile2" summary="Results">

That's the beginning of the results table that shows the list of public filings.

         <tr>
            <th width="7%" scope="col">Filings</th>
            <th width="10%" scope="col">Format</th>
            <th scope="col">Description</th>
            <th width="10%" scope="col">Filing Date</th>
            <th width="15%" scope="col">File/Film Number</th>
         </tr>

That's the header row of the table with column descriptions.

<tr>
<td nowrap="nowrap">SC 13G</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/1009672/000108975514000003/0001089755-14-000003-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
<td class="small" >Statement of acquisition of beneficial ownership by individuals<br />Acc-no: 0001089755-14-000003&nbsp;(34 Act)&nbsp; Size: 8 KB            </td>
            <td>2014-02-14</td>
            <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=005-48851&amp;owner=exclude&amp;count=40">005-48851</a><br>14615563         </td>
         </tr>

And that's the first row of actual data in the table for filing SC 13G, Statement of acquisition of beneficial ownership by individuals Acc-no: 0001089755-14-000003 (34 Act) Size: 8 KB, submitted on 2014-02-14.

So, now you want to loop through all of the document URLs on this page and that's why you're asking what language the URLs are in? (Crawl the page, in other words?)
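
Yes, and you don't need anything beyond the XMLHTTP object you're already using plus some string handling. Below is a rough sketch that assumes the page keeps the id="documentsbutton" markup shown above; it is naive string parsing, not real HTML parsing, so for anything more robust you would want the Microsoft HTML Object Library:

Function GetFilingIndexUrls(ByVal strListUrl As String) As Collection
    ' Naive crawl: fetch the filings-list page and collect the href of every
    ' anchor carrying id="documentsbutton" (the ...-index.htm links shown in
    ' the source above). Breaks if EDGAR changes that markup.
    Dim objHTTP As MSXML2.XMLHTTP
    Dim strHtml As String
    Dim lngPos As Long
    Dim lngStart As Long
    Dim lngEnd As Long
    Dim colUrls As New Collection

    Set objHTTP = New MSXML2.XMLHTTP
    objHTTP.Open "GET", strListUrl, False
    objHTTP.send
    strHtml = objHTTP.responseText

    lngPos = InStr(1, strHtml, "id=""documentsbutton""")
    Do While lngPos > 0
        ' in each anchor the href="..." attribute comes just before the id
        lngStart = InStrRev(strHtml, "href=""", lngPos) + Len("href=""")
        lngEnd = InStr(lngStart, strHtml, """")
        colUrls.Add "https://www.sec.gov" & Mid$(strHtml, lngStart, lngEnd - lngStart)
        lngPos = InStr(lngPos + 1, strHtml, "id=""documentsbutton""")
    Loop

    Set GetFilingIndexUrls = colUrls
End Function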

[begin original answer]

How could we enter our "coordinates" in Excel, say in a userform or in worksheet cells, and then have VBA "navigate/crawl" to the right location using those coordinates alone, just as we would navigate there with a browser?

I googled "get google results as xml" while researching another question. One interesting hit that came back was this link: http://nielsbosma.se/projects/seotools/functions/

I make no representation about the merits of this tool, but it seems to have the functionality you're asking for.

Now in the previous procedure we can see that we only need to change the statement strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml" ...but that only works when we know beforehand that we want data from one specific web location.

Yes, so once you've gotten some sort of web crawling function to return a list of XML document links, you first need to put them somewhere the user can see. My preference would be a range on a worksheet, but you could load up a list or combo box in a form as well. Either way, you would then modify Sub GetNode() to accept an input parameter based on the user's selection:

Sub GetNode(strUrl As String)
...
strXMLSite = strUrl
...
Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub  

Or, perhaps better, make it a function that returns the XML as text for you to consume however you'd like:

Function GetNode(strUrl As String) As String
...
strXMLSite = strUrl
...
'return result
GetNode = objXMLNodeDIIRSP.Text
End Function  
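
For example, if your crawling step drops the instance-document URLs into a worksheet column, the calling code might look something like this; the sheet name and range here are placeholders, not anything from your workbook:

Sub PullAllSelectedFilings()
    ' Example caller: read instance-document URLs from column A of a "Links"
    ' sheet and write the extracted value next to each one. The sheet name
    ' and range are placeholders -- point them at wherever you store the URLs.
    Dim rngUrl As Range
    For Each rngUrl In Worksheets("Links").Range("A1:A4")
        If Len(rngUrl.Value) > 0 Then
            rngUrl.Offset(0, 1).Value = GetNode(CStr(rngUrl.Value))
        End If
    Next rngUrl
End Sub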

Interesting question overall, and I was happy to give you feedback on the code you posted. Your other questions can probably be answered with a bit of Google searching.

  • my friends, I am currently at a house, but I will be answering very carefully once I get the time, because it is not possible from my cell – ExoticBirdsMerchant Feb 15 '14 at 00:58
  • Dear Brandon, could you look at Codo, who is trying to do the exact same thing? In his post at http://stackoverflow.com/questions/21787036/vba-network-web-crawling he explains the issue far better than me. You see, we have the same problem – ExoticBirdsMerchant Feb 15 '14 at 18:11
  • I also looked at SeoTools; I think I can see some markup tag names, but I can't get anything more... is this HTML? Basically, parameterising the string value is not really the issue... if you look at the link I gave you, you will see our problem in far more detail – ExoticBirdsMerchant Feb 15 '14 at 18:15
  • What do you believe I should search on Google? Is what I am trying to do web crawling, since I am just trying to learn how the URL is composed so that I can manipulate it as a string value, or does it have another terminology? Can it be done solely in VBA? – ExoticBirdsMerchant Feb 15 '14 at 18:18
  • Yes, I did see the other post from @Codo, but before he expanded on it. Looks as if you're both working on the same project. I suggested searching Google because it's 1) something that has already been done by others and 2) many of those people have probably shared some of their code or 3) might have a complete and affordable package already in place for you to use. – Brandon R. Gates Feb 15 '14 at 20:19
  • Dear Brandon, the only thing that remains for us is to understand how URLs actually change, so that they can be predicted and manipulated by string concatenation. In what code language is the URL written? – ExoticBirdsMerchant Feb 15 '14 at 20:43
  • We don't want a ready-made package because there is none for what we want: a totally customizable financial model... the only thing we cannot figure out is how the URLs tick – ExoticBirdsMerchant Feb 15 '14 at 20:45
  • Now that it's clear to me that you want EDGAR data specifically, there, in fact, does exist at least one API for getting XML data and loading into Excel: http://www.xignite.com/market-data/seamlessly-integrate-market-data-into-microsoft-excel/ This is a paid service, but they offer a 7-day free trial. "API" implies that you can use it to write any sort of app you choose. – Brandon R. Gates Feb 16 '14 at 01:22
  • Dear Brandon R. Gates, I am aware of Xignite. Their service does not interest me, for specific reasons of data transparency. And I do not want data from EDGAR specifically; EDGAR is just the beginning... http://quant.stackexchange.com/questions/141/what-data-sources-are-available-online That is why I need to devise the functionality myself – ExoticBirdsMerchant Feb 16 '14 at 10:01
  • Now that answer will take me a long way down the road, @Brandon R. Gates – ExoticBirdsMerchant Feb 16 '14 at 10:10