
I have updated my question, since I now understand more clearly the technicality I am trying to address.

A. If we take the resulting URL from a search on a data agency's site, we get this:

    https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000010795&type=10-K&dateb=&owner=exclude&count=20

B. By entering the URL from step A into a browser and viewing the source code, we see at line No. 100 (I use Google Chrome) this charming line, which is also a clickable link:

    href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm"

The line is contained in this snippet of the source code:

    <tr>
        <td nowrap="nowrap">10-K</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm" id="documentsbutton">&nbsp;Documents</a>&nbsp; <a href="/cgi-bin/viewer?action=view&amp;cik=10795&amp;accession_number=0001193125-13-456802&amp;xbrl_type=v" id="interactiveDataBtn">&nbsp;Interactive Data</a></td>
        <td class="small">Annual report [Section 13 and 15(d), not S-K Item 405]<br />Acc-no: 0001193125-13-456802&nbsp;(34 Act)&nbsp; Size: 15 MB</td>
        <td>2013-11-27</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=001-04802&amp;owner=exclude&amp;count=20">001-04802</a><br>131247478</td>
    </tr>

C. If we click the link found at line 100 in step B, we go to the next page, and the path from that link now becomes part of the URL! So what we get is a new page assigned to this URL:

    https://www.sec.gov/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm

D. Using the same methodology, we find at line No. 182 this line of code:

    href="/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

If we click that link, we arrive at the URL held in the strXMLSite String in the macro below. Once you take a look at the macro and run it, you will see that it is a logical conclusion that the String could be populated with the desired URL at run time, if we could integrate a relevant procedure into our macro. That is the nucleus of the question.


We have already added the reference to Microsoft XML Core Services (MSXML) needed by the macro (Excel --> VBE --> Tools --> References --> Microsoft XML, v6.0).

How can we make VBA crawl from the URL in step A, through the source code, to the URL that is now hard-coded in the strXMLSite String, by adding statements to the procedure? Do we need to activate a library from Tools --> References? Can you show me a code block using such a methodology? What is the line of approach on this point?

For completeness, allow me to provide the macro, courtesy of @user2140261:

Sub GetNode()
    Dim strXMLSite As String
    Dim objXMLHTTP As MSXML2.XMLHTTP
    Dim objXMLDoc As MSXML2.DOMDocument
    Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
    Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

    Set objXMLHTTP = New MSXML2.XMLHTTP
    Set objXMLDoc = New MSXML2.DOMDocument

    'the target URL is hard-coded for now
    strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

    'fetch the XML document (a plain GET is sufficient here)
    objXMLHTTP.Open "GET", strXMLSite, False
    objXMLHTTP.send
    objXMLDoc.LoadXML objXMLHTTP.responseText

    'locate the root element, then the node holding the value we want
    Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")
    Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

    Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub
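As a sketch of how strXMLSite could be populated at run time rather than hard-coded: the index page from step C could be fetched with the same XMLHTTP object and the `.xml` link extracted with plain string functions. The helper name `GetXmlUrl` and the assumption that the filing's instance document is the first href ending in `.xml` on the index page are mine, based only on the page source shown above:

```vba
'Sketch only: derive the XML URL at run time from the index page of step C.
'GetXmlUrl is a hypothetical helper; it assumes the first href ending in
'".xml" on the index page is the document we want.
Function GetXmlUrl(strIndexUrl As String) As String
    Dim objHTTP As MSXML2.XMLHTTP
    Dim strHTML As String
    Dim lngEnd As Long, lngStart As Long

    Set objHTTP = New MSXML2.XMLHTTP
    objHTTP.Open "GET", strIndexUrl, False
    objHTTP.send
    strHTML = objHTTP.responseText

    'find the first occurrence of  .xml"  in the page source
    lngEnd = InStr(1, strHTML, ".xml""", vbTextCompare)
    If lngEnd > 0 Then
        'walk back to the opening  href="  to capture the whole path
        lngStart = InStrRev(strHTML, "href=""", lngEnd) + Len("href=""")
        'the href is site-relative, so prepend the domain
        GetXmlUrl = "https://www.sec.gov" & _
                    Mid$(strHTML, lngStart, lngEnd + Len(".xml") - lngStart)
    End If
End Function
```

The macro could then replace the hard-coded literal with something like `strXMLSite = GetXmlUrl("https://www.sec.gov/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm")`.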

Thank you for reading my question.

ExoticBirdsMerchant
Codo
    I think this is an interesting question, but I don't exactly understand what you're trying to do. I think you're saying that you want to enter a search term into a cell in Excel and have a VBA routine return the search results from Google as a list of links somewhere else in the worksheet? – Brandon R. Gates Feb 14 '14 at 21:17
  • I am sorry for presenting the issue so superficially, @BrandonR.Gates; I have re-edited the question, however. Thank you – Codo Feb 15 '14 at 15:41
  • Dear @brettdj, my latest efforts are shown in the re-edited question above. Thank you – Codo Feb 15 '14 at 15:41
  • My answer was simply for the simple original question; the actual automation of the Internet and web scraping is much more complex. You seem to be doing an almost identical thing to another user of this site named [ExoticBirdsMerchant](http://stackoverflow.com/users/3166768/exoticbirdsmerchant), and you may want to look through his questions and answers, as he is going through your EXACT situation trying to get data from the same website you are. You can start [Here at an Almost Identical Question](http://stackoverflow.com/questions/21786105/vba-pull-xml-data-from-multiple-web-locations) – user2140261 Feb 15 '14 at 15:52
  • How much more complex? What must I read? – Codo Feb 15 '14 at 15:57
  • Codo, your revised question does better explain what you're trying to do. In my view, you are looking for help designing an application, and as @user2140261 points out, is better suited for engaging a paid expert. :) – Brandon R. Gates Feb 15 '14 at 20:30
  • Typically you would do this (eg) automating IE to navigate to the "StepA" URL, then (once the page has loaded) looping through the collection returned from (eg) `IE.document.getElementsByTagName("a")` or `document.links` and checking the `href` attribute on each link. However beyond that I'm unable to follow exactly what you're trying to do here. – Tim Williams Feb 18 '14 at 06:59
  • Dear @TimWilliams, I am trying to find the link given in step B and, while the procedure is running, be directed **where this link leads**. It leads to a new page which has a new URL. There I try to find href again and take its content `"/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"` into a String variable inside my procedure – Codo Feb 18 '14 at 10:41

1 Answer


Add a reference to "Microsoft Internet controls". This will get you to the point where you can get the individual xml links.

Sub Tester()

    Dim IE As New InternetExplorer
    Dim els, el, colDocLinks As New Collection
    Dim lnk

    IE.Visible = True
    Loadpage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
                  "action=getcompany&CIK=0000010795&type=10-K" & _
                  "&dateb=&owner=exclude&count=20"

    'collect all the "Document" links on the page
    Set els = IE.Document.getelementsbytagname("a")
    For Each el In els
        If Trim(el.innerText) = "Documents" Then
            'Debug.Print el.innerText, el.href
            colDocLinks.Add el.href
        End If
    Next el

    'loop through the "document" links and check each page for xml links
    For Each lnk In colDocLinks
        Loadpage IE, CStr(lnk)
        For Each el In IE.Document.getelementsbytagname("a")
            If el.href Like "*.xml" Then
                Debug.Print el.innerText, el.href
                'work with the document from this link
            End If
        Next el
    Next lnk

End Sub

Sub Loadpage(IE As Object, URL As String)
    IE.navigate URL
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub
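To connect this back to the macro in the question, the commented line "work with the document from this link" is where each discovered URL could be handed to the question's parser. One possible sketch follows; reworking GetNode into a parameterized `GetNodeFromUrl` is my assumption, not part of the original answer:

```vba
'Sketch only: a parameterized variant of the question's GetNode,
'so Tester can call it for every XML link it finds.
Sub GetNodeFromUrl(strXMLSite As String)
    Dim objXMLHTTP As MSXML2.XMLHTTP
    Dim objXMLDoc As MSXML2.DOMDocument
    Dim objRoot As MSXML2.IXMLDOMNode
    Dim objNode As MSXML2.IXMLDOMNode

    Set objXMLHTTP = New MSXML2.XMLHTTP
    objXMLHTTP.Open "GET", strXMLSite, False
    objXMLHTTP.send

    Set objXMLDoc = New MSXML2.DOMDocument
    objXMLDoc.LoadXML objXMLHTTP.responseText

    'same node lookup as in the question's macro, with guards so a
    'filing without the node does not raise a run-time error
    Set objRoot = objXMLDoc.SelectSingleNode("xbrl")
    If Not objRoot Is Nothing Then
        Set objNode = objRoot.SelectSingleNode( _
            "us-gaap:DebtInstrumentInterestRateStatedPercentage")
        If Not objNode Is Nothing Then
            'note: this overwrites A1 on each call, as in the original
            Worksheets("Sheet1").Range("A1").Value = objNode.Text
        End If
    End If
End Sub
```

Inside Tester's inner loop, `GetNodeFromUrl el.href` would then replace the `Debug.Print` line.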
Tim Williams
  • I have just seen your post and i am currently working onto the code – Codo Feb 18 '14 at 18:41
  • Yuppii, it works!!! Now I can start building my model. Your post is puzzle piece number one, and the second piece is this http://stackoverflow.com/questions/21413004/vba-pull-xml-data-to-excel – Codo Feb 18 '14 at 18:46
  • This completes a two-year effort to learn to code from scratch; now I can learn VBA at my own pace!!! :-) – Codo Feb 18 '14 at 18:47
  • +1, nicely done. However, @Codo, as a side note, you have to be very careful with this technical approach. Web scraping is generally considered bad practice (in some companies explicitly forbidden), because changes to the structure of the web pages may result in bugs. I'm also not sure why you need web scraping here. I would consider making a direct connection to the web service via ajax over http or by using ftp, if such a connection can be set up. This way you can skip the step where you crawl the front-end webpage and immediately get the data from the server via the web service. – html_programmer Feb 19 '14 at 15:31
  • Wow, although I believe this approach will bear fruit, just let me check on that! If there is a faster way, what is the level of capability of FTP or AJAX, and can it work in tandem with Excel? Do you believe it would work faster with Excel? Keep in mind I only know VBA... Sorry, I came from the mountain... – Codo Feb 19 '14 at 15:49
  • Well, I checked it and I believe it won't work! I'll tell you why: according to this http://www.howcast.com/videos/396567-How-to-Use-FTP-to-Access-EDGAR you must do a host of things... I want a cell filled with a ticker symbol and a button (the basic chassis), since the macro must fire more than 2k times per day. Imagine doing all the moves in that video instead of a simple click... at least for FTP – Codo Feb 19 '14 at 15:54
  • I don't know what kind of animal AJAX is, but I will check on it. If you have any recommendations, please share. – Codo Feb 19 '14 at 15:55
  • Logically, since it's for personal use and changes to the Sec.gov website won't be that dramatic over the next decades, if the runtime is reasonable maybe the plan will go through. Anyone who knows about this, please comment – Codo Feb 19 '14 at 15:58
  • Ajax is an asynchronous request that you send to the server. It's nothing more than a non-blocking HTTP request, meaning that it runs in the background ("asynchronously"). – html_programmer Feb 19 '14 at 16:21
  • Check the code that I provided as an answer to the following question: http://stackoverflow.com/questions/18835640/updating-oracle-table-from-excel-vba-macro-using-odbc-connection/18892448#18892448 I worked with json objects for this example, but you can replace this with an xml parser. You don't "have to" use this approach, but I would choose it over web scraping and crawling the DOM any time when possible, because I feel that it's relatively unreliable. What if the "Documents" text is changed to something else? When you request data immediately, you don't have to deal with these problems. – html_programmer Feb 19 '14 at 16:22
  • So AJAX never stops running? Doesn't this blow up the memory? Something constantly running... – Codo Feb 19 '14 at 21:20