3

I'm very new to VBA and HTML/XHTML, but through online research and help from other wonderful members on here I've managed to write a code to pull the data I want. I had a hard time identifying the IDs of the elements I want since it's in XHTML, so I think that's where I've botched it the most.

The website: http://www.usbanklocations.com/banks.php?q=&ct=&ml=30&lc=

Here is what I want the code to do: Pull Bank Name, Address, Phone Number, Total Deposits and Total Assets -- GIVEN the bank name and city I provide in my excel sheet.

Here is my code:

Private Declare Sub Sleep Lib "kernel32.dll" (ByVal dwMilliseconds As Long)
Sub CommunityBanks()
    Dim IE As Object, TableResults As Object, webRow As Object, BankName As Variant, page As Long, pageTotal As Long, r As Long
    Dim beginTime As Date, i As Long, myvalue As Variant

Set IE = CreateObject("internetexplorer.application")
IE.navigate "http://www.usbanklocations.com/banks.php?name=" & Range("A2").Value & "+Bank&ml=30&lc=" & Range("B2").Value & "%2C+TX"
IE.Visible = True

Do While IE.Busy Or IE.readystate <> 4   '4 = READYSTATE_COMPLETE
    DoEvents
Loop

'input bank name into form
'myvalue = InputBox("Enter City. Press okay to begin search", "Bank Search")
'Range("F3").Value = myvalue
'IE.document.getelementbyid("MainContent_txtCity").Value = "LegacyTexas"
'click find button
'IE.document.getelementbyid("MainContent_btn").Click
'Sleep 5 * 1000
IE.document.getelementbytagname("table").getelementsbyclassname("btn").Click
Sleep 5 * 1000

'total pages
pageTotal = IE.document.getelementbyid("lsortby").innertext
page = 0

Do Until page = pageTotal
    DoEvents
    page = IE.document.getelementbyclassname("lsortby").innertext
    With IE.document.getelementbyid("main")
        For r = 1 To .Rows.Length - 1
            If Not IsArray(BankName) Then
                ReDim BankName(7, 0) As Variant
            Else
                ReDim Preserve BankName(7, UBound(BankName, 2) + 1) As Variant
            End If

            BankName(0, UBound(BankName, 2)) = .Rows(r).Cells(0).innertext
        Next r
    End With

    If page < pageTotal Then
        IE.document.getelementbyclassname("panelpn").Click
        beginTime = Now
        Application.Wait (Now + TimeValue("00:00:05"))
    End If
Loop

For r = 0 To UBound(BankName, 2)
    IE.navigate "http://www.usbanklocations.com/" & BankName(0, r)
    Do While IE.Busy Or IE.readystate <> 4   '4 = READYSTATE_COMPLETE
        DoEvents
    Loop
    'wait 5 sec. for screen refresh
    Sleep 5 * 1000

    With IE.document.getelementbytagname("table")
        For i = 0 To .Rows.Length - 1
            DoEvents
            Select Case .Rows(i).Cells(0).innertext
            Case "Name:"
                BankName(1, r) = .Rows(i).Cells(1).innertext
            Case "Location:"
                BankName(2, r) = .Rows(i).Cells(1).innertext
            Case "Phone:"
                BankName(3, r) = .Rows(i).Cells(1).innertext
            Case "Branch Deposit:"
                BankName(4, r) = Replace(Replace(.Rows(i).Cells(1).innertext, ",", ""), "$", "")
            Case "Total Assets:"
                BankName(5, r) = Replace(Replace(.Rows(i).Cells(1).innertext, ",", ""), "$", "")
            End Select
        Next i
    End With
Next r


IE.Quit
Set IE = Nothing

'post result on Excel cell
Worksheets(1).Range("A9").Resize(UBound(BankName, 2) + 1, UBound(BankName, 1) + 1).Value = Application.Transpose(BankName)
End Sub

Thank you in advance! I would greatly appreciate any help.

omegastripes
  • 12,351
  • 4
  • 45
  • 96
K.K.
  • 49
  • 8
  • 1
    The [ToS for usbanklocations.com](http://www.usbanklocations.com/terms-of-use.php) states that users can not `aggregate, copy or duplicate content on USBANKLOCATIONS.COM` - so I'm pretty sure you shouldn't be scraping their site anyway... – SierraOscar Jun 13 '16 at 19:24
  • by "on", they are referring to actions to their site specifically. Not the content users can use. You can copy/paste the information. – K.K. Jun 13 '16 at 19:51
  • OK - I generally don't participate in scraping questions just to err on the side of caution. I was only pointing out in case you weren't aware but if you're happy that it's fine then fair enough. – SierraOscar Jun 13 '16 at 20:02
  • Thank you @MacroMan! I appreciate you pointing that out. – K.K. Jun 13 '16 at 20:12
  • 2
    `.getelementbyvalue`, `.getelementbyclass`, `.getelementbytag` are not valid methods. `.getElementsByClassName`, `.getElementsByTagName` return collection of nodes selected by class and tag names. There is no native function to retrieve a node by it's value. – omegastripes Jun 13 '16 at 21:12
  • Hi @omegastripes, thanks for that. So, if I correct for the "Search" button for example, would this be correct? `IE.document.getelementbyClassName("btn").Click` I'm not sure if the "Search" button is a tag or not. One of my issues here, distinguishing what's a tag and what's not. Specifically when I get to an HTML table. I don't know how to get the elements there, how to call the objects. Thanks for any help :) – K.K. Jun 13 '16 at 22:40
  • @K.K. There is no method in `IE.document` called `getelementbyvalue`. You need to open your IE dev tool (press `F12`) and find out the tag name of the `search` button. Then, use `getelementsbytagname` to get every possible elements and filter them. – PaichengWu Jun 14 '16 at 08:11
  • @pcw, hi. Yes, I had found the the tags for the `search` button, which is an `input` tag, but I don't need it, since I will be giving the the bank name and city already. There would be no need to press `search`. Am I interpreting that sequence correctly? @pcw, I want to total the page like your previous examples, in order to run the loop, but I'm having trouble doing that. I updated my original question with what I think are the right tags -- but I think `lsortby` is not the right element that sums the page. Also, what I am retaining is the BankName for when second loop occurs at r=0 – K.K. Jun 14 '16 at 18:23
  • Hi @pcw. Anyway we can apply the credit union structure to this website, too? – K.K. Jun 15 '16 at 03:33
  • @K.K. In this case, there is no ID for total page. So, you must find out the tag for the total page. Then use `getelementsbytagname` to get all possible objects. And use `For Each` statement and `If ... Then` statement to filter them to get what you want. – PaichengWu Jun 15 '16 at 10:08

1 Answers1

2

Consider the below example which uses XHR instead of IE and split-based HTML content parsing:

Option Explicit

Sub Test_usbanklocations()

    Dim oSource, oDestination, y, oSrcRow, sName, sCity, sDist, sUrl0, sUrl1, sUrl2, lPage, sResp1, sResp2, i, a1, a2, a3, a4, a5

    Set oSource = Sheets(1)
    Set oDestination = Sheets(2)
    oDestination.Cells.Delete
    DataOutput oDestination, 1, Array("Name", "Location", "Phone", "Total Assets", "Total Deposits")
    y = 2

    For Each oSrcRow In oSource.UsedRange.Rows
        sName = oSrcRow.Cells(1, 1).Value
        sCity = oSrcRow.Cells(1, 2).Value
        sDist = oSrcRow.Cells(1, 3).Value
        sUrl0 = "http://www.usbanklocations.com/banks.php?q=" & EncodeUriComponent(sName) & "&lc=" & EncodeUriComponent(sCity) & "&ml=" & sDist
        sUrl1 = sUrl0
        lPage = 1
        Do
            sResp1 = GetXHR(sUrl1)
            If InStr(sResp1, "We can not find the address you provided. Please check.") > 0 Then Exit Do
            a1 = Split(sResp1, "<div class=""pl")
            For i = 1 To UBound(a1)
                a2 = Split(a1(i), "</div>", 3)
                a3 = Split(a2(1), "<a href=""", 2)
                a4 = Split(a3(1), """>", 2)
                sUrl2 = "http://www.usbanklocations.com" & a4(0)
                sResp2 = GetXHR(sUrl2)
                a5 = Array( _
                    GetFragment(sResp2, "<b>Name:</b></td><td>", "</td>"), _
                    Replace(GetFragment(sResp2, "<b>Location:</b></td><td>", "</td>"), "View Other Branches", ""), _
                    GetFragment(sResp2, "<b>Phone:</b></td>", "</td>"), _
                    GetFragment(sResp2, "<b>Total Assets:</b></td><td>", "</td>"), _
                    GetFragment(sResp2, "<b>Total Deposits:</b></td><td>", "</td>") _
                )
                DataOutput oDestination, y, a5
                y = y + 1
                DoEvents
            Next
            If InStr(sResp1, "Next Page &gt;") = 0 Then Exit Do
            lPage = lPage + 1
            sUrl1 = sUrl0 & "&ps=" & lPage
            DoEvents
        Loop
    Next

    MsgBox "Completed"

End Sub

Function GetXHR(sUrl)

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sUrl, False
        .Send
        GetXHR = .ResponseText
    End With

End Function

Sub DataOutput(oSht, y, aValues)

    With oSht.Cells(y, 1).Resize(1, UBound(aValues) + 1)
        .NumberFormat = "@"
        .Value = aValues
    End With

End Sub

Function GetFragment(sText, sPatt1, sPatt2)

    Dim a1, a2

    a1 = Split(sText, sPatt1, 2)
    If UBound(a1) <> 1 Then Exit Function
    a2 = Split(a1(1), sPatt2, 2)
    If UBound(a2) <> 1 Then Exit Function
    GetFragment = GetInnerText(a2(0))

End Function

Function EncodeUriComponent(sText)

    Static objHtmlfile As Object

    If objHtmlfile Is Nothing Then
        Set objHtmlfile = CreateObject("htmlfile")
        objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
    End If
    EncodeUriComponent = objHtmlfile.parentWindow.encode(sText)

End Function

Function GetInnerText(sText)

    With CreateObject("htmlfile")
        .Write ("<body>" & sText & "</body>")
        GetInnerText = .DocumentElement.Document.GetElementsByTagName("body")(0).InnerText
    End With

End Function

As an example, the first worksheet contains data to search (Bank name, Location and Distance to refine by):

source

Then result on the second worksheet is as follows:

result

omegastripes
  • 12,351
  • 4
  • 45
  • 96
  • you are awesome @omegastripes! This XHR/api method is a wonderful foundation. Thank you very much. I'm just getting familiar with XHR actually, this would be my first code to look at in this format. I noticed that it's much faster for large sets of data. Thank you so much. – K.K. Jun 16 '16 at 05:38
  • @K.K. BTW making XHRs asynchronous you can achieve even higher speed, but the code should work with events then. – omegastripes Jun 16 '16 at 09:48
  • @omegastripes, thanks for your code. It is a brand new skill to me. I learned from this. – PaichengWu Jun 16 '16 at 11:18
  • Thank you to you both @omegastripes and pcw I really appreciate it. I'm messing around with the code now. The speed is phenomenal – K.K. Jun 16 '16 at 21:51
  • Thank you @pcw. You were great help – K.K. Jun 16 '16 at 21:51
  • @omegastripes, could you please give me hint about `objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"` ? – PaichengWu Jun 17 '16 at 08:31
  • @pcw, any Query String Parameters being sent via XHR GET or Data Payload via XHR POST should be encoded as a valid component of a Uniform Resource Identifier (URI), in that particular case `htmlfile` ActiveX is used as JScript host, allowing to execute JScript code, and `.parentWindow.execScript` creates a function, which is just a wrapper for invoking a native JScript `encodeURIComponent()` from VBA code. You can use another solutions i. e. [by the link](http://stackoverflow.com/questions/218181/how-can-i-url-encode-a-string-in-excel-vba) – omegastripes Jun 17 '16 at 11:50
  • @omegastripes, thanks for your explain. I do like your code and [El's code](http://stackoverflow.com/a/28923996/6202343) on that post. – PaichengWu Jun 17 '16 at 14:58
  • @omegastripes, why do you use `static` in `EncodeUriComponent` function, but not `dim`? If I want to speed the usbank code up, could you please give me some hint? – PaichengWu Jun 17 '16 at 15:06
  • @pcw, `Static` allows to create `htmlfile` instance only once at the first call. That makes further calls faster. Note, that `ScriptControl` ActiveX available in 32-bit version only, so it won't work on 64-bit Office. If you want to speed the code, first of all make second level XHRs `sResp2 = GetXHR(sUrl2)` mode asynchronous, add events hadling. Generally that should be another one question on SO. Try some code, and create new question if you have issues. – omegastripes Jun 17 '16 at 17:17
  • @omegastripes, I never know `ScriptControl` have that issue. Thank you very much. – PaichengWu Jun 17 '16 at 23:26