2

Edit: More information - the objective of this program is to pull from an existing list of names, search the website, and bring back the corresponding NPI numbers. Thanks to user @omegastripes I was advised to shift my focus to XHR. My question is regarding, how to populate the search with the names of the providers, and loop so that it will return the NPI's in the next cells over in the spread sheet for the remaining providers.

Related, what to do in the event nothing populates from the search

original post: Title - Do you want to continue? Internet Explorer pop up - VBA

Internet Security pop up prevents my code from continuing. Normally I would disable this request but my computer security access is limited due to using a work computer.

My question, is there a way to click "Yes" on this pop up using VBA?

Here is my code so far.

Sub GetNpi()

Dim ie As Object

'create a new instance of ie
Set ie = New InternetExplorer
ie.Visible = True

'goes to site
ie.navigate "npinumberlookup.org"
Do While ie.readyState <> READYSTATE_COMPLETE
     Loop

Set ieDoc = ie.document

'select search box last name and Fill in Search Box
ie.document.getElementById("last").Focus
ie.document.getElementById("last").Value = "testlastname"

'select search box first name and Fill in Search Box
ie.document.getElementById("first").Focus
ie.document.getElementById("first").Value = "testfirstname"

Do While ie.readyState <> READYSTATE_COMPLETE
     Loop

'select state drop down box enter TX
ie.document.getElementById("pracstate").Focus
ie.document.getElementById("pracstate").Value = "TX"

'click submit button
ie.document.getElementById("submit").Click

example

Community
  • 1
  • 1
josephkane
  • 31
  • 6
  • I didn't receive a security popup from that site. Can you provide a screen shot of the popup? – K.Dᴀᴠɪs Jan 13 '18 at 06:43
  • Apologies, I included the link but the formatting was incorrect and prevented from showing. Edited above post. – josephkane Jan 13 '18 at 06:47
  • What happens when you keep the checkbox clicked and select yes? – K.Dᴀᴠɪs Jan 13 '18 at 06:48
  • it continues to the page but stops the code from continuing, i am looking for a way to add to my code to click yes. – josephkane Jan 13 '18 at 06:55
  • That dialog box shouldn't come back if you keep the checkbox clicked the next time you run the code. There may be an API call you can use with `PostMessage` from my research. At what point in the code does the popup occur? – K.Dᴀᴠɪs Jan 13 '18 at 06:56
  • When I rerun my code, it opens a new instance of IE so the setting are not saved, it happens at the very end, this is my first code I am writing, working on problems as i come across them.. – josephkane Jan 13 '18 at 06:59
  • Does the popup occur as soon as you navigate to the page - or - once the page is loaded - or - after you submit the form? – K.Dᴀᴠɪs Jan 13 '18 at 07:00
  • as soon as the submit button is pressed – josephkane Jan 13 '18 at 07:01
  • Updated (and undeleted) answer. Forgot to add the constants. – K.Dᴀᴠɪs Jan 13 '18 at 07:25
  • It's not clear from the list of names screenshot, what output should be for request returning multiply entries. – omegastripes Jan 13 '18 at 23:15
  • One second while I retrieve the actual spread sheet in my email – josephkane Jan 13 '18 at 23:23
  • updated post above – josephkane Jan 13 '18 at 23:41
  • My question is still the same: what output should be for request returning multiply entries? Please check e. g. BELL ANNA, BROWN BERNADETTE, GARCIA EUGENIA, and show the expected output on the screenshot of manually filled in worksheet (for that rows). – omegastripes Jan 14 '18 at 01:09
  • Apologies, the state would be Texas which I would hope narrow it down. Perhaps there could be an indicator in the next column with an "x" to signal multiple entries were found, then on those I could check for accuracy. – josephkane Jan 14 '18 at 03:52
  • Please check updated code in [the answer](https://stackoverflow.com/a/48244764/2165759). – omegastripes Jan 14 '18 at 23:55
  • @omegastripes, First - Thank you for your help, and your time you have spent assisting me in this project, and thank you for introducing me to XHR / XML web scraping. I am studying your code in attempt to learn more, can you explain how your 'minor code simplification works and the purpose. – josephkane Jan 15 '18 at 19:46
  • Please check updated answer, added some explanation of the regex patterns and links. – omegastripes Jan 15 '18 at 21:19

1 Answers1

1

Update

Try the below code to retrieve NPI for the names from the worksheet (specify last name, first name and state):

Option Explicit

Sub TestListNPI()

    ' Prefix type + func
    ' Type: s - string, l - long, a - array
    ' Func: q - query, r - result
    Dim i As Long
    Dim j As Long
    Dim k As Long
    Dim sqLN As String
    Dim sqFN As String
    Dim aqFN
    Dim sqSt As String
    Dim arHdr
    Dim arRows
    Dim srMsg As String
    Dim srLN  As String
    Dim srFN As String
    Dim arFN
    Dim lrMNQty As Long
    Dim sOutput As String

    i = 2
    With Sheets(1)
        Do
            sqLN = .Cells(i, 1)
            If sqLN = "" Then Exit Do
            .Cells(i, 4) = "..."
            sqFN = .Cells(i, 2).Value
            aqFN = Split(sqFN)
            sqSt = "" & .Cells(i, 3)
            GetNPIData sqLN, aqFN(0), sqSt, arHdr, arRows, srMsg
            If srMsg = "OK" Then
                With CreateObject("Scripting.Dictionary")
                    For j = 0 To UBound(arRows, 1)
                        Do
                            srLN = arRows(j, 1)
                            If LCase(srLN) <> LCase(sqLN) Then Exit Do ' Last names should match
                            srFN = arRows(j, 3)
                            arFN = Split(srFN)
                            If LCase(arFN(0)) <> LCase(aqFN(0)) Then Exit Do ' First names should match
                            lrMNQty = UBound(arFN)
                            If UBound(aqFN) < lrMNQty Then lrMNQty = UBound(aqFN)
                            For k = 1 To lrMNQty
                                Select Case True
                                    Case LCase(arFN(k)) = LCase(aqFN(k)) ' Full match
                                    Case Len(arFN(k)) = 1 And LCase(arFN(k)) = LCase(Left(aqFN(k), 1)) ' First letter match
                                    Case Len(arFN(k)) = 2 And Right(arFN(k), 1) = "." And LCase(Left(arFN(k), 1)) = LCase(Left(aqFN(k), 1)) ' First letter with dot match
                                    Case Else ' No matches
                                        Exit Do
                                End Select
                            Next
                            .Add arRows(j, 0), arRows(j, 1) & " " & arRows(j, 3)
                        Loop Until True
                    Next
                    Select Case .Count
                        Case 0
                            sOutput = "No matches"
                        Case 1
                            sOutput = .Keys()(0)
                        Case Else
                            sOutput = Join(.Items(), vbCrLf)
                    End Select
                End With
            Else
                sOutput = srMsg
            End If
            .Cells(i, 4) = sOutput
            DoEvents
            i = i + 1
        Loop
    End With
    MsgBox "Completed"

End Sub

Sub GetNPIData(sLastName, sFirstName, sState, aResultHeader, aResultRows, sStatus)

    Dim sContent As String
    Dim i As Long
    Dim j As Long
    Dim aHeader() As String
    Dim aRows() As String

    ' Retrieve HTML content via XHR
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "http://npinumberlookup.org/getResults.php", False
        .SetRequestHeader "content-type", "application/x-www-form-urlencoded"
        .Send _
            "last=" & EncodeUriComponent(sLastName) & _
            "&first=" & EncodeUriComponent(sFirstName) & _
            "&pracstate=" & EncodeUriComponent(sState) & _
            "&npi=" & _
            "&submit=Search" ' Setup request parameters
        sContent = .ResponseText
    End With
    ' Parse with RegEx
    Do ' For break
        With CreateObject("VBScript.RegExp")
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            ' Minor HTML simplification
            .Pattern = "<(?!/td|/tr|/th|td|tr|th|a href)[^>]*>|&nbsp;|\r|\n|\t"
            sContent = .Replace(sContent, "")
            .Pattern = "<a [^>]*href=""([^""]*)"".*?</td>"
            sContent = .Replace(sContent, "$1</td>")
            .Pattern = "<(\w+)\b[^>]+>"
            sContent = .Replace(sContent, "<$1>")
           ' Extract header
            .Pattern = "<tr>((?:<th>.*?</th>)+)</tr>"
            With .Execute(sContent)
                If .Count <> 1 Then
                    sStatus = "No header"
                    Exit Do
                End If
            End With
            .Pattern = "<th>(.*?)</th>"
            With .Execute(sContent)
                ReDim aHeader(0, 0 To .Count - 1)
                For i = 0 To .Count - 1
                    aHeader(0, i) = .Item(i).SubMatches(0)
                Next
            End With
            aResultHeader = aHeader
           ' Extract data
            .Pattern = "<tr>((?:<td>.*?</td>)+)</tr>"
            With .Execute(sContent)
                If .Count = 0 Then
                    sStatus = "No rows"
                    Exit Do
                End If
                ReDim aRows(0 To .Count - 1, 0)
                For i = 0 To .Count - 1
                    aRows(i, 0) = .Item(i).SubMatches(0)
                Next
            End With
            .Pattern = "<td>(.*?)</td>"
            For i = 0 To UBound(aRows, 1)
                With .Execute(aRows(i, 0))
                    For j = 0 To .Count - 1
                        If UBound(aRows, 2) < j Then ReDim Preserve aRows(UBound(aRows, 1), j)
                        aRows(i, j) = Trim(.Item(j).SubMatches(0))
                    Next
                End With
            Next
            aResultRows = aRows
        End With
        sStatus = "OK"
    Loop Until True

End Sub

Function EncodeUriComponent(sText)
    Static oHtmlfile As Object
    If oHtmlfile Is Nothing Then
        Set oHtmlfile = CreateObject("htmlfile")
        oHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
    End If
    EncodeUriComponent = oHtmlfile.parentWindow.encode(sText)
End Function

The output for me is as follows:

updated code output

For multiply entries all names are output in the last column instead of NPI.

Some explanation of the code. Generally RegEx's aren't recommended for HTML parsing, so there is disclaimer. Data being processed in this case is quite simple that is why it is parsed with RegEx. About RegEx: introduction (especially syntax), introduction JS, VB flavor. Simplification makes HTML code suitable for parsing in some degree. Patterns:

  • <(?!/td|/tr|/th|td|tr|th|a href)[^>]*>|&nbsp;|\r|\n|\t is for removing white-spaces, and all tags but table markup and links by replacing with "".
  • <a [^>]*href="([^"]*)".*?</td> keeps link address by replacing with $1</td>.
  • <(\w+)\b[^>]+> removes all unnecessary tag attributes by replacing with <$1>.
  • <tr>((?:<th>.*?</th>)+)</tr> matches each table header row.
  • <th>(.*?)</th> matches each header cell.
  • <tr>((?:<td>.*?</td>)+)</tr> matches each table data row.
  • <td>(.*?)</td> matches each data cell.

Look into how does the HTML content is changed on each step of replacemnets.

Initial answer

Avoid pop up appearing instead of bothering with it.

Make sure you are using secure HTTP protocol https://npinumberlookup.org.

You may even not use IE for webscraping at all, XHR is better choice, as it is more reliable and fast, though it requires some knowledge and experience. Here is the simple example of that:

Option Explicit

Sub Test()

    Dim sContent As String
    Dim i As Long
    Dim j As Long
    Dim aHeader() As String
    Dim aRows() As String

    ' Retrieve HTML content via XHR
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "http://npinumberlookup.org/getResults.php", False
        .SetRequestHeader "content-type", "application/x-www-form-urlencoded"
        .Send _
            "last=smith" & _
            "&first=michael" & _
            "&pracstate=NC" & _
            "&npi=" & _
            "&submit=Search" ' Setup request parameters
        sContent = .ResponseText
    End With
    ' Parse with RegEx
    Do ' For break
        With CreateObject("VBScript.RegExp")
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            ' Minor HTML simplification
            .Pattern = "<(?!/td|/tr|/th|td|tr|th|a href)[^>]*>|&nbsp;|\r|\n|\t"
            sContent = .Replace(sContent, "")
            .Pattern = "<a [^>]*href=""([^""]*)"".*?</td>"
            sContent = .Replace(sContent, "$1</td>")
            .Pattern = "<(\w+)\b[^>]+>"
            sContent = .Replace(sContent, "<$1>")
           ' Extract header
            .Pattern = "<tr>((?:<th>.*?</th>)+)</tr>"
            With .Execute(sContent)
                If .Count <> 1 Then
                    MsgBox "No header found"
                    Exit Do
                End If
            End With
            .Pattern = "<th>(.*?)</th>"
            With .Execute(sContent)
                ReDim aHeader(0, 0 To .Count - 1)
                For i = 0 To .Count - 1
                    aHeader(0, i) = .Item(i).SubMatches(0)
                Next
            End With
           ' Extract data
            .Pattern = "<tr>((?:<td>.*?</td>)+)</tr>"
            With .Execute(sContent)
                If .Count = 0 Then
                    MsgBox "No rows found"
                    Exit Do
                End If
                ReDim aRows(0 To .Count - 1, 0)
                For i = 0 To .Count - 1
                    aRows(i, 0) = .Item(i).SubMatches(0)
                Next
            End With
            .Pattern = "<td>(.*?)</td>"
            For i = 0 To UBound(aRows, 1)
                With .Execute(aRows(i, 0))
                    For j = 0 To .Count - 1
                        If UBound(aRows, 2) < j Then ReDim Preserve aRows(UBound(aRows, 1), j)
                        aRows(i, j) = .Item(j).SubMatches(0)
                    Next
                End With
            Next
        End With
    Loop Until True
    ' Output
    With ThisWorkbook.Sheets(1)
        .Cells.Delete
         Output2DArray .Cells(1, 1), aHeader
         Output2DArray .Cells(2, 1), aRows
         .Columns.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

All the data in the code could be easily obtained from browser developer tools on network tab after you click submit, as an example:

network

The above code returns the output for me as follows:

output

omegastripes
  • 12,351
  • 4
  • 45
  • 96
  • holy cow your code is amazing, my intent was to start with a form filled with names, search the site and bring back the NPI numbers. How could i incorporate that to pull from an existing spreadsheet? – josephkane Jan 13 '18 at 22:17
  • @josephkane For extended answer please edit the question and add some screenshots (or links to) showing the example of your initial data on a worksheet and expected output. – omegastripes Jan 13 '18 at 22:21
  • @josephkane have you succeeded with that solution? – omegastripes Feb 07 '18 at 03:45
  • Hi @omegastripes! I am back because I am potentially being considered for a position in my company for this role that I volunteered to assist years ago when I asked for your help. I am still very appreciative. I wanted to impress them with a working version of this, however it appears the website has changed over the years and it no longer works. Can you help me in finding the changes and corrections? If this role becomes a reality I really want to learn everything in this area. Thank you! -J – josephkane Apr 28 '21 at 15:32