-1

I need help scraping the tags onto my excel from an internal company website.

This is the source code.

<br />
<span class="RptTitle"><input id="chkPromisDataLog" type="checkbox" name="chkPromisDataLog" checked="checked" onclick="showOnOffPromisLog();" /><label for="chkPromisDataLog">Promis Processing data log [83508442.1].</label></span>
<div id="divPromisDataLog" style="display: none;">
    <table id="tblPromisDataLog" cellspacing="0" cellpadding="0" width="100%" border="0" class="table">
        <tr>
            <td width="60%"></td>
            <td>
                <a class="textnormal" href="javascript:popwnd=window.open('../Tools/ExportExcel.aspx?KEY=LOT_GEN_PROMIS','popwnd','status=no,toolbar=Yes,menubar=Yes,location=no,scrollbars=yes,resizable=Yes');popwnd.focus()">
                    Export to Excel
                </a>
            </td>
        </tr>
        <tr>
            <td colspan="2">
                <table cellspacing="0" rules="all" border="1" id="dgPromisDataLog" style="border-color: Black; border-collapse: collapse;">
                    <tr class="rptDetailsHeaderMgt" align="center">
                        <td>LotID</td>
                        <td>Hist Stage</td>
                        <td>Datein</td>
                        <td>Dateout</td>
                        <td>Qtyin</td>
                        <td>Qtyout</td>
                        <td>M/C ID</td>
                        <td>Emp TrackOut</td>
                        <td>Hold Code</td>
                        <td>Hold Reason</td>
                        <td>Staging (Hrs)</td>
                    </tr>
                    <tr class="rptDetailsItemMgt" align="center" style="white-space: nowrap;">
                        <td>83508442.1</td>
                        <td>
                            <a
                                href="javascript:popwnd=window.open('LotGen_Dtl.aspx?iDate=04/09/2021 09:07:07 PM&amp;amp;oDate=04/10/2021 03:47:59 PM&amp;amp;oLotid=83508442.1&amp;amp;oStage=C-WFRPROCS&amp;amp;oLastRow=N','popwnd','width=900,height=600,status=no,toolbar=no,menubar=no,location=no,scrollbars=yes,top=100,right=50,left=50');popwnd.focus();"
                            >
                                C-WFRPROCS
                            </a>
                        </td>
                        <td>4/9/2021 9:07:07 PM</td>
                        <td>4/10/2021 3:47:59 PM</td>
                        <td>0</td>
                        <td>9</td>
                        <td></td>
                        <td>10911700</td>
                        <td>&nbsp;</td>
                        <td>&nbsp;</td>
                        <td>18.68</td>
                    </tr>
                </table>
            </td>
        </tr>
    </table>
</div>

This is roughly my code

Sub Lotsearch()

    Dim ie As InternetExplorer
    Dim htmlEle As IHTMLElement
    Dim i As Integer
    
    Set ie = New InternetExplorer 'start new IE page
    ie.Visible = True 'View what is happening in IE
    ie.navigate "www.internalcompanywebsite.aspx" 'Open link in IE
    
    While ie.readyState <> 4 'Waits for IE to finish loading
        DoEvents
    Wend
    
        i = 1
    
    'ie.document.getElementById("tblPromisDataLog") = Cells(2, 1).Value
    
    'ie.document.getElementsByTagName("td").Value = Cells(5, 1).Value
    
    'Set Data = ie.document.getElementByTagName("rptDetailsItemMgt")
    'Dim myValue As String
    'myValue = allRowOfData.Cells(0).innerHTML
    
    'Cells(3, 13) = myValue
    'Range("L1").Value = myValue

    'For Each htmlEle In ie.document.getElementById("tblPromisDataLog")(0).getElementsByClassName("rptDetailsItemMgt")

        With ActiveSheet
            .Range("A" & i).Value = htmlEle.Children(0).textContent
        '    .Range("B" & i).Value = htmlEle.Children(1).textContent
        '    .Range("C" & i).Value = htmlEle.Children(2).textContent
        '    .Range("D" & i).Value = htmlEle.Children(3).textContent
        '    .Range("E" & i).Value = htmlEle.Children(4).textContent
        '    .Range("F" & i).Value = htmlEle.Children(5).textContent
        '    .Range("G" & i).Value = htmlEle.Children(6).textContent
        '    .Range("H" & i).Value = htmlEle.Children(7).textContent
        '    .Range("I" & i).Value = htmlEle.Children(8).textContent
        '    .Range("J" & i).Value = htmlEle.Children(9).textContent
        '    .Range("K" & i).Value = htmlEle.Children(10).textContent
        '    .Range("L" & i).Value = htmlEle.Children(11).textContent
            
        End With
        
        i = i + 1
    
    Next htmlEle
        
    ie.Quit

    
End Sub

As you can see, I have tried various methods but to no avail.

getElementbyID not working

getElementsbyTagName not working

getElementsByClassName not working

Any help would be appreciated. Thanks.

QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Instead of reinventing the wheel just use a well worn library. You can join a group of people here who are all trying to do the same thing: https://github.com/VBA-tools/VBA-Web – HackSlash May 26 '21 at 16:13
  • Does this answer your question? [Excel VBA - Web Scraping - Inner Text of HTML Table Cell](https://stackoverflow.com/questions/63738093/excel-vba-web-scraping-inner-text-of-html-table-cell) – HackSlash May 26 '21 at 16:14
  • What is the desired output format and values? – QHarr May 26 '21 at 20:59
  • getElementById returns a single node so there is no indexing i.e. no (0) – QHarr May 26 '21 at 21:02
  • 83508442.1 C-WFRPROCS 4/9/2021 9:07:07 PM 4/10/2021 3:47:59 PM 0 9 10911700 18.68 I want the output like this as in a table format. I just realised that I should use "dgPromisDataLog" for my getElementbyId ? – FireBlazen May 27 '21 at 15:50

2 Answers2

1

it may not actually be the most efficient way to deal with HTML extraction, but you might consider using Regex matching.. Raw Coding on youtube just made a killer regex tutorial, and I remembered seeing this question, and thought it might be a good alternative if you didn't like dealing with html explicitly.

Regex Tutorial for Beginners from Raw Coding on Youtube

like, if you only wanted normal text between td tags, you could regex search for

(?<OpenTag>[\<]+td[\>]+)(?<Contents>[\w\/\(\)\[\]\.\&\:\;\s]*?)(?<CloseTag>[\<]+[\/]+[td]+[\>]+)

here's an example at Regex101

Regex101 example using your html

0
Dim ht As HTMLDocument
Dim i As Integer
Dim htmltable As MSHTML.htmltable

Set htmltable = ht.getElementById("dgPromisDataLog")
    
    myValue = htmltable.getElementsByClassName("rptDetailsItemMgt")(0).getElementsByTagName("td")(0).innerText

After messing with it for a few days, I found that the code works if I split up the getElementbyId from the other 'getElements'.

Changed htmlEle As IHTMLElement into ht As HTMLDocument. Also added htmltable As MSHTML.htmltable

For some reason the code returns an error if I chain the entire 'getelement' together. Hope this helps someone else with the same problem.