I'm having a difficult time scraping the elements from a webpage. The webpage source looks something like this:

<div class="tb-react-data-grid">
  <div class="tb-react-dg-hrow">
    <div class="tb-react-dg-body">
      <div class="tb-react-dg-bsection">
      <div class="tb-react-dg-bsection">

Under each "tb-react-dg-bsection", there are multiple "tb-react-dg-brow" divs, and inside each of those are the 4 cells that I need:

<div class="tb-react-dg-bcell" data-tb-test-id="somethingNeeded#1">
<div class="tb-react-dg-bcell" data-tb-test-id="somethingNeeded#2">
<div class="tb-react-dg-bcell" data-tb-test-id="somethingNeeded#3">
<div class="tb-react-dg-bcell" data-tb-test-id="somethingNeeded#4">

I'm trying to grab the displayed text in the "tb-react-dg-bcell" items (the somethingNeeded items). I've tried different approaches, but so far my code only grabs the text in the first 40 or so rows. When you scroll down the webpage, another

<div class="tb-react-dg-bsection">

appears when I'm viewing with F12 in the browser. Here's what I've got so far.

Public Sub Scrape()
    
    Dim ie As InternetExplorerMedium: Set ie = New InternetExplorerMedium
    Dim html As HTMLDocument
    Dim cRow As Long, source As Object, mtbl As Object

    cRow = 1

    With ie
        .Visible = True
        .Navigate "http://testwebsite.com"
    End With

    Do While ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    
    Set html = ie.Document

    Set mtbl = html.getElementsByClassName("tb-react-dg-bcell")
        
    For Each source In mtbl
        Sheets(1).Cells(cRow, 1) = source.textContent
        cRow = cRow + 1
    Next source

    ie.Quit
    Set ie = Nothing
    
End Sub

Any suggestions on how to grab the remaining rows would be greatly appreciated!


Sudio
  • Is the website public? – QHarr May 19 '21 at 22:07
  • No I'm afraid not. I can post some portions though...what parts would be helpful to see? – Sudio May 20 '21 at 00:13
  • I posted some HTML from the site, hopefully this helps – Sudio May 20 '21 at 12:15
  • Have you tried using executeScript to scroll down the page or any other scrolling method? Have you confirmed whether only what is currently in viewport can be grabbed? – QHarr May 20 '21 at 13:45
  • Yes, I've tried using the method posted here: https://stackoverflow.com/questions/48275399/unable-to-scroll-a-split-screen-of-a-webpage, and it does scroll to the bottom. However, my code then grabs only the rows at the bottom. I'll see if I can attach a screenshot to the original post – Sudio May 20 '21 at 14:01
  • `just grabs the ones at the bottom.` ^^ see last part of my previous comment. – QHarr May 20 '21 at 21:06

1 Answer

Here's one solution I've found that works. It's probably not the most elegant or efficient, but it does the job: scroll the grid body in steps and read whichever rows are currently in the DOM after each step.

Public Sub Scrape2()

    Dim ie As InternetExplorerMedium: Set ie = New InternetExplorerMedium
    Dim Html As HTMLDocument
    Dim cRow As Long, source As Object, mtbl As Object
    Dim scrollHeight As Variant, counter As Long
    counter = 0
    cRow = 1

    With ie
        .Visible = True
        .Navigate "https://websitename.com/subsite/blah"
    End With

    Do While ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    
    ' Wait 10 more seconds so the page's scripts can finish rendering the grid
    Application.Wait Now + #12:00:10 AM#
    
    Set Html = ie.Document
        
    Dim leftWindowColl As IHTMLElementCollection
    Set leftWindowColl = Html.getElementsByClassName("tb-react-dg-body")
    
    If leftWindowColl.Length > 0 Then
        Dim leftWindowDiv As HTMLDivElement
        Set leftWindowDiv = leftWindowColl.Item(0)
        scrollHeight = leftWindowDiv.scrollHeight
    End If
    
    Do While counter < scrollHeight
        
        Set mtbl = Html.getElementsByClassName("tb-react-dg-brow")
    
        For Each source In mtbl
            Sheets(1).Cells(cRow, 1) = source.Children(1).textContent
            Sheets(1).Cells(cRow, 2) = source.Children(2).textContent
            Sheets(1).Cells(cRow, 3) = source.Children(3).textContent
            Sheets(1).Cells(cRow, 4) = source.Children(4).textContent
            cRow = cRow + 1
        Next source

        counter = counter + 2000
        leftWindowDiv.scrollTop = counter
        Application.Wait Now + #12:00:02 AM#
    
    Loop
    
    ie.Quit
    Set ie = Nothing

End Sub
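
One possible refinement, offered as a sketch rather than a tested fix: since each pass re-reads every "tb-react-dg-brow" currently in the DOM, a row that is still rendered after a 2000-pixel scroll step may be written to the sheet twice. The hypothetical helper below (the names CollectVisibleRows, doc, seen, and rowEl are not from the original code) skips rows whose four cell values were already captured, using a late-bound Scripting.Dictionary. It assumes, as the code above does, that the needed cells are Children(1) through Children(4) of each row.

```vba
' Sketch only: writes rows not seen at an earlier scroll position.
' doc  - the HTMLDocument (ie.Document)
' seen - a Scripting.Dictionary shared across all scroll steps
' cRow - next free worksheet row, advanced as rows are written
Public Sub CollectVisibleRows(ByVal doc As Object, ByVal seen As Object, _
                              ByRef cRow As Long)

    Dim rowEls As Object, rowEl As Object
    Dim key As String, i As Long

    Set rowEls = doc.getElementsByClassName("tb-react-dg-brow")

    For Each rowEl In rowEls
        ' Build a lookup key from the same four cells the loop above reads
        key = vbNullString
        For i = 1 To 4
            key = key & rowEl.Children(i).textContent & vbTab
        Next i

        ' Write the row only the first time this combination appears
        If Not seen.Exists(key) Then
            seen.Add key, True
            For i = 1 To 4
                Sheets(1).Cells(cRow, i) = rowEl.Children(i).textContent
            Next i
            cRow = cRow + 1
        End If
    Next rowEl

End Sub
```

To try it, create the dictionary once before the scroll loop with Set seen = CreateObject("Scripting.Dictionary") (declared As Object), then replace the inner For Each block with CollectVisibleRows Html, seen, cRow. Note the trade-off: two rows that are genuinely identical in all four cells would be collapsed into one, so this only suits grids where each row is unique.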
Sudio
  • Can you please share the link to the website the data is being scraped from? It would be a great help for learning. @Sudio –  May 20 '21 at 17:20
  • I'd like to, but it's not publicly accessible – Sudio May 20 '21 at 17:24