0

I have been toying with a VBA project in excel. The idea is that it will scrape a website (specified in the spreadsheet) and pull data from the site. In this example I am pulling playstation data (i.e. Games played, time played, trophies, etc.) and putting them into a spreadsheet. I have a total of 69 games but the scraper only pulls 50.

How do I get it to pull in all 68 (or more in the future?)

I'll admit, I am not very versed with VBA, so any assistance would be greatly appreciated.

My source code:

Sub scrape_quotes()
    Set browser = CreateObject("InternetExplorer.Application")
    
    'Dim browser    As InternetExplorer
    Dim Games       As Object
    Dim Game        As Object
    Dim Num         As Long
    Dim DateLastPlayed As Object
    Dim PlatformType As Object
    Dim BronzeNum As Object
    Dim SilverNum As Object
    Dim GoldNum As Object
    
    MsgBox "Please wait, this may take a few minutes..." & vbNewLine & "Pres OK To Continue", vbInformation, "Game Tracker"
    Application.StatusBar = "Scraping Data. Please wait..."
    
    ' Assigns a cell for the URL
    Dim URL         As String
    URL = ThisWorkbook.Sheets("Scraper").Range("B6").Value
    If Len(URL) = 0 Then Exit Sub
    
    ' Opens "invisible" browser and remains until all data is loaded
    
    'Set browser = New InternetExplorer
    browser.Visible = True
    browser.Navigate URL
    Do While browser.readyState <> 4 Or browser.Busy: DoEvents: Loop
        browser.Document.parentWindow.scroll 0&, 20000&
        
        On Error GoTo ErrHandler
        
        ' Looks for data in the "box" element on website
        Set Games = browser.Document.getElementsByClassName("box")
        Dim GameName As String, Hoursplayed As String
        
        ' Looks for data in the "lastplayed" element on website
        Set DateLastPlayed = browser.Document.getElementsByClassName("lastplayed")
        Dim Lastplayed As String
        
        ' Looks for data in the "platforms" element on website
        Set PlatformType = browser.Document.getElementsByClassName("platforms")
        Dim Platform As String
        
        ' Looks for data in the "bronze" element on website
        Set BronzeNum = browser.Document.getElementsByClassName("bronze")
        Dim Bronze As String
        
        ' Looks for data in the "silver" element on website
        Set SilverNum = browser.Document.getElementsByClassName("silver")
        Dim Silver As String
        
        ' Looks for data in the "gold" element on website
        Set GoldNum = browser.Document.getElementsByClassName("gold")
        Dim Gold As String
        
        ' Assigns which sheet to parse data do
        Dim WS      As Worksheet
        Set WS = ThisWorkbook.Sheets("Games List")
        
        ' Assigns each column used for each category
        Application.ScreenUpdating = False
        For Each Game In Games
            CleanData Game.innerText, GameName, Hoursplayed, Lastplayed, Platform, Bronze, Silver, Gold
            If Len(GameName) Then
                Num = Num + 1
                WS.Cells(1 + Num, 1).Value = GameName
                WS.Cells(1 + Num, 2).Value = Hoursplayed
                WS.Cells(1 + Num, 4).Value = Platform
                
            End If
            
            GameName = "": Hoursplayed = ""
            
        Next
        
        'New code starts here.
        Num = 0
        For Each Line In DateLastPlayed
            If Len(Line) Then
                Num = Num + 1
                WS.Cells(1 + Num, 3).Value = Line.innerText
            End If
            
        Next
        
        Num = 0
        For Each Line In PlatformType
            If Len(Line) Then
                Num = Num + 1
                WS.Cells(1 + Num, 4).Value = Line.innerText
            End If
            
        Next
        
        Num = 0
        For Each Line In BronzeNum
            If Len(Line) Then
                Num = Num + 1
                WS.Cells(1 + Num, 5).Value = Line.innerText
            End If
            
        Next
        
        Num = 0
        For Each Line In SilverNum
            If Len(Line) Then
                Num = Num + 1
                WS.Cells(1 + Num, 6).Value = Line.innerText
            End If
            
        Next
        
        Num = 0
        For Each Line In GoldNum
            If Len(Line) Then
                Num = Num + 1
                WS.Cells(1 + Num, 7).Value = Line.innerText
            End If
            
        Next
        
ErrHandler:
        If Err.Number = 0 Then Debug.Print Err.Number & vbNewLine & Err.Description
        Application.ScreenUpdating = True
        browser.Quit
        
        Set browser = Nothing
        MsgBox "Game Data Has Been Scraped!", vbExclamation, "Game Tracker"
        Application.StatusBar = False
        Sheets("Games List").Activate
        
    End Sub

This is the URL I have it pulling data from:

I've tried forcing IE to scroll to the bottom of the page before completing, I tried adding a wait timer, all to no avail.

Vishera
  • 127
  • 1
  • 1
  • 8
  • Have you ever opened the page manually in IE? It no longer works in it. IE is obsolete. – Zwenn Jan 29 '23 at 13:03
  • Yeah, the code opens it in IE and will scroll down to the bottom etc. It works but only pulls data up to 50 rows and no more. From my understanding Excel only works with IE and can't for example, load the page using firefox etc. – Vishera Jan 29 '23 at 17:27
  • The JS for displaying the AJAX content when scrolling down no longer works in IE. It is too old. So you can only reach a maximum of 50. You can control Chrome or Edge via Seleniumbasic from VBA. Have a look at the accepted answer to this question: https://stackoverflow.com/questions/69118936/vba-script-to-convert-from-internet-explorer-to-edge-or-chrome-browser – Zwenn Jan 29 '23 at 20:39
  • so by using a different browser, it should load more than 50 rows? Also, by using selenium, will that mess with any Edge settings I have? I require it to work as I have it setup for my work. – Vishera Jan 30 '23 at 00:02
  • With Seleniumbasic, a browser is only automated. You can also use Chrome, you just have to use the Chrome Web Driver. The JS to retrieve more than 50 lines should work in both browsers. But first get used to working with Selenium. Try to retrieve a single value from a page, or something like that. Keep it simple. Then learn more about how to use Seleniumbasic. For more info, use sources like this: https://www.youtube.com/watch?v=s3Bxb0wthqI – Zwenn Jan 30 '23 at 08:31
  • So I followed some selenium tutorials and managed to scrape data from the site, but it still only collects 50 lines, it always hangs up on that same spot. hmm – Vishera Jan 30 '23 at 20:25

1 Answers1

0

I just glanced at your post, and I don't know exactly what you are after here, but try to implement the code below to get what you want.

Sub WebData()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim source As Object

http.Open "GET", "https://www.exophase.com/psn/user/AeldWulf/", False
http.send
html.body.innerHTML = http.responseText

Debug.Print html.body.innerHTML

RowCount = 1
   
With Sheets("Sheet1")
   .Cells.ClearContents
   For Each itm In html.body.all
      .Range("A" & RowCount) = itm.tagName
      .Range("B" & RowCount) = itm.ID
      .Range("C" & RowCount) = itm.className
      .Range("D" & RowCount) = Left(itm.innerText, 1024)
      RowCount = RowCount + 1
   Next itm
End With

End Sub

Set references to XML and HTML.

enter image description here

After the data was dumped into the Worksheet, I found gold, silver, hours, and other criteria that you listed. Just parse the data dump to clean up the results.

enter image description here

ASH
  • 20,759
  • 19
  • 87
  • 200
  • Here's a screenshot of what I am going for. What I have so far works, but for some reason stops after 50 entries, rather than the full 69 games I have. https://ctrl.vi/i/0aBQ9fJ3g – Vishera Jan 29 '23 at 17:28
  • Did you try what I suggested? I tried it and it worked for me. – ASH Jan 29 '23 at 21:00
  • I just did. It seems to have pulled *everything* so I need to look through it and figure out how to clean it up so it only grabs the info I need. Thanks, I'll play around with it. I only need the game name, time played, last date played, and number of each trophy type . – Vishera Jan 29 '23 at 23:11