
I am trying to build a web scraper to extract the links/table entries from the following page:

https://www.adr.db.com/drwebrebrand/dr-universe/corporate_actions_type_e.html

I then want to import all of these into an Excel sheet. I want to repeat this each day and pull in the new entries, so I'm thinking a VBA macro I can click once per day might be the best way to keep everything in one sheet. Another option I've explored is Apify, but I'm less familiar with how to use it.

However, my web scraping knowledge is limited, and all I've been able to do so far is extract the HTML source code. The relevant part of it looks like this:

<td><a href="dr_details.html?identifier={{bookmarkedDepositaryReceipt.drId}}" class="btn btn-add-to-portfolio " aria-hidden="true"> View full details </a></td>

This means the URLs are generated dynamically with JavaScript, so I can't actually see the generated link for each entry in the source. How would I go about extracting the actual links?
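For illustration, here is a quick check in Python (outside of VBA, just to demonstrate the problem) showing that parsing the raw response only ever recovers the template placeholder, never a concrete drId:

```python
import re

# The raw response contains only the unrendered template, so parsing the
# HTML yields the {{...}} placeholder rather than a usable URL.
html = ('<td><a href="dr_details.html?identifier='
        '{{bookmarkedDepositaryReceipt.drId}}" '
        'class="btn btn-add-to-portfolio " aria-hidden="true"> '
        'View full details </a></td>')
links = re.findall(r'href="([^"]+)"', html)
print(links)  # ['dr_details.html?identifier={{bookmarkedDepositaryReceipt.drId}}']
```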

If it helps, this is my VBA code for extracting the HTML:

Sub Oval1_Click()
    Dim result As String
    Dim myURL As String
    Dim winHttpReq As Object

    Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")

    myURL = "https://www.adr.db.com/drwebrebrand/dr-universe/corporate_actions_type_e.html"

    ' Fetch the raw page source (no JavaScript is executed here)
    winHttpReq.Open "GET", myURL, False
    winHttpReq.send

    result = winHttpReq.responseText

    ' Note: a cell holds at most 32,767 characters, so a long response can fail here
    Range("A1").Value = result

    ' Also dump the full source to a text file for inspection
    Dim fso As Object
    Dim oFile As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set oFile = fso.CreateTextFile("DB_source.txt")
    oFile.WriteLine result
    oFile.Close

    Set oFile = Nothing
    Set fso = Nothing
End Sub
bazite
  • You're receiving the data you're requesting, but it's only a raw response. Without a browser to interpret the JS included in the response, which then builds out the HTML including the links you're after, this won't work. You can call IE from VBA to navigate to the URL so that you have the JS rendering the HTML from the raw http response, but IE is pretty slow with complicated JS and it doesn't have a reliable "ready" state indicator for programmatic control. Are you able to install/use any other tools? Also, is the button to "export" to xlsx on that page insufficient for your needs? – Word Nerd Dec 11 '22 at 20:16
  • IE should not be used anymore; it is dead, even if the mentioned page can currently still be loaded in it. But it doesn't matter, because the hyperlinks are of no use anyway: the contents of the page are all loaded up front and are only shown when a link is clicked. The data is in a JSON payload, which is loaded via POST. I haven't tried it, but the SessionId is probably necessary to retrieve it. It is stored in a cookie, which is why WinHTTP is already the correct access technique. Without knowledge of POST, JSON and cookie handling, you won't be able to program the download on your own at the moment. – Zwenn Dec 11 '22 at 21:41
  • I agree with @Zwenn re not using IE. Saying you "can" use IE with VBA wasn't an endorsement, only a statement of what's possible with VBA. Selenium would be better for post-paint browser scraping anyway, but to expand on Zwenn's comment, you can extract the cookie/SessionID/CSRF token from the 1st response and send two more requests (with required JSON body) to grab the "results" JSON payloads (search?page=0&size=80 and cash-dividends/search?page=0&size=80). You can learn from VBA examples [here](https://stackoverflow.com/questions/39709562/how-to-set-and-get-jsessionid-cookie-in-vba). – Word Nerd Dec 24 '22 at 21:25
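To make the cookie-handling step in the comments above concrete, here is a minimal sketch (in Python for illustration, not VBA; the cookie name JSESSIONID follows the linked VBA example, and the header value is a made-up placeholder) of pulling the session cookie out of the first response's Set-Cookie header so it can be replayed on the follow-up POST requests:

```python
import re

def extract_session_cookie(set_cookie_header, name="JSESSIONID"):
    """Return 'NAME=value' from a raw Set-Cookie header, or None if absent."""
    match = re.search(rf"{name}=([^;]+)", set_cookie_header)
    return f"{name}={match.group(1)}" if match else None

# Placeholder header value, shaped like a typical Java-servlet session cookie:
header = "JSESSIONID=ABC123DEF456; Path=/drwebrebrand; Secure; HttpOnly"
print(extract_session_cookie(header))  # JSESSIONID=ABC123DEF456
```

The returned string would then go into a `Cookie:` request header on the subsequent POSTs that fetch the JSON payloads.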

0 Answers