3



I'm trying to scrape a website with xmlhttp and VBA.

The Url has a hash(#) symbol and is misinterpreted by VBA...

This is the same problem as explained here : Error in XMLHTTP Get request with Special character in URL
The code is as follow :

Sub webscraping()
Dim x               As String
Dim req             As MSXML2.XMLHTTP60
Dim doc             As HTMLDocument

x = "https://search.gleif.org/#/search/"

Set req = New MSXML2.XMLHTTP60
    With req
        .Open "GET", x, False
        .send
        If .Status <> 200 Then
            MsgBox "Http Request Error"
            Exit Sub
        End If
        Set doc = New MSHTML.HTMLDocument
        doc.body.innerHTML = .responseText
    End With
End Sub

Thank you in advance

quicksilv86
  • 105
  • 11

1 Answers1

1

You can side step this by calling the API direct with your search term as shown below. I am using a json parser to handle the json response. After adding the .bas at the link to your project, also go VBE > Tools > References > Add a reference to Microsoft Scripting Runtime.

Note there are queryString parameters you can alter here: "&page%5Bnumber%5D=1&page%5Bsize%5D=15". This is currently with page=1 and records per page at 15.

VBA:

Option Explicit
Public Sub webscraping()
    Dim url As String
    Dim req As MSXML2.XMLHTTP60
    Dim doc As HTMLDocument
    Dim json As Object
    Const TERM = "banana corp."

    url = "https://api.gleif.org/api/v1/lei-records?filter%5Bfulltext%5D=" & Application.EncodeURL(TERM) & "&page%5Bnumber%5D=1&page%5Bsize%5D=15"

    Set req = New MSXML2.XMLHTTP60
    With req
        .Open "GET", url, False
        .send
        If .Status <> 200 Then
            MsgBox "Http Request Error"
            Exit Sub
        End If
        Set json = JsonConverter.ParseJson(.responseText)
    End With
    Stop
    'Do something with json
End Sub

There is a lot of info returned so here is just an extract from it:

Research the API here:

https://www.gleif.org/en/lei-data/gleif-lei-look-up-api/access-the-api

QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Excellent of you @QHarr ! Sorry to interupt this thread, might I ask how you can extract information from the `Set Json = JsonConverter.ParseJson(.responseText)`. I did a `For Each item In Json` and used `Set Json = JsonConverter.ParseJson(.responseText)("data")`, but is unable to get for example: `"legalName":{"name":` or `"status":"ACTIVE"`. Any idea where I can start? I have tried to follow one of your posts to [extract json field](https://stackoverflow.com/questions/52866031/extracting-specific-json-field-from-responsetext-to-single-excel-cell) but with no luck: – Wizhi Jun 17 '19 at 13:48
  • 1
    Hi - I will post a pastebin later - just chase me this evening for it – QHarr Jun 17 '19 at 13:59
  • Hi, you are too nice :)!. I have progressed a little bit. When I know the LEI code https://leilookup.gleif.org/api/v2/leirecords?lei=5299008TNNMXDKX6JB96, I use the following site to get the structure outlined: (http://json.parser.online.fr/) and then I use `Debug.Print item("Entity")("EntityStatus")("$")`. // Still trying to work with your code when I'm searching for the company name and want to retrieve the LEI code. Not sure how I should retrieve the `item` in a `For Each...` Loop then, since the structure seem more complex... – Wizhi Jun 17 '19 at 19:17
  • Exactly. HUGE thanks!!! Thanks to you I figured it out. So the code would be, given this [JSON structure](https://i.stack.imgur.com/SSpVl.png), `Dim lei As String` // `lei = Json("data")(1)("attributes")("entity")("legalAddress")("country") 'Country` // `lei = Json("data")(1)("attributes")("entity")("status") 'Entity Status (Active/Not Active)` // `lei = Json("data")(1)("attributes")("entity")("legalName")("name") 'Legal name` // `lei = Json("data")(1)("attributes")("registration")("status")`. – Wizhi Jun 17 '19 at 20:22
  • 1
    well done. Bet you feel more comfortable reading json now :-) – QHarr Jun 17 '19 at 20:23