1

I wrote the following code in order to retrieve data from an API and hopefully get the content of a specific element. I wrote similar code that works with web pages (not API) but in this example it breaks and don't understand why.

Sub parseXML()
Dim xmldoc As Object
Dim obj As Object
Dim MyRequest As Object

Set MyRequest = CreateObject("MSXML2.XMLHTTP")
MyRequest.Open "GET", 
"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi? db=pubmed&term=ABL1%20Acute%20granulocytic%20leukemia"     
 MyRequest.send
 While Not MyRequest.readyState = 4
       DoEvents
    Wend
Set xmldoc = MyRequest.responseXML
Set obj = xmldoc.DocumentElement.getElementsByClassName("ui-ncbihistogram-display-area")(0) 'Error: object doesn't support this property or method 
End Sub 

How should I write the last line in order to remove the error? I am trying to get the element class="ui-ncbihistogram-display-area". It has several <li>elements and at the end I want to retrieve the inner text of each of these<li> elements.

QHarr
  • 83,427
  • 12
  • 54
  • 101
David912
  • 378
  • 1
  • 2
  • 11
  • 2
    What data do you want to get from that document? What specific element? Use https://stackoverflow.com/posts/57132271/edit to edit/update the question, and add the information about exactly what data/element you want to get from the document. – sideshowbarker Jul 21 '19 at 11:00

1 Answers1

2

You are trying to use the syntax that would work for an HTML Parser selecting off the pubmed search for the same term i.e. you would be selecting for the histogram on this page.

However, with the API call you are making you are getting an XML document which will have specified fields according the API documentation. Instead, you would use xpath to specify what you want if using xml parser (this will allow you to include more specificity in terms of attributes and parent child relationships). If you are going to use HTML parser then you want to look at selecting by tag. The field present in the webpage, for the histogram, is not present in the response. Nor is that information present in the return xml document. The figure that sort of matches is you get the total result count for the query i.e. 248. Though it is broken down by year in the webpage histogram whereas it is a total in the XML API response.

Sample comparison:

enter image description here

I would recommend using an XML parser for XML. You then get the expected result hopefully. Compare the two parsers here where there is loss of content in the HTMLDocument.

Public Sub DifferentParsing()
    Dim xmlDoc As Object, htmlDoc As HTMLDocument, url As String

    Set xmlDoc = CreateObject("MSXML2.DOMDocument") 'New MSXML2.DOMDocument60
    Set htmlDoc = New HTMLDocument
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?%20db=pubmed&term=ABL1%20Acute%20granulocytic%20leukemia"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        htmlDoc.body.innerHTML = .responseText
        xmlDoc.LoadXML .responseXML.XML
    End With
    Debug.Print xmlDoc.SelectSingleNode("//Count").Text
    Debug.Print htmlDoc.getElementsByTagName("Count")(0).innerText
End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • 1
    as for this line `Debug.Print htmlDoc.getElementsByTagName("Count")(0).innerText` doesn't return 248 as the previous line. I doubt that Count is a tag – YasserKhalil Jul 21 '19 at 17:21
  • 2
    @YasserKhalil hi, look at my answer. The response loaded into the Htmldocument is partial. The beginning is stripped and in fact no opening tag. If you print html.body.innerhtml you will see. This was my point about using xml parser in this instance. – QHarr Jul 21 '19 at 17:23
  • Thanks @QHarr for the detailed answer. While I did not achieve my goal you made it clear that the API returns something completely different than what I assumed so at least the error makes sense now. If I want to return the data I am looking for I will need to modify the API search to begin with. – David912 Jul 22 '19 at 07:24