The code below (based on "Using VBA in Excel to Google Search in IE and return the hyperlink of the first result" by @Santosh) prints the results of getelementsbytagname("a") for almost all URLs. It turns out that it fails for some; an example URL is given in the code (weatherford.com). It only works for that URL if I comment out line Zz. Any reason why?

Note: To print the carmax links, run the code as is. To print the weatherford links, comment out lines 1a and 2a and uncomment lines 1b and 2b. Running the macro then produces a blank weatherfordLinks.txt file. Now delete weatherfordLinks.txt from the desktop, comment out line Zz, and run the macro again: this time it prints the weatherford links.

Sub testxmlhttp()

    Dim xmlHttp As Object, myURL As String, html As Object, lnk As Object, links As Object
    myURL = "http://www.carmax.com/"         '-->1a
    'myURL = "http://www.weatherford.com"    '-->1b

    Set xmlHttp = CreateObject("MSXML2.serverXMLHTTP")
    xmlHttp.Open "GET", myURL, False
    xmlHttp.setRequestHeader "Content-Type", "text/xml"    '-->Zz
    xmlHttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    On Error Resume Next
    xmlHttp.Send

    Set html = CreateObject("htmlfile")
    On Error Resume Next
    html.body.innerHTML = xmlHttp.responseText

    Open "C:\Users\~\desktop\carmaxLinks.txt" For Output As #1          '-->2a
    'Open "C:\Users\~\desktop\weatherfordLinks.txt" For Output As #1    '-->2b
    For Each lnk In html.getelementsbytagname("a")
        Print #1, lnk
    Next
    Close #1

End Sub
Sandy

1 Answer

This is more of a comment than an answer, but I lack the reputation to comment.

The issue can be analyzed with Fiddler, which shows the details of each request and response. The Content-Type header tells the server the media type of the request body, for example when files are uploaded as part of a web request. A simple GET request has no body, so this header is not required.

When the Content-Type header is set to text/xml, the weatherford server expects a SOAP request with a proper XML request body. The response is below:

<?xml version='1.0' encoding='utf-8' ?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Body>
<SOAP-ENV:Fault>
<faultcode>Client</faultcode>
<faultstring>The SOAP request is invalid. The required node 'Envelope' is missing.</faultstring>
</SOAP-ENV:Fault>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

As there is no SOAP request body, the server returns this fault instead of the HTML page, so the response contains no links to print.
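If Fiddler is not at hand, the same fault can be seen from VBA by printing the raw response to the Immediate window. The following is a minimal sketch under that assumption; the names IsSoapFault and DumpWeatherfordResponse are hypothetical and not from the original post. Note that the macro in the question uses On Error Resume Next, which can hide this kind of problem.

```vba
' Hypothetical helper: True when the body looks like the SOAP fault shown above.
Function IsSoapFault(ByVal responseText As String) As Boolean
    IsSoapFault = (InStr(1, responseText, "SOAP-ENV:Fault", vbTextCompare) > 0)
End Function

' Hypothetical diagnostic macro: sends the same request as the question
' (including line Zz) and prints what the server actually returned.
Sub DumpWeatherfordResponse()
    Dim xmlHttp As Object

    Set xmlHttp = CreateObject("MSXML2.ServerXMLHTTP")
    xmlHttp.Open "GET", "http://www.weatherford.com", False
    xmlHttp.setRequestHeader "Content-Type", "text/xml"    ' line Zz from the question
    xmlHttp.send

    ' Status code and the media type the server reports for its response.
    Debug.Print xmlHttp.Status, xmlHttp.getResponseHeader("Content-Type")
    ' First part of the body, enough to tell HTML from a SOAP envelope.
    Debug.Print Left$(xmlHttp.responseText, 500)

    If IsSoapFault(xmlHttp.responseText) Then
        Debug.Print "Server answered with a SOAP fault, not an HTML page."
    End If
End Sub
```

Run it once with line Zz and once with that line commented out, then compare the output in the Immediate window (Ctrl+G in the VBA editor).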

The carmax server does not look for SOAP requests and hence does not vary the response based on the Content-Type header.

In both cases, omitting the line xmlHttp.setRequestHeader "Content-Type", "text/xml" (marked Zz) should return the expected links.
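Putting that together, a version of the macro without the Content-Type header might look like the sketch below. The sub name, the Environ$-based output path, the status check, and the use of FreeFile are assumptions added for illustration; they are not part of the original code.

```vba
' Sketch only: same request as the question, minus line Zz.
Sub PrintLinksWithoutContentType()
    Dim xmlHttp As Object, html As Object, lnk As Object
    Dim myURL As String, outPath As String
    Dim fileNum As Integer

    myURL = "http://www.weatherford.com"
    ' Assumed output location: the current user's desktop.
    outPath = Environ$("USERPROFILE") & "\Desktop\weatherfordLinks.txt"

    Set xmlHttp = CreateObject("MSXML2.ServerXMLHTTP")
    xmlHttp.Open "GET", myURL, False
    ' No Content-Type header: a GET request has no body to describe.
    xmlHttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    xmlHttp.send

    ' Stop early instead of silently writing an empty file.
    If xmlHttp.Status <> 200 Then
        Debug.Print "Request failed: " & xmlHttp.Status & " " & xmlHttp.statusText
        Exit Sub
    End If

    ' Parse the returned markup with the late-bound HTML document object.
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = xmlHttp.responseText

    ' Write the href of every anchor element, one per line.
    fileNum = FreeFile
    Open outPath For Output As #fileNum
    For Each lnk In html.getElementsByTagName("a")
        Print #fileNum, lnk.href
    Next lnk
    Close #fileNum
End Sub
```

Dropping On Error Resume Next here is a deliberate choice: if the request or the parsing fails, the error surfaces immediately rather than producing a blank text file.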

Seby