I am trying to pull historical weather data from wunderground- originally I tried to scrape the web data off the main webpage using the standard HTML elements etc. etc.. however then I discovered that if you add "?format=1" to the URL a browser will display more or less a csv which is much easier to parse into data tables...
However I was still using IE browser which can be quite slow if you have a lot of data to pull so I started looking into using an XMLHTTP
object to help pull the data faster which is where I ran into my issue- as I mentioned adding "?format=1" to the end of the URL would return a nice slim csv, but when I use the XMLHTTP it pulls the HTML of the original page not the CSV!
It seems to me like when the XMLHTTP sends its request, the response comes back from the 'parent' site if you will and not the csv that I specified... I noticed that it pulls the HTML from the URL that ends in .html, like it cuts off at .html and that's why the response is from the .html page and not the CSV page.. so I tried looking into URL protocols to see if there was a way to force the XMLHTTP to request the csv URL but I am pretty far out of my element and couldn't find anything..
Code below- appreciate any help you can give me:
Public Sub downloadWebDataToCSV(URL As String)
Dim ADOStream As ADODB.Stream
Dim XMLHTTP As MSXML2.XMLHTTP60
Set XMLHTTP = New MSXML2.XMLHTTP60
XMLHTTP.Open "GET", URL, False
XMLHTTP.send
If XMLHTTP.Status = 200 Then
Set ADOStream = New ADODB.Stream
With ADOStream
.Open
.Type = adTypeBinary
.Write XMLHTTP.responseBody
.SaveToFile "C:\datafile.csv", 2
.Close
End With
End If
End Sub
Here are the parent and csv URLs as well if that helps-
Mainpage: https://www.wunderground.com/history/airport/KDAL/2003/10/15/DailyHistory.html CSV: https://www.wunderground.com/history/airport/KDAL/2003/10/15/DailyHistory.html?format=1
Thanks in advance, TheSilkCode