0

I am working on a code to get data from : https://www.investing.com/economic-calendar/core-durable-goods-orders-59

I have got the code for getting this via httprequest: but looking to change this to work for the economic data (link above) is there any way I can get the same for the economic indicators??

code below:

Option Explicit
Sub Export_Table()

'Html Objects---------------------------------------'
 Dim htmlDoc As MSHTML.HTMLDocument
 Dim htmlBody As MSHTML.htmlBody
 Dim ieTable As MSHTML.HTMLTable
 Dim Element As MSHTML.HTMLElementCollection


'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
 Dim wb As Workbook
 Dim Table As Worksheet
 Dim i As Long

 Set wb = ThisWorkbook
 Set Table = wb.Worksheets("Sheet1")

 '-------------------------------------------'
 Dim xmlHttpRequest As New MSXML2.XMLHTTP60  '
 '-------------------------------------------'


 i = 2

'Web Request --------------------------------------------------------------------------'
 With xmlHttpRequest
 .Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"




 If .Status = 200 Then

        Set htmlDoc = CreateHTMLDoc
        Set htmlBody = htmlDoc.body

        htmlBody.innerHTML = xmlHttpRequest.responseText

        Set ieTable = htmlDoc.getElementById("curr_table")

        For Each Element In ieTable.getElementsByTagName("tr")
            Table.Cells(i, 1) = Element.Children(0).innerText
            Table.Cells(i, 2) = Element.Children(1).innerText
            Table.Cells(i, 3) = Element.Children(2).innerText
            Table.Cells(i, 4) = Element.Children(3).innerText
            Table.Cells(i, 5) = Element.Children(4).innerText
            Table.Cells(i, 6) = Element.Children(5).innerText
            Table.Cells(i, 7) = Element.Children(6).innerText

            i = i + 1
        DoEvents: Next Element
 End If
End With


Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing

End Sub

Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
    Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
braX
  • 11,506
  • 5
  • 20
  • 33

2 Answers2

3

I have used the excel tool Power Query for this very thing. It is also called Get & Transform Data. I don't like using vba for doing this type of thing.

To make it work:

  1. In Excel Go to Data>Get Data>From Other Sources>From Web. enter image description here

  2. Enter the URL enter image description here

  3. Wait for the webpage to load and then pick the table that you want. enter image description here This website took awhile to load, but it did work for me.

  4. Choose "Load" which goes directly to the sheet, or "Transform Data" to manipulate the data in Power Query. There are many options in power query such as split columns, filter data, Calculate Columns and ...

Shane S
  • 1,747
  • 14
  • 31
  • 1
    Wasn't aware of this. Pretty cool. and by "took a while" you mean about 3-4 minutes to get the list of tables (for me at least) – braX Nov 24 '21 at 02:42
  • yeah I used power query but this does not feed entire data set, it is only 6 rows, the rest are hidden is there any way in power query to get entire data set ? – Karol Sokol Nov 24 '21 at 11:48
  • I would prefer to get this via xmlhttprequest. appreciate your comments here – Karol Sokol Nov 24 '21 at 12:09
0

I would avoid the overhead of setting up a permanent connection and simply continue using XHR. With the data > from web, you cannot grab more rows than are present on the initial landing. If however you go with XHR, you can issue POST requests to get more data. The code below utilizes a loop to retrieve additional results beyond the immediate visible on the page.

When you press the Show more link there is a POST request for an additional 6 rows which uses the latest date from the current set of results as part of the POST body. The response returned is JSON. Rather than bringing in a JSON parser, given the standard nature of the JSON, and that I am already using regex to clean the date format in column 1 to put in the POST body, I use two simple regexes to extract the html for the next results table from the response, and to check whether there are more results.

The format of the JSON is:

{
  "historyRows": "<tr>…..</tr>",
  "hasMoreHistory": "1"
}

Or

{
  "historyRows": "<tr>…..</tr>",
  "hasMoreHistory": false
}

So, I do some cleaning of the extracted html in order to not confuse the html parser within MSHTML. Furthermore, I add in an id to identify the table I have constructed, so I can continue to use an id css selector (#) list within my UpdateDateResults function.

I initially oversize an array to store each retrieved table which I update ByRef. I loop requesting more results until either there are no more results, there is an error parsing the maximum date from the last retrieved table column 1, or until my specified earliest date for data retrieval falls within the date range of the latest returned table.

Finally, I write the results array out to the sheet in one go.

N.B. You can target the table by its id. It looks like the number at the end of the id could be the same as for the goods url, lending itself to generalizing the code below to work for other goods.


VBA:

Option Explicit

Public Sub GetInvestingInfo()
    'tools > references > Microsoft HTML Object Library
    Dim html As MSHTML.HTMLDocument, xhr As Object
        
    Const STARTDATE As Date = "2019-11-25"       'Adjust as required. DateAdd("yyyy", -2, Date) 2 years back. This means may have some earlier months in _
                                                 batch that spans the start date but won't issue an additional request after this
        
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    Set html = New MSHTML.HTMLDocument
     
    With xhr
        .Open "GET", "https://www.investing.com/economic-calendar/core-durable-goods-orders-59", False
        .setRequestHeader "User-Agent", "Safari/537.36"
        .send
        html.body.innerHTML = .responseText
    End With
        
    Dim firstTable As Boolean, r As Long, results() As Variant
        
    ReDim results(1 To 100000, 1 To 5)
        
    'process initial table and update results, get cleaned date needed for request for more results
    firstTable = True
            
    Dim latestDate As String
    
    UpdateDateResults latestDate, results, firstTable, r, html
        
    Dim re As Object, maxDate As String, hasMoreHistory As Boolean, s As String
    
    Set re = CreateObject("VBScript.RegExp")
    
    With re
        .Global = True
        .MultiLine = False
    End With
        
    maxDate = cleanedDate(latestDate, re)
    hasMoreHistory = True
    
    Dim errorDate As Date
    
    errorDate = DateAdd("d", 1, Date)
    
    Do While maxDate >= STARTDATE And maxDate < errorDate 'break loop using pre-defined earliest date, error with date conversion, or when no more rows found
            
        Application.Wait (Now + TimeSerial(0, 0, 1)) 'Pause
            
        s = GetMoreRows(xhr, Format$(maxDate, "YYYY-MM-DD")) 'max a POST request for more data
            
        re.Pattern = "hasMoreHistory"":(""?.*?""?)}"   'Check if there are more rows still available. "1" for yes, false for no
        hasMoreHistory = (re.Execute(s)(0).submatches(0) <> False)
            
        If Not hasMoreHistory Then Exit Do

        re.Pattern = "historyRows"":""(.*)"","
        html.body.innerHTML = "<table id=""me"">" & Replace$(re.Execute(s)(0).submatches(0), "\/", "/") & "</table>" 'fix html and feed into html variable
            
        UpdateDateResults latestDate, results, firstTable, r, html
        maxDate = cleanedDate(latestDate, re)    'convert value retrieved from last row in date column of table to an actual date
      
    Loop
        
    With ActiveSheet
        .Cells.ClearContents
        .Cells(1, 1).Resize(r, 5) = results      'Don't bother to resize results as clear all cells before write ou
    End With
    
End Sub

Public Sub UpdateDateResults(ByRef latestDate As String, ByRef results() As Variant, ByRef firstTable As Boolean, ByRef r As Long, ByVal html As MSHTML.HTMLDocument)
        
    Dim table As MSHTML.HTMLTable                'return latest date from function
        
    Set table = html.querySelector("#eventHistoryTable59, #me")
    latestDate = table.Rows(table.Rows.Length - 1).Children(0).innerText
    
    Dim i As Long, n As Long, j As Long
        
    n = IIf(firstTable, 0, 1)
        
    For i = n To table.Rows.Length - 1
        r = r + 1
        For j = 0 To table.Rows(i).Children.Length - 2
            results(r, j + 1) = table.Rows(i).Children(j).innerText
        Next
    Next
        
    firstTable = False
End Sub

Public Function cleanedDate(ByVal dirtyString As String, ByVal re As Object) As Date
        
    re.Pattern = "(^[A-Z][a-z]{2}).*(\d{2}),.(\d{4})(.*)"
        
    On Error GoTo errhand:
         
    If re.test(dirtyString) Then
        cleanedDate = CDate(re.Replace(dirtyString, "$2" & Chr$(32) & "$1" & Chr$(32) & "$3"))
        Exit Function
    End If
            
errhand:
    
    cleanedDate = DateAdd("d", 1, Date)
            
End Function

Public Function GetMoreRows(ByVal xhr As Object, ByVal dateStamp As String) As String
    With xhr
        .Open "POST", "https://www.investing.com/economic-calendar/more-history", False
        .setRequestHeader "User-Agent", "Safari/537.36"
        .setRequestHeader "x-requested-with", "XMLHttpRequest"
        .setRequestHeader "content-type", "application/x-www-form-urlencoded"
        .send "eventID=430865&event_attr_ID=59&event_timestamp=" & dateStamp & "+" & Application.WorksheetFunction.EncodeURL("12:30:00") & "&is_speech=0"
        GetMoreRows = .responseText
    End With
End Function

Regexes (without the double " escaping for VBA):

hasMoreHistory":("?.*?"?)}

enter image description here

historyRows":"(.*)",

enter image description here

QHarr
  • 83,427
  • 12
  • 54
  • 101