1

Good afternoon guys. In a follow up to a previous query which was very much solved by QHarr, I was wanting to run the solved query against multiple fields from the source code rather than just one.

The URL I am using is: https://finance.yahoo.com/quote/AAPL/?p=AAPL

and the VBA code which takes the 'Previous Close' price is:

Option Explicit

    Sub PreviousClose()
        Dim html As HTMLDocument, http As Object, ticker As Range
        Set html = New HTMLDocument
        Set http = CreateObject("WINHTTP.WinHTTPRequest.5.1")

    Dim lastRow As Long, myrng As Range
    With ThisWorkbook.Worksheets("Tickers")

        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        Set myrng = .Range("A2:A" & lastRow)

        For Each ticker In myrng
            If Not IsEmpty(ticker) Then
                With http
                    .Open "GET", "https://finance.yahoo.com/quote/" & ticker.Value & "?p=" & ticker.Value, False
                    .send
                    html.body.innerHTML = .responseText
                End With
                On Error Resume Next
                ticker.Offset(, 1) = html.querySelector("[data-test=PREV_CLOSE-value]").innertext

                On Error GoTo 0
            End If
        Next

    End With
End Sub

Anyway, each field would ideally be in a row right of the ticker used for the stock.

Screenshot of Sheet:

image

Any help would be very much appreciated.
Thanks.

QHarr
  • 83,427
  • 12
  • 54
  • 101
KSASteve
  • 57
  • 6
  • Do you want the entire table from Previous Close to 1y Target Est? – QHarr Sep 08 '18 at 10:44
  • All the fields available I do use, so if they were all available in one table that would be perfect. Thank you for your help and assistance with this. – KSASteve Sep 08 '18 at 10:49

2 Answers2

4

tl;dr;

The code below works for the given test cases. With much longer lists please see the ToDo section.

API:

You want to look into an API to provide this info if possible. I believe Alpha Vantage now provide info the Yahoo Finance API used to* . There is a nice JS tutorial here. Alpha Vantage documentation here. At the very bottom of this answer, I have a quick look at the time series functions available via the API.

WEBSERVICE function:

With an API key, you can also potentially use the webservice function in Excel to retrieve and parse data. Example here. Not tested.

XMLHTTPRequest and class:

However, I will show you a way using a class and a loop over URLs. You can improve on this. I use a bare bones class called clsHTTP to hold the XMLHTTP request object. I give it 2 methods. One, GetHTMLDoc, to return the request response in an html document, and the other, GetInfo, to return an array of the items of interest from the page.

Using a class in this way means we save on the overhead of repeatedly creating and destroying the xmlhttp object and provides a nice descriptive set of exposed methods to handle the required tasks.

It is assumed your data is as shown, with header row being row 2.

ToDo:

The immediately obvious development, IMO, is you will want to add some error handling in. For example, you might want to develop the class to handle server errors.


VBA:

So, in your project you add a class module called clsHTTP and put the following:

clsHTTP

Option Explicit

Private http As Object
Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetHTMLDoc(ByVal URL As String) As HTMLDocument
    Dim html As HTMLDocument
    Set html = New HTMLDocument
    With http
        .Open "GET", URL, False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
        Set GetHTMLDoc = html
    End With
End Function
Public Function GetInfo(ByVal html As HTMLDocument, ByVal endPoint As Long) As Variant
    Dim nodeList As Object, i As Long, result(), counter As Long
    Set nodeList = html.querySelectorAll("tbody td")
    ReDim result(0 To endPoint - 1)
    For i = 1 To 2 * endPoint Step 2
        result(counter) = nodeList.item(i).innerText
        counter = counter + 1
    Next    
    GetInfo = result
End Function

In a standard module (module 1)

Option Explicit
Public Sub GetYahooInfo()
    Dim tickers(), ticker As Long, lastRow As Long, headers()
    Dim wsSource As Worksheet, http As clsHTTP, html As HTMLDocument

    Application.ScreenUpdating = False

    Set wsSource = ThisWorkbook.Worksheets("Sheet1") '<== Change as appropriate to sheet containing the tickers
    Set http = New clsHTTP

    headers = Array("Ticker", "Previous Close", "Open", "Bid", "Ask", "Day's Range", "52 Week Range", "Volume", "Avg. Volume", "Market Cap", "Beta", "PE Ratio (TTM)", "EPS (TTM)", _
                    "Earnings Date", "Forward Dividend & Yield", "Ex-Dividend Date", "1y Target Est")

    With wsSource
        lastRow = GetLastRow(wsSource, 1)
        Select Case lastRow
        Case Is < 3
            Exit Sub
        Case 3
            ReDim tickers(1, 1): tickers(1, 1) = .Range("A3").Value
        Case Is > 3
            tickers = .Range("A3:A" & lastRow).Value
        End Select

        ReDim results(0 To UBound(tickers, 1) - 1)
        Dim i As Long, endPoint As Long
        endPoint = UBound(headers)

        For ticker = LBound(tickers, 1) To UBound(tickers, 1)
            If Not IsEmpty(tickers(ticker, 1)) Then
                Set html = http.GetHTMLDoc("https://finance.yahoo.com/quote/" & tickers(ticker, 1) & "/?p=" & tickers(ticker, 1))
                results(ticker - 1) = http.GetInfo(html, endPoint)
                Set html = Nothing
            Else
                results(ticker) = vbNullString
            End If
        Next

        .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
        For i = LBound(results) To UBound(results)
            .Cells(3 + i, 2).Resize(1, endPoint-1) = results(i)
        Next
    End With   
    Application.ScreenUpdating = True
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

Results:

results


Notes on GetInfo method and CSS selectors:

The class method of GetInfo extracts the info from each webpage using a css combination selector to target the page styling.

The info we are after on each page is house in two adjacent tables, for example:

Rather than mess around with multiple tables I simply target all the table cells, within table body elements, with a selector combination of tbody td.

The CSS selector combination is applied via the querySelectorAll method of HTMLDocument, returning a static nodeList.

The returned nodeList items have headers at even indices and the required data at odd indices. I only want the first two tables of info so I terminate the loop over the returned nodeList when I gave gone twice the length of the headers of interest. I use a step 2 loop from index 1 to retrieve only the data of interest, minus the headers.

A sample of what the returned nodeList looks like:


References (VBE > Tools > References):

  1. Microsoft HTML Object Library

Alpha Vantage API:

A quick look at the time series API call shows that a string can be used

https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AA&outputsize=full&apikey=yourAPIKey

This yields a JSON response that in the Time Series (Daily) sub dictionary of the overall returned dictionary, has 199 dates returned. Each date has the following info:

A little digging through the documentation will unveil whether bundling of tickers is possible, I couldn't see this quickly, and whether more of your initial items of interest are available via a different query string.

There is more info, for example, using the TIME_SERIES_DAILY_ADJUSTED function in the URL call

https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=AA&outputsize=full&apikey=yourAPIkey

Here, you then get the following:

You can parse the JSON response using a JSON parser such as JSONConverter.bas and there are also options for csv download.

* Worth doing some research on which APIs provide the most coverage of your items. Alpha Vantage doesn't appear to cover as many as my code above retrieves.

QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Thanks - I'm going to check out that Alpha Vantage link and tutorial. That worked as you illustrated in the table shot. However, when I run it on the full list of tickers I am consistently getting the error 'Object variable or With block variable not set'. On this line of code: Results(ticker - 1) = http.GetInfo(html, endPoint) – KSASteve Sep 08 '18 at 12:31
  • Please supply the failing ticker..... this is probably what I mentioned about adding error handling in and is something you will need to learn how to do to handle the different cases that crop up. – QHarr Sep 08 '18 at 12:51
  • Hi there. Sorry for the delay in getting back to you. I've not been able to determine what the issue is? Sometimes it works and other times it doesn't. I may need to resort to your earlier response and gather the data for each separately - it would have been easier to get them all together, but I guess that's just me being impatient. I'll also take the steps to learn a lot more about the subject - thanks for the links and advice. Appreciate your time. – KSASteve Sep 08 '18 at 18:51
  • Perhaps you can share more of the links. It may be that you experience a timeout issue and a loop to request is needed to re-issue the request. Does it occur after a certain number? This is unlikely to be about getting more items off a given page, but about how many pages you are trying to scrape, internet connectivity etc.... the code above is pretty effective but with more links to test it can be further developed to handle issues that come up. – QHarr Sep 08 '18 at 18:55
  • Also, providing detail on the error messages would help in adding some error handling. I cannot currently reproduce any errors as the above works fine for me and I don’t know what exact problems with the longer list you are having. – QHarr Sep 08 '18 at 19:03
  • Ah, I see. I have ran your code on the few tickers in the sample sheet and it does have repeatability, so the work you did on the example I sent was 100% correct, thanks. The error occurs when I increase the list to ~2,500 tickers. I have taken the UK stocks out and am just trying he US stocks only and at present it appears to be working away. In regards to internet I am on 100Mb/s fiber and using a Dell XPS 13 i7 8th Core - so I am assuming I am ok in the tech department? If I experience further errors I will try and get them on here for you to check out unless I can direct message you? – KSASteve Sep 08 '18 at 19:14
  • It is a large number of tickers so I would expect you to need to add in some error handling as per the ToDo statement. Some example error messages would help to add a bit more to the above. The danger here is when it starts to become building you a custom application beyond just solving the stated problem. – QHarr Sep 08 '18 at 19:17
  • The rationalised list ~1,700 failed, but I've just ran a list of 100 with no problems. I'll persevere with this in to the night and see where I get tomorrow. Thanks for your ongoing support. – KSASteve Sep 08 '18 at 19:36
  • You could also try rubbing in batches. 100 at a time for example. – QHarr Sep 08 '18 at 19:37
  • Yes I'd thought about that - can I just copy/paste the code and change the sheet ref? – KSASteve Sep 08 '18 at 19:41
  • It is one possibility to try. I would still observe the error messages you are getting and put some handling in for that. I suggest you do some reading up on error handling and also http status codes. – QHarr Sep 08 '18 at 19:42
  • I am currently working through the tickers in batches of 20 to see if an error occurs, which it does - then I find which ticker is no longer valid. This then runs the code in further batches of 200 no problem. I have copied the code and called each sheet to take 200 tickers at a time. It's a long way round but this is the way my knowledge is driving me. Again, thanks very much for your help. – KSASteve Sep 10 '18 at 05:42
  • Hi, thanks again for your input it was greatly appreciated. I have managed to add an 'On Error Resume Next' to the code which skips a Ticker which may not return a valid page - and that works a treat. However, I am looking to expand the tables and pages I can reference, 'Key Statistics' being the next. I have changed the code where I think it may need changing to reference the new page and fields. I have updated the clsHTTP too in the new sheet, but I am struggling. I have posted a new question regards the query, and hoping if you have the time and knowledge you could check it out. Thank you. – KSASteve Sep 19 '18 at 10:43
  • @QHarr Thank you for this awesome explanation. As for the API part it is documented `https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=MSFT&outputsize=full&apikey=demo` > covering 20+ years .. But I found the last date is 27 April 2014. Is there a way to get older dates ?? (e.g 2012) – YasserKhalil Jul 19 '19 at 04:20
  • @YasserKhalil Hi, Please consider opening a new question and include your current script minus API key and explain what you have tried and expected result versus actual. – QHarr Jul 19 '19 at 04:47
  • There is no problem in the code. The problem is the returned JSON only till 2014! – YasserKhalil Jul 19 '19 at 05:21
  • where is the documentation link for the above? – QHarr Jul 19 '19 at 05:47
0

That's some slick code!! I like it a lot!! As an aside, you may want to consider using R to do this kind of thing. Look at what you can do with just a few simple lines of code!

library(finreportr)

# print available functions in finreportr
ls('package:finreportr')

my.ticker <- 'SBUX'

# set final year
my.year <- 2017

# get income for FB
my.income <- GetIncome(my.ticker, my.year)

# print result
print(head(my.income))


# get unique fields
unique.fields <- unique(my.income$Metric)

# cut size of string
unique.fields <- substr(unique.fields,1, 60)

# print result
print(unique.fields)


# set col and date
my.col <- 'Earnings Per Share, Basic'

# print earnings per share
print(my.income[my.income$Metric == my.col, ])


library(tidyquant)

# set stock and dates
my.ticker <- 'AAPL'
first.date <- '2017-01-01'
last.date <-  Sys.Date()

# get data with tq_get
my.df <- tq_get(my.ticker,
                get = "stock.prices", 
                from = first.date, 
                to = last.date)

print(tail(my.df))


# get key financial rations of AAPL
df.key.ratios <- tq_get("AAPL",get = "key.ratios")

# print it
print(df.key.ratios) 
ASH
  • 20,759
  • 19
  • 87
  • 200