2

I'm getting the same error in both quantmod and tidyquant when requesting financials data. Can anyone check whether this is reproducible? Is this a Google Finance server issue? None of the functions below have been working for me, and I'm not sure whether the problem is on my end or the server's.

    tq_get("AAPL", get= "financials")
    [1] NA
    Warning message:
    x = 'AAPL', get = 'financials': Error in thead[x]:thead[x + 1]: NA/NaN 
    argument

and:

    getFin("AAPL")
    Error in thead[x]:thead[x + 1] : NA/NaN argument
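The wording of the error can be reproduced offline: R's `:` operator refuses a missing endpoint. The sketch below assumes (a guess, not verified against the package source) that `thead` ends up holding `NA` when the expected table headers are not found in the downloaded page. The values in `thead` are made up for illustration.

```r
# Hypothetical stand-in for the header positions the parser computes.
# NA mimics "header not found" (an assumption, not the package's code).
thead <- c(NA_integer_, NA_integer_)
x <- 1

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    thead[x]:thead[x + 1]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that assumption holds, the error points at a change in the page being parsed rather than at anything in the calling code.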

Can somebody help?

Joe

3 Answers

4

Try this:

    library(jsonlite)
    library(httr)

    # Turn one statement (rows = periods, columns = line items) into a
    # data frame with line items as rows and period end dates as columns.
    transpose_df <- function(df_list){
      df_list$maxAge <- NULL
      myColnames <- df_list$endDate$fmt
      df_list$endDate <- NULL

      mydf <- data.frame(row.names = colnames(df_list))

      for (i in seq_along(df_list)) {
        # One column per reported period instead of a hard-coded four
        for (j in seq_len(nrow(df_list))) {
          tryCatch(
            {
              mydf[i,j] <- df_list[j,i]$raw
            },
            error = function(cond){
              # `<<-` is needed so the NA reaches mydf outside the handler
              mydf[i,j] <<- NA
            }
          )
        }
      }
      colnames(mydf) <- myColnames
      return(mydf)
    }

    # Download the statements for each symbol and store them in the
    # caller's environment as <symbol>.f (a list with IS, BS and CF).
    scrapy_stocks <- function(stock){
      for (i in seq_along(stock)) {
        tryCatch(
          {
            url <- paste0('https://query1.finance.yahoo.com/v10/finance/quoteSummary/',stock[i],'?formatted=true&lang=en-US&region=US&modules=incomeStatementHistory%2CcashflowStatementHistory%2CbalanceSheetHistory&corsDomain=finance.yahoo.com')
            a <- GET(url)
            a <- content(a, as="text")

            df <- fromJSON(a, simplifyDataFrame = TRUE)

            df_is <- df$quoteSummary$result$incomeStatementHistory$incomeStatementHistory[[1]]
            df_is <- transpose_df(df_is)

            df_bs <- df$quoteSummary$result$balanceSheetHistory$balanceSheetStatements[[1]]
            df_bs <- transpose_df(df_bs)

            df_cs <- df$quoteSummary$result$cashflowStatementHistory$cashflowStatements[[1]]
            df_cs <- transpose_df(df_cs)
            assign(paste0(stock[i],'.f'),value = list(IS = df_is,BS = df_bs,CF = df_cs),envir = parent.frame())
          },
          error = function(cond){
            message(stock[i], "Give error ",cond)
          }
        )
      }
    }

    scrapy_stocks(c('PETR4.SA','VALE3.SA'))

You can call it as `scrapy_stocks(c("AAPL","GOOGL"))` and access the data as `AAPL.f$IS`, `AAPL.f$BS`, or `AAPL.f$CF`.

It has been a while since I used R, so there is probably a better way to do this, especially the transposing of the data frame, but I think it works. I hope it helps someone.

In the URL, the `balanceSheetHistory` module returns annual values; use `balanceSheetHistoryQuarterly` instead to get quarterly figures. The function can easily be adapted for this.
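As a rough sketch of that adaptation, the helper below builds the same request URL with either the annual or the quarterly modules. `build_url` is a hypothetical name. Only `balanceSheetHistoryQuarterly` is named above; the other two quarterly module names are assumed by analogy and have not been checked against the service.

```r
# Build the quoteSummary URL used by scrapy_stocks(), optionally asking
# for quarterly modules. The quarterly names for the income statement and
# cash flow modules are assumptions (formed by appending "Quarterly").
build_url <- function(symbol, quarterly = FALSE) {
  modules <- c("incomeStatementHistory",
               "cashflowStatementHistory",
               "balanceSheetHistory")
  if (quarterly) {
    modules <- paste0(modules, "Quarterly")
  }
  paste0(
    "https://query1.finance.yahoo.com/v10/finance/quoteSummary/", symbol,
    "?formatted=true&lang=en-US&region=US&modules=",
    paste(modules, collapse = "%2C"),
    "&corsDomain=finance.yahoo.com"
  )
}

print(build_url("AAPL", quarterly = TRUE))
```

The element names read from the parsed JSON (for example `df$quoteSummary$result$balanceSheetHistory`) would presumably need the same suffix, so inspect the response before relying on it.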

Everton Reis
  • Is this still working for you? I get `AAPL Give error Error in IS[, 1]: incorrect number of dimensions GOOGL Give error Error in IS[, 1]: incorrect number of dimensions` – user113156 Jul 07 '18 at 13:11
  • 1
    @user113156 the problem is on the Yahoo Finance site. If you check [GOOGL](https://finance.yahoo.com/quote/GOOGL/financials?p=GOOGL) you will see that all the information is empty, and the same goes for [AAPL](https://finance.yahoo.com/quote/AAPL/financials?p=AAPL). So it will not work; something happened at Yahoo Finance. – Everton Reis Jul 08 '18 at 16:00
  • Ah okay, I thought that you wrote a new script to scrape some of the financials from another source. – user113156 Jul 08 '18 at 19:37
  • 1
    @user113156 it is indeed. In the past, getFinancials used Google as its source, so I wrote this script to get data from Yahoo. However, Yahoo had some problems yesterday, so neither source was working. Today Yahoo is working fine, so if you try the script, it will work normally. Hope you enjoy it. – Everton Reis Jul 09 '18 at 20:13
  • 1
    `scrapy_stocks("AAPL")` returns `AAPLGive error Error in p[[1]]: subscript out of bounds` – Alexandros Mar 06 '20 at 07:59
  • @Alexandros Yahoo changed the page structure. I will update the function when I have a chance. However, it is possible to get these data from Google using: getFinancials(Symbol, env = .GlobalEnv, src = "google",auto.assign = TRUE,...) – Everton Reis Mar 06 '20 at 14:13
  • Did you ever get a chance to update the script to account for the changed page structure? Thanks – Laurence_jj Nov 17 '21 at 11:50
  • @Laurence_jj function updated, give it a try and let me know if it worked for you. – Everton Reis Nov 17 '21 at 18:34
2

I tweaked the scrapy_stocks function to accommodate the Yahoo page update. I haven't thoroughly vetted this solution, but it seems to work well in all my trials thus far. Please be aware of two things:

  1. I don't think this would work if you have Yahoo Premium. I don't have it, so I can't test it. But if you do, it shouldn't be too difficult to update.
  2. I don't have a lot of experience with rvest, but because of the nature of the page, I had to write the function such that if one value in a row is missing, the entire row is set to missing.

Try this:

    scrapy_stocks2 <- function(stock){
      if ("rvest" %in% installed.packages()) {
        library(rvest)
      }else{
        install.packages("rvest")
        library(rvest)
      }
      if ("xml2" %in% installed.packages()) {
        library(xml2)
      }else{
        install.packages("xml2")
        library(xml2)
      }
      for (stocknum in 1:length(stock)) {
        tryCatch(
          {
            # Income Statement
            url <- "https://finance.yahoo.com/quote/"
            url <- paste0(url,stock[stocknum],"/financials?p=",stock[stocknum])
            wahis.session <- html_session(url)

            nodes <- wahis.session %>%
              html_nodes(xpath = '//*[@id="Col1-1-Financials-Proxy"]/section/div[4]//span')

            yh_data <- nodes %>%
              xml_text() %>%
              gsub(pattern = ',', replacement = '')
            colnums <- 1:6
            col_nms <- yh_data[colnums]
            yh_data <- yh_data[-colnums]

            lab_inds <- nodes %>%
              html_attr(name = 'class') == "Va(m)"
            lab_inds[is.na(lab_inds)] <- FALSE

            lab_inds <- lab_inds[-colnums]
            data <- matrix(NA, nrow = sum(lab_inds), ncol = 5, dimnames = list(yh_data[lab_inds], col_nms[-1]))
            row_num <- 1
            for (i in 2:(length(lab_inds)-4)) {
              t_ind <- !lab_inds[i:(i+4)]
              if (sum(t_ind) == 5) {
                data[row_num, 1:5] <- as.numeric(yh_data[i:(i+4)])
              }
              if (lab_inds[i]) {
                row_num <- row_num+1
              }
            }

            temp1 <- as.data.frame(data)
            print(paste(stock[stocknum],'   Income Statement Success'))

            # Balance Sheet
            url <- "https://finance.yahoo.com/quote/"
            url <- paste0(url,stock[stocknum],"/balance-sheet?p=",stock[stocknum])
            wahis.session <- html_session(url)

            nodes <- wahis.session %>%
              html_nodes(xpath = '//*[@id="Col1-1-Financials-Proxy"]/section/div[4]/div[1]/div[1]//span')

            yh_data <- nodes %>%
              xml_text() %>%
              gsub(pattern = ',', replacement = '')

            colnums <- 1:5
            col_nms <- yh_data[colnums]
            yh_data <- yh_data[-colnums]

            lab_inds <- nodes %>%
              html_attr(name = 'class') == "Va(m)"

            lab_inds[is.na(lab_inds)] <- FALSE

            lab_inds <- lab_inds[-colnums]
            data <- matrix(NA, nrow = sum(lab_inds), ncol = 4, dimnames = list(yh_data[lab_inds], col_nms[-1]))
            row_num <- 1
            for (i in 2:(length(lab_inds)-3)) {
              t_ind <- !lab_inds[i:(i+3)]
              if (sum(t_ind) == 4) {
                data[row_num, 1:4] <- as.numeric(yh_data[i:(i+3)])
              }
              if (lab_inds[i]) {
                row_num <- row_num+1
              }
            }

            temp2 <- as.data.frame(data)

            print(paste(stock[stocknum],'   Balance Sheet Success'))

            # Cash Flow
            url <- "https://finance.yahoo.com/quote/"
            url <- paste0(url,stock[stocknum],"/cash-flow?p=",stock[stocknum])
            wahis.session <- html_session(url)
            nodes <- wahis.session %>%
              html_nodes(xpath = '//*[@id="Col1-1-Financials-Proxy"]/section/div[4]/div[1]/div[1]//span')

            yh_data <- nodes %>%
              xml_text() %>%
              gsub(pattern = ',', replacement = '')
            colnums <- 1:6
            col_nms <- yh_data[colnums]
            yh_data <- yh_data[-colnums]

            lab_inds <- nodes %>%
              html_attr(name = 'class') == "Va(m)"
            lab_inds[is.na(lab_inds)] <- FALSE

            lab_inds <- lab_inds[-colnums]
            data <- matrix(NA, nrow = sum(lab_inds), ncol = 5, dimnames = list(yh_data[lab_inds], col_nms[-1]))
            row_num <- 1
            for (i in 2:(length(lab_inds)-4)) {
              t_ind <- !lab_inds[i:(i+4)]
              if (sum(t_ind) == 5) {
                data[row_num, 1:5] <- as.numeric(yh_data[i:(i+4)])
              }
              if (lab_inds[i]) {
                row_num <- row_num+1
              }
            }

            temp3 <- as.data.frame(data)

            print(paste(stock[stocknum],'   Cash Flow Statement Success'))

            assign(paste0(stock[stocknum],'.f'),value = list(IS = temp1,BS = temp2,CF = temp3),envir = parent.frame())

          },
          error = function(cond){
            message(stock[stocknum], "Give error ",cond)
          }
        )
      }
    }
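The "one missing value blanks the whole row" behaviour from point 2 can be seen without touching the network. The sketch below re-implements only the row-filling loop on hand-made input. `fill_rows`, the labels and the numbers are all made up for illustration, and it assumes a missing value simply produces no entry in the scraped text.

```r
# Toy version of the row-filling loop, run on hand-made input.
# yh_data:  scraped text in page order (row label, then its values)
# lab_inds: TRUE where the entry is a row label
# n_vals:   number of value columns expected per row
fill_rows <- function(yh_data, lab_inds, n_vals) {
  data <- matrix(NA_real_, nrow = sum(lab_inds), ncol = n_vals,
                 dimnames = list(yh_data[lab_inds], NULL))
  row_num <- 1
  for (i in 2:(length(lab_inds) - (n_vals - 1))) {
    window <- i:(i + n_vals - 1)
    # Fill a row only when n_vals consecutive entries are all values
    if (!any(lab_inds[window])) {
      data[row_num, ] <- as.numeric(yh_data[window])
    }
    # Reaching the next label moves on to the next output row
    if (lab_inds[i]) {
      row_num <- row_num + 1
    }
  }
  data
}

# "Cost" has only one of its two values, so its whole row stays NA.
yh_data  <- c("Revenue", "100", "90", "Cost", "40", "Profit", "60", "50")
lab_inds <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
res <- fill_rows(yh_data, lab_inds, n_vals = 2)
print(res)
```

A row is written only when a full run of `n_vals` values follows its label, which is why a single gap leaves the entire row as `NA`.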


JBquant
-1

Yes, I have had the same issue for the past couple of days as well. I think it may have to do with a change on Google Finance's side: the site is now different, and so is the URL.

Joe