18

I have a question about fetching Yahoo Finance data with pandas-datareader. For months I have been using a list of stock tickers and running the following lines:

import pandas_datareader as pdr
import datetime

stocks = ["stock1","stock2",....]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)

f = pdr.DataReader(stocks, 'yahoo',start,end)

Since yesterday I have been getting the error "IndexError: list index out of range", which appears only when I try to fetch multiple stocks.

Has anything changed in recent days that I need to account for, or is there a better solution to my problem?

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
ScharcoMolten
  • 183
  • 1
  • 1
  • 5

5 Answers

16

Updated as of 2021-01-19

import pandas_datareader as pdr
tickers = ['msft', 'aapl', 'intc', 'tsm', 'goog', 'amzn', 'fb', 'nvda']
df = pdr.DataReader(tickers, data_source='yahoo', start='2017-01-01', end='2020-09-28')

Original Answer

If you read through the pandas-datareader documentation, you will see that they announced an immediate deprecation of multiple data source APIs, one of which is Yahoo! Finance.

v0.6.0 (January 24, 2018)

Immediate deprecation of Yahoo!, Google Options and Quotes and EDGAR. The end points behind these APIs have radically changed and the existing readers require complete rewrites. In the case of most Yahoo! data the endpoints have been removed. PDR would like to restore these features, and pull requests are welcome.

This could be why you have been getting IndexErrors (or any other errors that did not occur before).
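To check whether the failure really is tied to multi-ticker requests, one option is to request each ticker on its own and record which ones fail. The following is only a minimal sketch of that idea: `fetch_each` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real network call such as `pdr.DataReader(ticker, 'yahoo', start, end)`.

```python
def fetch_each(tickers, fetch_one):
    """Call fetch_one(ticker) for every ticker; keep results and failures apart."""
    results, failures = {}, {}
    for ticker in tickers:
        try:
            results[ticker] = fetch_one(ticker)
        except Exception as exc:  # broad on purpose: record whatever the reader raises
            failures[ticker] = repr(exc)
    return results, failures


def fake_fetch(ticker):
    # Hypothetical stand-in for a real reader call; fails for one ticker
    if ticker == "stock2":
        raise IndexError("list index out of range")
    return [1.0, 2.0, 3.0]


results, failures = fetch_each(["stock1", "stock2"], fake_fetch)
print(results)
print(failures)
```

With a real reader passed in as `fetch_one`, the `failures` mapping shows which tickers, if any, trigger the error on their own.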


However, there is another Python package whose goal is to restore Yahoo! Finance support for pandas-datareader. You can find that package here:

https://pypi.python.org/pypi/fix-yahoo-finance

According to their documentation:

Yahoo! finance has decommissioned their historical data API, causing many programs that relied on it to stop working.

fix-yahoo-finance offers a temporary fix to the problem by scraping the data from Yahoo! finance using and return a Pandas DataFrame/Panel in the same format as pandas_datareader’s get_data_yahoo().

By basically “hijacking” pandas_datareader.data.get_data_yahoo() method, fix-yahoo-finance’s implantation is easy and only requires to import fix_yahoo_finance into your code.
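The "hijacking" described above is ordinary monkey patching: a function attribute on a module is replaced at runtime by another function with a compatible signature. Here is a rough sketch of the mechanism only, not of the package's actual code. The `pdr` object is a hypothetical stand-in namespace, and `override_reader`, `download` and `broken_get_data_yahoo` are illustrative names.

```python
import types

# Hypothetical stand-in for the pandas_datareader.data module
pdr = types.SimpleNamespace()


def broken_get_data_yahoo(tickers, start=None, end=None):
    # Mimics the failing reader from the question
    raise IndexError("list index out of range")


pdr.get_data_yahoo = broken_get_data_yahoo


def download(tickers, start=None, end=None):
    # Stand-in for a working downloader; returns an empty series per ticker
    return {ticker: [] for ticker in tickers}


def override_reader(module):
    """Replace the module's get_data_yahoo with the working downloader."""
    module.get_data_yahoo = download


override_reader(pdr)
data = pdr.get_data_yahoo(["stock1", "stock2"])
print(data)
```

After the override, existing calls to `pdr.get_data_yahoo(...)` keep working unchanged, which is why a single import plus one override call is enough on the caller's side.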

All you need to add is this:

import datetime

from pandas_datareader import data as pdr
import fix_yahoo_finance as yf

yf.pdr_override() 

stocks = ["stock1","stock2", ...]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)

f = pdr.get_data_yahoo(stocks, start=start, end=end)

Or without Pandas DataReader:

import datetime

import fix_yahoo_finance as yf

stocks = ["stock1","stock2", ...]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)
data = yf.download(stocks, start=start, end=end)
Taku
  • 31,927
  • 11
  • 74
  • 85
  • This now results in an error addressed by ["TypeError: string indices must be integers" when getting data of a stock from Yahoo Finance using Pandas Datareader](https://stackoverflow.com/q/74832296/7758804) – Trenton McKinney Apr 01 '23 at 16:57
7

You can use the new Python YahooFinancials module with pandas to do this. YahooFinancials is well built and gets its data by hashing out the datastore object present in each Yahoo Finance web page, so it is fast and relies neither on the old discontinued API nor on a web driver, as a scraper does. Data is returned as JSON, and you can pull as many stocks as you want at once by passing a list of stock/index tickers when initializing the YahooFinancials class.

$ pip install yahoofinancials

Usage Example:

from yahoofinancials import YahooFinancials
import pandas as pd

# Select Tickers and stock history dates
ticker = 'AAPL'
ticker2 = 'MSFT'
ticker3 = 'INTC'
index = '^NDX'
freq = 'daily'
start_date = '2012-10-01'
end_date = '2017-10-01'


# Function to clean data extracts
def clean_stock_data(stock_data_list):
    new_list = []
    for rec in stock_data_list:
        if 'type' not in rec.keys():
            new_list.append(rec)
    return new_list

# Construct yahoo financials objects for data extraction
aapl_financials = YahooFinancials(ticker)
msft_financials = YahooFinancials(ticker2)
intc_financials = YahooFinancials(ticker3)
index_financials = YahooFinancials(index)

# Clean returned stock history data and remove dividend events from price history
daily_aapl_data = clean_stock_data(aapl_financials
                                     .get_historical_stock_data(start_date, end_date, freq)[ticker]['prices'])
daily_msft_data = clean_stock_data(msft_financials
                                     .get_historical_stock_data(start_date, end_date, freq)[ticker2]['prices'])
daily_intc_data = clean_stock_data(intc_financials
                                     .get_historical_stock_data(start_date, end_date, freq)[ticker3]['prices'])
daily_index_data = index_financials.get_historical_stock_data(start_date, end_date, freq)[index]['prices']
stock_hist_data_list = [{'NDX': daily_index_data}, {'AAPL': daily_aapl_data}, {'MSFT': daily_msft_data},
                        {'INTC': daily_intc_data}]


# Function to construct a data frame from three stocks and their market index
def build_data_frame(data_list1, data_list2, data_list3, data_list4):
    data_dict = {}
    i = 0
    for list_item in data_list2:
        # Skip event records (e.g. dividends), which carry a 'type' key
        if 'type' not in list_item.keys():
            data_dict.update({list_item['formatted_date']: {'NDX': data_list1[i]['close'], 'AAPL': list_item['close'],
                                                            'MSFT': data_list3[i]['close'],
                                                            'INTC': data_list4[i]['close']}})
            i += 1
    tseries = pd.to_datetime(list(data_dict.keys()))
    df = pd.DataFrame(data=list(data_dict.values()), index=tseries,
                      columns=['NDX', 'AAPL', 'MSFT', 'INTC']).sort_index()
    return df

Multiple stocks data at once example (returns list of JSON objects for each ticker):

from yahoofinancials import YahooFinancials

tech_stocks = ['AAPL', 'MSFT', 'INTC']
bank_stocks = ['WFC', 'BAC', 'C']

yahoo_financials_tech = YahooFinancials(tech_stocks)
yahoo_financials_banks = YahooFinancials(bank_stocks)

tech_cash_flow_data_an = yahoo_financials_tech.get_financial_stmts('annual', 'cash')
bank_cash_flow_data_an = yahoo_financials_banks.get_financial_stmts('annual', 'cash')

banks_net_ebit = yahoo_financials_banks.get_ebit()
tech_stock_price_data = yahoo_financials_tech.get_stock_price_data()
daily_bank_stock_prices = yahoo_financials_banks.get_historical_stock_data('2008-09-15', '2017-09-15', 'daily')

JSON Output Example:

Code:

yahoo_financials = YahooFinancials('WFC')
print(yahoo_financials.get_historical_stock_data("2017-09-10", "2017-10-10", "monthly"))

JSON Return:

{
    "WFC": {
        "prices": [
            {
                "volume": 260271600,
                "formatted_date": "2017-09-30",
                "high": 55.77000045776367,
                "adjclose": 54.91999816894531,
                "low": 52.84000015258789,
                "date": 1506830400,
                "close": 54.91999816894531,
                "open": 55.15999984741211
            }
        ],
        "eventsData": [],
        "firstTradeDate": {
            "date": 76233600,
            "formatted_date": "1972-06-01"
        },
        "isPending": false,
        "timeZone": {
            "gmtOffset": -14400
        },
        "id": "1mo15050196001507611600"
    }
}
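Given a response shaped like the JSON above, the price records can be flattened into a per-ticker mapping of date to adjusted close. This is just a sketch under that assumption: `adjclose_by_date` is a hypothetical helper, the `response` literal is trimmed from the sample output, and skipping records with a `'type'` key follows the `clean_stock_data` function shown earlier.

```python
# Trimmed copy of the sample response shown above
response = {
    "WFC": {
        "prices": [
            {
                "formatted_date": "2017-09-30",
                "adjclose": 54.91999816894531,
                "close": 54.91999816894531,
            }
        ],
        "eventsData": [],
    }
}


def adjclose_by_date(response):
    """Map each ticker to {formatted_date: adjclose}, skipping records with a 'type' key."""
    table = {}
    for ticker, payload in response.items():
        table[ticker] = {
            rec["formatted_date"]: rec["adjclose"]
            for rec in payload.get("prices", [])
            if "type" not in rec
        }
    return table


table = adjclose_by_date(response)
print(table)
```

A dict of dicts in this shape can be passed straight to `pd.DataFrame(table)`, which uses the tickers as columns and the dates as the index.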
alt777
  • 171
  • 1
  • 4
  • Just wondering: what is the problem with using a "web driver" ? (I am not even sure what is the definition of a "web driver"). Because your module is scraping web pages so it needs a way to download these pages. – Gabriel Devillers Jul 10 '18 at 22:36
  • 1
    Also you should add that **if you start to use this module, you either hope it will be maintained, or are ready to maintain it if needs be**. Indeed Yahoo Financial could change the structure of their web pages any day, which could require a modification of the module scraping code. – Gabriel Devillers Jul 10 '18 at 22:38
  • Agreed. Tbh I'm ready to maintain the code at least for my personal use if need be. It's really well put together. The maintainer has addressed a few issues and closed them, so he doesn't appear absent. Yahoo Finance seems to have had a massive new web app deployment when they shut down their API. On top of that, the fields appear to come directly from the database. I don't see them changing the field names anytime soon, and frontend changes won't affect the module unless they change the web app framework they are using. DBs are usually only changed in major releases. – alt777 Jul 10 '18 at 23:47
  • Also, web drivers are nice, don't get me wrong. I love selenium + phantom. However, the rendered GUI components you scrape are generally more likely to change than the data store variables, and a name-change fix for a renamed database field (very rare) is usually easier to implement than writing new code to interact with and scrape a new part of the app. Also, in my experience this solution is faster than getting the data from a web scraper, especially when it's data you can only scrape by clicking pagination buttons via the web driver. I suspect this module will be fine with only minor fixes for ~3 yrs – alt777 Jul 10 '18 at 23:52
0

The yahoo_finance package no longer works because Yahoo has changed the format. fix_yahoo_finance is good enough to download the data. However, to parse it, you'll need other libraries.

import numpy as np #python library for scientific computing
import pandas as pd #python library for data manipulation and analysis
import matplotlib.pyplot as plt #python library for charting
import fix_yahoo_finance as yf #python library to scrape data from yahoo finance
from pandas_datareader import data as pdr #extract data from internet sources into pandas data frame

yf.pdr_override()

data = pdr.get_data_yahoo('^DJI', start="2006-01-01")
data2 = pdr.get_data_yahoo("MSFT", start="2006-01-01")
data3 = pdr.get_data_yahoo("AAPL", start="2006-01-01")
data4 = pdr.get_data_yahoo("BB.TO", start="2006-01-01")

ax = (data['Close'] / data['Close'].iloc[0] * 100).plot(figsize=(15, 6))
(data2['Close'] / data2['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15,6))
(data3['Close'] / data3['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15,6))
(data4['Close'] / data4['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15,6))

plt.legend(['Dow Jones', 'Microsoft', 'Apple', 'Blackberry'], loc='upper left')
plt.show()

Visit Charting stocks price from Yahoo Finance using fix-yahoo-finance library for the code explanation.
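The expression `data['Close'] / data['Close'].iloc[0] * 100` in the plotting code rebases each series so that its first value becomes 100, which puts instruments with very different price levels on one comparable scale. A small sketch of the same arithmetic on a plain list; `rebase_to_100` is a hypothetical helper and the prices are made up for illustration.

```python
def rebase_to_100(prices):
    """Scale a price series so that its first value becomes 100."""
    first = prices[0]
    return [price / first * 100 for price in prices]


closes = [50.0, 75.0, 25.0]  # made-up illustrative prices
rebased = rebase_to_100(closes)
print(rebased)  # [100.0, 150.0, 50.0]
```

A rebased value of 150 therefore reads as "50% above the starting price", regardless of the instrument's absolute price.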

Gerry
  • 101
  • 1
  • 7
0
import pandas as pd
from pandas_datareader import data as wb

watchlist = ["stock1", "stock2", ...]
frames = []

# Fetch each ticker separately and tag its rows with the symbol
for symbol in watchlist:
    print("Generating closing price for", symbol)
    result = wb.DataReader(symbol, start='05-1-20', end='05-20-20', data_source='yahoo')
    result["SYMBOL"] = symbol
    frames.append(result)

# Combine the per-ticker frames into a single DataFrame
closing_price = pd.concat(frames)
print(closing_price)
0
import pandas as pd
from yahoofinancials import YahooFinancials

assets = ['TSLA', 'MSFT', 'FB']

yahoo_financials = YahooFinancials(assets)

data = yahoo_financials.get_historical_price_data(start_date='2019-01-01', 
                                                  end_date='2019-12-31', 
                                                  time_interval='weekly')

prices_df = pd.DataFrame({
    a: {x['formatted_date']: x['adjclose'] for x in data[a]['prices']} for a in assets})

prices_df


ASH
  • 20,759
  • 19
  • 87
  • 200