49

Is there a way to automatically download historical prices of stocks from yahoo finance or google finance (csv format)? Preferably in Python.

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
Bob
  • 10,741
  • 27
  • 89
  • 143
  • check out http://scrape-google-finance.compunect.com/ it's a rather new open source PHP scraper for Google finance. It's free to use/modify and you can download all stock prices and all companies from Google. Should not be too difficult to learn from it and write the same in python. – John Jun 22 '14 at 19:14

6 Answers6

108

When you're going to work with such time series in Python, pandas is indispensable. And here's the good news: it comes with a historical data downloader for Yahoo: pandas.io.data.DataReader.

from pandas.io.data import DataReader
from datetime import datetime

ibm = DataReader('IBM',  'yahoo', datetime(2000, 1, 1), datetime(2012, 1, 1))
print(ibm['Adj Close'])

Here's an example from the pandas documentation.

Update for pandas >= 0.19:

The pandas.io.data module has been removed from pandas>=0.19 onwards. Instead, you should use the separate pandas-datareader package. Install with:

pip install pandas-datareader

And then you can do this in Python:

import pandas_datareader as pdr
from datetime import datetime

ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(ibm['Adj Close'])

Downloading from Google Finance is also supported.

There's more in the documentation of pandas-datareader.

Def_Os
  • 5,301
  • 5
  • 34
  • 63
  • 2
    When I try it, the imports work fine but when I call the 'goog' line I receive an error: "IOError: after 3 tries, Yahoo! did not return a 200 for url 'http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=0&b=1&c=2000&d=0&e=1&f=2012&g=d&ignore=.csv'" How could this be fixed? – Cleb Aug 25 '15 at 15:01
  • @Cleb Seems to be because `GOOG` is not accepted by the API (don't understand why, after move to Alphabet GOOG ticker was kept). Works fine for `GOOGL` and various other symbols. Example adjusted just in case. – Def_Os Oct 28 '15 at 02:10
42

Short answer: Yes. Use Python's urllib to pull the historical data pages for the stocks you want. Go with Yahoo! Finance; Google is both less reliable, has less data coverage, and is more restrictive in how you can use it once you have it. Also, I believe Google specifically prohibits you from scraping the data in their ToS.

Longer answer: This is the script I use to pull all the historical data on a particular company. It pulls the historical data page for a particular ticker symbol, then saves it to a csv file named by that symbol. You'll have to provide your own list of ticker symbols that you want to pull.

import urllib

base_url = "http://ichart.finance.yahoo.com/table.csv?s="
def make_url(ticker_symbol):
    return base_url + ticker_symbol

output_path = "C:/path/to/output/directory"
def make_filename(ticker_symbol, directory="S&P"):
    return output_path + "/" + directory + "/" + ticker_symbol + ".csv"

def pull_historical_data(ticker_symbol, directory="S&P"):
    try:
        urllib.urlretrieve(make_url(ticker_symbol), make_filename(ticker_symbol, directory))
    except urllib.ContentTooShortError as e:
        outfile = open(make_filename(ticker_symbol, directory), "w")
        outfile.write(e.content)
        outfile.close()
Joe C.
  • 1,538
  • 11
  • 14
  • I get the following error: `AttributeError: module 'urllib' has no attribute 'ContentTooShortError'` – adrCoder Oct 29 '19 at 10:26
  • This answer was written 7 years ago, for Python 2.7. In the intervening timespan, Python 3 has modified the `urllib` module to be broken into submodules. The exception class you're looking for can be found here: https://docs.python.org/3/library/urllib.error.html#urllib.error.ContentTooShortError – Joe C. Oct 29 '19 at 20:23
14

Extending @Def_Os's answer with an actual demo...

As @Def_Os has already said - using Pandas Datareader makes this task a real fun

In [12]: from pandas_datareader import data

pulling all available historical data for AAPL starting from 1980-01-01

#In [13]: aapl = data.DataReader('AAPL', 'yahoo', '1980-01-01')

# yahoo api is inconsistent for getting historical data, please use google instead.
In [13]: aapl = data.DataReader('AAPL', 'google', '1980-01-01')

first 5 rows

In [14]: aapl.head()
Out[14]:
                 Open       High     Low   Close     Volume  Adj Close
Date
1980-12-12  28.750000  28.875000  28.750  28.750  117258400   0.431358
1980-12-15  27.375001  27.375001  27.250  27.250   43971200   0.408852
1980-12-16  25.375000  25.375000  25.250  25.250   26432000   0.378845
1980-12-17  25.875000  25.999999  25.875  25.875   21610400   0.388222
1980-12-18  26.625000  26.750000  26.625  26.625   18362400   0.399475

last 5 rows

In [15]: aapl.tail()
Out[15]:
                 Open       High        Low      Close    Volume  Adj Close
Date
2016-06-07  99.250000  99.870003  98.959999  99.029999  22366400  99.029999
2016-06-08  99.019997  99.559998  98.680000  98.940002  20812700  98.940002
2016-06-09  98.500000  99.989998  98.459999  99.650002  26419600  99.650002
2016-06-10  98.529999  99.349998  98.480003  98.830002  31462100  98.830002
2016-06-13  98.690002  99.120003  97.099998  97.339996  37612900  97.339996

save all data as CSV file

In [16]: aapl.to_csv('d:/temp/aapl_data.csv')

d:/temp/aapl_data.csv - 5 first rows

Date,Open,High,Low,Close,Volume,Adj Close
1980-12-12,28.75,28.875,28.75,28.75,117258400,0.431358
1980-12-15,27.375001,27.375001,27.25,27.25,43971200,0.408852
1980-12-16,25.375,25.375,25.25,25.25,26432000,0.378845
1980-12-17,25.875,25.999999,25.875,25.875,21610400,0.38822199999999996
1980-12-18,26.625,26.75,26.625,26.625,18362400,0.399475
...
Bhargav
  • 454
  • 1
  • 6
  • 15
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
8

There is already a library in Python called yahoo_finance so you'll need to download the library first using the following command line:

sudo pip install yahoo_finance

Then once you've installed the yahoo_finance library, here's a sample code that will download the data you need from Yahoo Finance:

#!/usr/bin/python
import yahoo_finance
import pandas as pd

symbol = yahoo_finance.Share("GOOG")
google_data = symbol.get_historical("1999-01-01", "2016-06-30")
google_df = pd.DataFrame(google_data)

# Output data into CSV
google_df.to_csv("/home/username/google_stock_data.csv")

This should do it. Let me know if it works.

UPDATE: The yahoo_finance library is no longer supported.

Naufal
  • 1,203
  • 14
  • 12
4

You can check out the yahoo_fin package. It was initially created after Yahoo Finance changed their API (documentation is here: http://theautomatic.net/yahoo_fin-documentation).

from yahoo_fin import stock_info as si

aapl_data = si.get_data("aapl")

nflx_data = si.get_data("nflx")

aapl_data.head()

nflx_data.head()

aapl_data.to_csv("aapl_data.csv")

nflx_data.to_csv("nflx_data.csv")
atreadw
  • 289
  • 2
  • 6
  • Thank you for this reference, this was the only ready solution that work for me. All other packages mentioned here seem to be outdated and don't work, probably because of the API changes in Yahoo Finance.\ – Tim Mironov Jul 02 '19 at 11:53
  • typo: aapl.to_csv("aapl_data.csv")` should be `aapl_data.to_csv("aapl_data.csv")` – ShpielMeister Aug 27 '21 at 00:54
2

It's trivial when you know how:

import yfinance as yf
df = yf.download('CVS', '2015-01-01')
df.to_csv('cvs-health-corp.csv')

If you wish to plot it:

import finplot as fplt
fplt.candlestick_ochl(df[['Open','Close','High','Low']])
fplt.show()

enter image description here

Jonas Byström
  • 25,316
  • 23
  • 100
  • 147