Is there a way to automatically download historical prices of stocks from yahoo finance or google finance (csv format)? Preferably in Python.

- 205,989
- 36
- 386
- 419

- 10,741
- 27
- 89
- 143
-
check out http://scrape-google-finance.compunect.com/ it's a rather new open source PHP scraper for Google finance. It's free to use/modify and you can download all stock prices and all companies from Google. Should not be too difficult to learn from it and write the same in python. – John Jun 22 '14 at 19:14
6 Answers
When you're going to work with such time series in Python, pandas
is indispensable. And here's the good news: it comes with a historical data downloader for Yahoo: pandas.io.data.DataReader
.
from pandas.io.data import DataReader
from datetime import datetime
ibm = DataReader('IBM', 'yahoo', datetime(2000, 1, 1), datetime(2012, 1, 1))
print(ibm['Adj Close'])
Here's an example from the pandas
documentation.
Update for pandas >= 0.19:
The pandas.io.data
module has been removed from pandas>=0.19
onwards. Instead, you should use the separate pandas-datareader
package. Install with:
pip install pandas-datareader
And then you can do this in Python:
import pandas_datareader as pdr
from datetime import datetime
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(ibm['Adj Close'])

- 5,301
- 5
- 34
- 63
-
2When I try it, the imports work fine but when I call the 'goog' line I receive an error: "IOError: after 3 tries, Yahoo! did not return a 200 for url 'http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=0&b=1&c=2000&d=0&e=1&f=2012&g=d&ignore=.csv'" How could this be fixed? – Cleb Aug 25 '15 at 15:01
-
@Cleb Seems to be because `GOOG` is not accepted by the API (don't understand why, after move to Alphabet GOOG ticker was kept). Works fine for `GOOGL` and various other symbols. Example adjusted just in case. – Def_Os Oct 28 '15 at 02:10
Short answer: Yes. Use Python's urllib to pull the historical data pages for the stocks you want. Go with Yahoo! Finance; Google is both less reliable, has less data coverage, and is more restrictive in how you can use it once you have it. Also, I believe Google specifically prohibits you from scraping the data in their ToS.
Longer answer: This is the script I use to pull all the historical data on a particular company. It pulls the historical data page for a particular ticker symbol, then saves it to a csv file named by that symbol. You'll have to provide your own list of ticker symbols that you want to pull.
import urllib
base_url = "http://ichart.finance.yahoo.com/table.csv?s="
def make_url(ticker_symbol):
return base_url + ticker_symbol
output_path = "C:/path/to/output/directory"
def make_filename(ticker_symbol, directory="S&P"):
return output_path + "/" + directory + "/" + ticker_symbol + ".csv"
def pull_historical_data(ticker_symbol, directory="S&P"):
try:
urllib.urlretrieve(make_url(ticker_symbol), make_filename(ticker_symbol, directory))
except urllib.ContentTooShortError as e:
outfile = open(make_filename(ticker_symbol, directory), "w")
outfile.write(e.content)
outfile.close()

- 1,538
- 11
- 14
-
I get the following error: `AttributeError: module 'urllib' has no attribute 'ContentTooShortError'` – adrCoder Oct 29 '19 at 10:26
-
This answer was written 7 years ago, for Python 2.7. In the intervening timespan, Python 3 has modified the `urllib` module to be broken into submodules. The exception class you're looking for can be found here: https://docs.python.org/3/library/urllib.error.html#urllib.error.ContentTooShortError – Joe C. Oct 29 '19 at 20:23
Extending @Def_Os's answer with an actual demo...
As @Def_Os has already said - using Pandas Datareader makes this task a real fun
In [12]: from pandas_datareader import data
pulling all available historical data for AAPL
starting from 1980-01-01
#In [13]: aapl = data.DataReader('AAPL', 'yahoo', '1980-01-01')
# yahoo api is inconsistent for getting historical data, please use google instead.
In [13]: aapl = data.DataReader('AAPL', 'google', '1980-01-01')
first 5 rows
In [14]: aapl.head()
Out[14]:
Open High Low Close Volume Adj Close
Date
1980-12-12 28.750000 28.875000 28.750 28.750 117258400 0.431358
1980-12-15 27.375001 27.375001 27.250 27.250 43971200 0.408852
1980-12-16 25.375000 25.375000 25.250 25.250 26432000 0.378845
1980-12-17 25.875000 25.999999 25.875 25.875 21610400 0.388222
1980-12-18 26.625000 26.750000 26.625 26.625 18362400 0.399475
last 5 rows
In [15]: aapl.tail()
Out[15]:
Open High Low Close Volume Adj Close
Date
2016-06-07 99.250000 99.870003 98.959999 99.029999 22366400 99.029999
2016-06-08 99.019997 99.559998 98.680000 98.940002 20812700 98.940002
2016-06-09 98.500000 99.989998 98.459999 99.650002 26419600 99.650002
2016-06-10 98.529999 99.349998 98.480003 98.830002 31462100 98.830002
2016-06-13 98.690002 99.120003 97.099998 97.339996 37612900 97.339996
save all data as CSV file
In [16]: aapl.to_csv('d:/temp/aapl_data.csv')
d:/temp/aapl_data.csv - 5 first rows
Date,Open,High,Low,Close,Volume,Adj Close
1980-12-12,28.75,28.875,28.75,28.75,117258400,0.431358
1980-12-15,27.375001,27.375001,27.25,27.25,43971200,0.408852
1980-12-16,25.375,25.375,25.25,25.25,26432000,0.378845
1980-12-17,25.875,25.999999,25.875,25.875,21610400,0.38822199999999996
1980-12-18,26.625,26.75,26.625,26.625,18362400,0.399475
...

- 454
- 1
- 6
- 15

- 205,989
- 36
- 386
- 419
There is already a library in Python called yahoo_finance so you'll need to download the library first using the following command line:
sudo pip install yahoo_finance
Then once you've installed the yahoo_finance library, here's a sample code that will download the data you need from Yahoo Finance:
#!/usr/bin/python
import yahoo_finance
import pandas as pd
symbol = yahoo_finance.Share("GOOG")
google_data = symbol.get_historical("1999-01-01", "2016-06-30")
google_df = pd.DataFrame(google_data)
# Output data into CSV
google_df.to_csv("/home/username/google_stock_data.csv")
This should do it. Let me know if it works.
UPDATE: The yahoo_finance library is no longer supported.

- 1,203
- 14
- 12
-
Just curious -- what benefit does yahoo-finance provide over pandas_datareader (or vice versa)? – Mathematician Jan 31 '18 at 04:35
-
2At the time that was the only library I could find in relation to pulling Yahoo Finance Stock Prices. However, it doesn't seem to be working anymore. – Naufal Jan 31 '18 at 10:07
-
4
-
-
1
You can check out the yahoo_fin package. It was initially created after Yahoo Finance changed their API (documentation is here: http://theautomatic.net/yahoo_fin-documentation).
from yahoo_fin import stock_info as si
aapl_data = si.get_data("aapl")
nflx_data = si.get_data("nflx")
aapl_data.head()
nflx_data.head()
aapl_data.to_csv("aapl_data.csv")
nflx_data.to_csv("nflx_data.csv")

- 289
- 2
- 6
-
Thank you for this reference, this was the only ready solution that work for me. All other packages mentioned here seem to be outdated and don't work, probably because of the API changes in Yahoo Finance.\ – Tim Mironov Jul 02 '19 at 11:53
-
typo: aapl.to_csv("aapl_data.csv")` should be `aapl_data.to_csv("aapl_data.csv")` – ShpielMeister Aug 27 '21 at 00:54
It's trivial when you know how:
import yfinance as yf
df = yf.download('CVS', '2015-01-01')
df.to_csv('cvs-health-corp.csv')
If you wish to plot it:
import finplot as fplt
fplt.candlestick_ochl(df[['Open','Close','High','Low']])
fplt.show()

- 25,316
- 23
- 100
- 147