
I am using Python 3.3 on Windows. I am trying to figure out how to download a .csv file of historical prices from Yahoo Finance.

This is the page source containing the link I'm trying to access.

<p>  
 <a href="http://ichart.finance.yahoo.com/table.csv?s=AAPL&amp;d=1&amp;e=1&amp;f=2014&amp;g=d&amp;a=8&amp;b=7&amp;c=1984&amp;ignore=.csv">
<img src="http://l.yimg.com/a/i/us/fi/02rd/spread.gif" width="16" height="16" alt="" border="0">
<strong>Download to Spreadsheet</strong>
 </a>
</p> 

And here is the code I wrote to do it.

from urllib.request import urlopen
from bs4 import BeautifulSoup

website = "http://ichart.finance.yahoo.com/table.csv?s=AAPL&d=1&e=1&f=2014&g=d&a=8&b=7&c=1984&ignore=.csv"
html = urlopen(website)
soup = BeautifulSoup(html)

When I ran the code, I was expecting it to start the download and put it into my downloads folder, but it doesn't do anything. It runs and then stops. No csv file shows up in my downloads. So I think I'm missing something else in this code.

user2859603
  • The only thing you do is read the URL, parse it with BeautifulSoup and then end without doing anything else. How should Python know that you want to save the url? If you want to have the file in your downloads folder, you need to tell Python to do that. – poke Feb 01 '14 at 17:09
  • I figured that was going on. What line(s) of code would accomplish that? – user2859603 Feb 01 '14 at 18:01
  • For example: [How to download a file using Python?](http://stackoverflow.com/questions/8116623/how-to-download-a-file-using-python) – poke Feb 01 '14 at 19:06
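As the comments point out, the response has to be written to disk explicitly. A minimal sketch of that step, assuming a hypothetical helper named `download_to` and a target file name chosen by the caller (neither is part of the original post; the Yahoo URL from the question may also no longer be served):

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_to(url, destination):
    """Stream the body at `url` into the file `destination` (hypothetical helper)."""
    destination = Path(destination)
    # Binary mode: the response body is bytes, so no decoding is needed.
    with urlopen(url) as response, open(destination, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return destination
```

A call such as `download_to(website, Path.home() / "Downloads" / "table.csv")` would then place the file in the Downloads folder, assuming that folder exists at that location on your machine.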

2 Answers


You can do this with just urllib. The following code downloads the .csv file and puts the contents into a string named 'csv'. Then it saves the string to a file:

from urllib import request

# Retrieve the CSV data (read() returns bytes, not a str)
response = request.urlopen("http://ichart.finance.yahoo.com/table.csv?s=AAPL&d=1&e=1&f=2014&g=d&a=8&b=7&c=1984&ignore=.csv")
csv = response.read()

# Save the string to a file
csvstr = str(csv).strip("b'")

lines = csvstr.split("\\n")
f = open("historical.csv", "w")
for line in lines:
   f.write(line + "\n")
f.close()
Kevin
  • It made the .csv file, but didn't write the lines in properly. – user2859603 Feb 01 '14 at 19:44
  • I updated the save code. The output file should be in complete csv format now. – Kevin Feb 01 '14 at 21:57
  • Thanks! This worked. What does the .strip("b'") mean? – user2859603 Feb 01 '14 at 22:07
  • The response.read() command returns an object of type bytes rather than a string. str(csv) converts it to a string but leaves the letter b and the quotes as artifacts of the conversion, i.e. b'XXXXXX'; strip("b'") removes them to clean up the data. There is probably a cleaner way to do that conversion without the artifacts. – Kevin Feb 01 '14 at 22:13
  • There certainly is a cleaner way to decode bytes to a unicode string; you'd use the `bytes.decode()` method. But since you are saving the whole thing to a file *anyway*, you can just open the file in binary mode and write the response to it directly: `open('historical.csv', 'wb').write(response.read())`. – Martijn Pieters Sep 13 '14 at 22:08
  • Turning bytes into a string with `str()`, then having to handle the newlines as `\n` literals is.. very wrong. – Martijn Pieters Sep 13 '14 at 22:09
  • What if you wanted to iterate over this (for example, doing it for many stocks)? – Kamster Mar 07 '15 at 21:33
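One way to combine the binary-mode write suggested in the comments with a loop over several tickers is sketched below. The function names (`build_url`, `save_csv`, `download_all`) are hypothetical, the query parameters are copied from the question's URL, and the endpoint itself is assumed to still respond, which may not be the case.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint, copied from the question; it may no longer be available.
BASE_URL = "http://ichart.finance.yahoo.com/table.csv"


def build_url(symbol, base_url=BASE_URL):
    """Return the query URL for one ticker (parameters taken from the question)."""
    params = [("s", symbol), ("d", 1), ("e", 1), ("f", 2014), ("g", "d"),
              ("a", 8), ("b", 7), ("c", 1984), ("ignore", ".csv")]
    return base_url + "?" + urlencode(params)


def save_csv(url, path):
    """Write the raw response bytes to `path` without converting them to str."""
    with urlopen(url) as response:
        data = response.read()
    Path(path).write_bytes(data)
    return len(data)


def download_all(symbols, folder="."):
    """Save one <SYMBOL>.csv file per ticker in `folder`."""
    for symbol in symbols:
        save_csv(build_url(symbol), Path(folder) / (symbol + ".csv"))
```

Writing the bytes unchanged avoids the `str()` and `strip("b'")` round trip entirely. If the text itself is needed in memory, `data.decode("utf-8")` would give a proper string, assuming the server sends UTF-8 compatible content.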

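Since you already use BeautifulSoup and urllib (assuming `html` holds the page that contains the download link; in Python 3, `urlretrieve` lives in `urllib.request`):

from urllib.request import urlretrieve

url = BeautifulSoup(html).find('a')['href']
urlretrieve(url, '/path/to/downloads/file.csv')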
Guy Gavriely
  • Could you elaborate on this? I added these two lines with the path being 'C:\Users\David\Downloads'. The name would change unless I clear the download folder every run, because it will save it as table, then table(1), then table(2), and so on if I run it multiple times. – user2859603 Feb 01 '14 at 19:58
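A possible way to address this is to pass `urlretrieve` a full file path rather than only a folder, so each run overwrites the same file. The sketch below assumes a hypothetical helper `retrieve_to_folder` and a fixed file name `table.csv`; neither comes from the answer itself.

```python
from pathlib import Path
from urllib.request import urlretrieve


def retrieve_to_folder(url, folder, filename="table.csv"):
    """Download `url` to folder/filename, replacing any earlier copy."""
    target = Path(folder) / filename
    # urlretrieve needs a full file path, not just a directory.
    saved_path, _headers = urlretrieve(url, str(target))
    return Path(saved_path)

# Example call (folder taken from the comment above; note the raw string):
# retrieve_to_folder(url, r"C:\Users\David\Downloads")
```

The raw string matters on Windows: in a normal Python 3 string literal, `\U` starts a Unicode escape, so `'C:\Users\...'` is a syntax error.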