2

I created a program that takes stock tickers, crawls the web to find a CSV of each ticker's historical prices, and plots them using matplotlib. Almost everything is working fine, but I've run into a problem parsing the CSV to separate out each price.

The error I get is:

prices = [float(row[4]) for row in csv_rows]

IndexError: list index out of range

I understand what the error means; I'm just not sure how to fix it.

(The issue is in the parseCSV() function.)

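import collections
import csv

import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup

# Loop to chart multiple stocks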
def chartStocks(*tickers):
    for ticker in tickers:
        chartStock(ticker)
        
# Single chart stock method
def chartStock(ticker):
    url = "http://finance.yahoo.com/q/hp?s=" + str(ticker) + "+Historical+Prices"
    sourceCode = requests.get(url)
    plainText = sourceCode.text
    soup = BeautifulSoup(plainText, "html.parser")
    csv = findCSV(soup)
    parseCSV(csv)

# Find the CSV URL        
def findCSV(soupPage):
    CSV_URL_PREFIX = 'http://real-chart.finance.yahoo.com/table.csv?s='
    links = soupPage.findAll('a')
    for link in links:
        href = link.get('href', '')
        if href.startswith(CSV_URL_PREFIX):
            return href
    
# Parse CSV for daily prices
def parseCSV(csv_text):
    csv_rows = csv.reader(csv_text.split('\n'))

    prices = [float(row[4]) for row in csv_rows]
    days = list(range(len(prices)))
    point = collections.namedtuple('Point', ['x', 'y'])

    for price in prices:
        i = 0
        p = point(days[i], prices[i])
        points = []
        points.append(p)
    print(points)

    plotStock(points)

# Plot the data
def plotStock(points):
    plt.plot(points)
    plt.show()
123
  • Once you use @mhawke's answer to actually download the csv data and @Alexander's to handle rows with less than 5 items on them, then you also need to move the `points = []` out of the `for` loop near the end of `parseCSV()`. – martineau Feb 18 '16 at 09:00
  • The issue you *asked about* (dealing with an IndexError while processing CSV data) seems to be a duplicate of [this earlier question](http://stackoverflow.com/questions/19883776/indexerror-while-printing-in-from-csv) (or maybe [this one](http://stackoverflow.com/questions/32621595/indexerror-index-out-of-range-7) or [this one](http://stackoverflow.com/questions/11786157/if-list-index-exists-do-x)). Of course, as several others have already pointed out, what you *thought* was the problem was actually the least of the issues in your code. – Ilmari Karonen Feb 18 '16 at 19:01
  • I believe this post is [under discussion on meta](https://meta.stackoverflow.com/questions/317124/how-to-deal-with-peer-pressure-to-change-the-answer-i-accepted?cb=1). – Yakk - Adam Nevraumont Feb 20 '16 at 18:07

2 Answers

14

The problem is that parseCSV() expects a string containing CSV data, but it is actually being passed the URL of the CSV data, not the downloaded CSV data.

This is because findCSV(soup) returns the value of href for the CSV link found on the page, and that value is then passed to parseCSV(). The CSV reader sees a single undelimited row, so there is only one column, not the five or more that the code expects.

At no point is the CSV data actually being downloaded.
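The failure can be reproduced without any network access. The sketch below feeds a URL string to csv.reader the same way parseCSV() does; the URL is a hypothetical stand-in for whatever href findCSV() returns, and its query string is a placeholder.

```python
import csv

# Hypothetical href of the kind findCSV() returns (query string is a placeholder).
csv_url = "http://real-chart.finance.yahoo.com/table.csv?s=AAPL"

# Same call shape as in parseCSV(): the "CSV text" is really just the URL.
rows = list(csv.reader(csv_url.split("\n")))
print(rows)  # one row holding one field: the URL itself

error = None
try:
    prices = [float(row[4]) for row in rows]
except IndexError as exc:
    # row has a single element, so index 4 does not exist
    error = exc
    print("IndexError:", exc)
```

Because the URL contains no commas or newlines, the reader yields exactly one row with one field, which is why index 4 is out of range.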

You could write the first few lines of parseCSV() like this:

def parseCSV(csv_url):
    r = requests.get(csv_url)
    # csv.reader needs str lines; on Python 3, r.iter_lines() yields bytes by default
    csv_rows = csv.reader(r.text.splitlines())
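Once the CSV text has actually been downloaded, the header row and any short rows still need handling before float() is applied. Here is a minimal sketch, assuming the file starts with a single header line and the closing price sits at index 4 as in the question. The function name parse_close_prices, the Point layout, and the sample figures are made up for illustration; in the real program the text would come from the response body of requests.get(csv_url).

```python
import csv
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

def parse_close_prices(csv_text, close_index=4):
    """Return Point(day_number, price) tuples parsed from CSV text."""
    rows = csv.reader(csv_text.splitlines())
    next(rows, None)  # skip the header row (assumed to be present)
    prices = [float(row[close_index]) for row in rows if len(row) > close_index]
    # Build every point in one pass instead of resetting a list inside a loop.
    return [Point(day, price) for day, price in enumerate(prices)]

# Made-up sample standing in for the downloaded CSV text.
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2016-02-17,96.67,98.21,96.15,98.12,44863200,98.12
2016-02-16,95.02,96.85,94.61,96.64,49057900,96.64
"""

points = parse_close_prices(SAMPLE)
print(points)
```

Keeping the parsing in a function that takes text rather than a URL makes it easy to test with a literal string, separately from the download step.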
mhawke
  • ...and after the OP does all that they'll see that the rows of data look like what is shown in the table [here](http://finance.yahoo.com/q/hp?s=AAPL+Historical+Prices), so the rest of their code for parsing and plotting the data also won't work. – martineau Feb 18 '16 at 08:26
  • @martineau: actually, after adding the row length check suggested in another answer, I don't think that the OP will see anything at all - just a plot of an empty list. – mhawke Feb 18 '16 at 08:36
  • I decided to forget about the CSV module and just parse it myself. Works fine now :) – 123 Feb 18 '16 at 08:39
  • @123: so you are now downloading the CSV data? – mhawke Feb 18 '16 at 08:44
  • @123: Oh, I see... is the data complete? There are pagination links at the bottom and clicking "next" gives the next page of results. – mhawke Feb 18 '16 at 08:48
  • @martineau: indeed. Sometimes, it seems, that we just can not help :) – mhawke Feb 18 '16 at 10:32
3

You need to check if your row has at least five elements (i.e. index location 4).

prices = [float(row[4]) for row in csv_rows if len(row) > 4]
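To see what the length check guards against, the sketch below runs csv.reader over a few hand-written lines (made up for illustration) that include blank ones. The reader yields an empty list for each blank line, and the filter drops those rows before indexing.

```python
import csv

# Made-up lines: two data rows with blank lines mixed in.
lines = ["1,2,3,4,5.5", "", "6,7,8,9,10.25", ""]

rows = list(csv.reader(lines))
print(rows)  # blank lines come back as empty lists

# Only rows with at least five fields are indexed.
prices = [float(row[4]) for row in rows if len(row) > 4]
print(prices)
```

Note that this only protects against short rows; a header row with five or more text fields would still fail in float() and needs to be skipped separately.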
Alexander
  • ...and this is a common problem with csv files that have a few extra empty lines in them. You'd think the csv parser would skip those lines, but it doesn't. – tdelaney Feb 18 '16 at 08:11
  • This is not the problem. The URL of the CSV resource is being passed to `parseCSV()`. The actual CSV data is never retrieved. – mhawke Feb 18 '16 at 08:14