
I am using the following code, and it works well except that the CSV file it writes skips every other line when opened in Excel. I have searched the csv module documentation and other examples on stackoverflow.com, and found that I need to use DictWriter with lineterminator set to '\n'. My own attempts to write this into the code have failed.

So I am wondering: is there a way to apply this (the lineterminator) to the whole file so that no lines are skipped? And if so, how?

Here is the code:

import urllib2
from BeautifulSoup import BeautifulSoup
import csv

page = urllib2.urlopen('http://finance.yahoo.com/q/ks?s=F%20Key%20Statistics').read()

f = csv.writer(open("pe_ratio.csv","w"))
f.writerow(["Name","PE"])

soup = BeautifulSoup(page)
all_data = soup.findAll('td', "yfnc_tabledata1")
f.writerow([all_data[2].getText()])

Thanks for your help in advance.

Robert Birch
    What do you mean by 'skips every other line'? Can you give an example input, the output you're getting and the desired output? – Austin Phillips Oct 30 '13 at 05:25

2 Answers


First, since Yahoo provides an API that returns CSV files, maybe you can solve your problem that way? For example, this URL returns a CSV file containing prices, market cap, P/E and other metrics for all stocks in that industry. There is some more information in this Google Code project.

Your code only produces a two-row CSV because there are only two calls to f.writerow(). If the only piece of data you want from that page is the P/E ratio, this is almost certainly not the best way to do it, but you should pass to f.writerow() a tuple containing the value for each column. To be consistent with your header row, that would be something like:

f.writerow( ('Ford', all_data[2].getText()) )

Of course, that assumes that the P/E ratio will always be the third item in the list (index 2). If instead you wanted all the statistics provided on that page, you could try:

# scrape the html for the name and value of each metric
metrics = soup.findAll('td', 'yfnc_tablehead1')
values = soup.findAll('td', 'yfnc_tabledata1')

# create a list of tuples for the writerows method
def stripTag(tag): return tag.text
data = zip(map(stripTag, metrics), map(stripTag, values))

# write to csv file
f.writerows(data)
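
For anyone on Python 3, where zip and map return iterators rather than lists, the same pairing idea can be sketched without touching the network. The metric names and values below are made-up placeholders standing in for the text of the scraped cells, and build_rows is a hypothetical helper, not something from the original code:

```python
import csv
import io


def build_rows(names, values):
    """Pair each metric name with its value, stripping stray whitespace."""
    return [(n.strip(), v.strip()) for n, v in zip(names, values)]


# Placeholder data standing in for the text of the scraped <td> cells.
names = ["Market Cap ", "Trailing P/E", "Forward P/E"]
values = ["66.1B", " 11.5", "9.8"]

rows = build_rows(names, values)

buf = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buf)
writer.writerow(["Name", "Value"])  # header row
writer.writerows(rows)              # one row per (name, value) pair

csv_text = buf.getvalue()
print(csv_text)
```

Note that zip stops at the shorter of the two lists, so if the page ever has a header cell without a matching data cell, the trailing items are silently dropped.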
sjy

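You need to open your file with the right options for csv.writer to work correctly. The csv module does its own newline handling (the writer ends each row with '\r\n' by default), so you need to turn off Python's newline translation at the file level. Otherwise, on Windows each '\r\n' is written as '\r\r\n', which Excel shows as a blank line after every row.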

For Python 2, the docs say:

If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference.

For Python 3, they say:

If csvfile is a file object, it should be opened with newline=''.

Also, you should probably use a with statement to handle opening and closing your file, like this:

with open("pe_ratio.csv","wb") as f: # or open("pe_ratio.csv", "w", newline="") in Py3
    writer = csv.writer(f)

    # do other stuff here, staying indented until you're done writing to the file
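
To see the effect concretely, here is a small Python 3 sketch. The file names and the sample row are placeholders (hypothetical values, not real data), and write_csv and raw_bytes are helper names made up for this example. It writes the same rows twice, once with the writer's default terminator and once with lineterminator='\n' as mentioned in the question, then reads the raw bytes back:

```python
import csv
import os
import tempfile

rows = [["Name", "PE"], ["Ford", "11.5"]]  # placeholder values, not real data


def write_csv(path, rows, **writer_kwargs):
    """Write rows with newline='' so Python does not translate line endings."""
    with open(path, "w", newline="") as f:
        csv.writer(f, **writer_kwargs).writerows(rows)


def raw_bytes(path):
    """Return the file contents exactly as stored on disk."""
    with open(path, "rb") as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
default_path = os.path.join(tmp_dir, "pe_ratio_default.csv")
lf_path = os.path.join(tmp_dir, "pe_ratio_lf.csv")

write_csv(default_path, rows)                  # writer's default '\r\n'
write_csv(lf_path, rows, lineterminator="\n")  # explicit '\n' terminator

default_bytes = raw_bytes(default_path)
lf_bytes = raw_bytes(lf_path)
print(repr(default_bytes))
print(repr(lf_bytes))
```

Neither file contains the doubled '\r\r\n' sequence, because newline='' stops the file object from rewriting the terminator that the writer already produced.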
Blckknght