0

I'm trying to extract the "2016-Annual" table at http://www.americashealthrankings.org/api/v1/downloads/131 to a CSV file. The table has three fields: STATE, RANK, and VALUE. I get an error with the following code:

import urllib2 
from bs4 import BeautifulSoup
import csv

url = 'http://www.americashealthrankings.org/api/v1/downloads/131'
header = {'User-Agent': 'Mozilla/5.0'} 
req = urllib2.Request(url,headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

table = soup.find('2016-Annual', {'class': 'STATE-RANK-VALUE'})

f = open('output.csv', 'w')

for row in table.findAll('tr'):
    cells = row.findAll('td')

    if len(cells) == 3:
        STATE = cells[0].find(text=True)
        RANK = cells[1].find(text=True)
        VALUE = cells[2].find(text=True)

    print  write_to_file
    f.write(write_to_file)

f.close()

What am I missing here? I'm using Python 2.7.

user7717771

3 Answers

1

Your code is wrong.

The URL 'http://www.americashealthrankings.org/api/v1/downloads/131' returns a CSV file directly, not an HTML page, so there is no table for BeautifulSoup to find.

Download the CSV file to your local machine and work with that file:

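#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

import urllib2

url = 'http://www.americashealthrankings.org/api/v1/downloads/131'

# The response body is the CSV data itself, not HTML.
data = urllib2.urlopen(url).read()

# Write in binary mode so the bytes are saved unchanged on every platform.
with open('output.csv', 'wb') as output:
    output.write(data)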
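
Once the file is downloaded, the three fields from the question can be picked out with the standard csv module. The following is a minimal sketch, written for Python 3 rather than the Python 2.7 used above. The function name extract_columns is hypothetical, and the column names and rows in SAMPLE are made-up placeholders, since the real header of the downloaded file is not shown here; check the first line of output.csv and adjust the names.

```python
import csv
import io


def extract_columns(csv_text, columns):
    """Return one list per data row, holding only the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[row[name] for name in columns] for row in reader]


# Placeholder text standing in for the downloaded CSV (not real data,
# and the header names are an assumption).
SAMPLE = (
    "Edition,State,Rank,Value\n"
    "2016,Example A,1,0.5\n"
    "2016,Example B,2,0.4\n"
)

rows = extract_columns(SAMPLE, ["State", "Rank", "Value"])
for row in rows:
    print(row)
```

In practice csv_text would be the decoded contents of the downloaded file instead of SAMPLE.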
0

According to the BeautifulSoup docs, you pass the markup to be parsed on initialization. However, page = urllib2.urlopen(req) returns a file-like response object, not a string.

Try using soup = BeautifulSoup(page.read(), 'html.parser') instead.

Also, the variable write_to_file is never defined, so print write_to_file would raise a NameError as soon as the loop runs.

If this doesn't solve it, please also post which error you get.
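
One way to replace the undefined write_to_file is to hand the three cell values to csv.writer, and to do so inside the length check so that rows without exactly three cells are skipped. This is a rough sketch for Python 3, not the asker's final code: write_rows is a hypothetical helper, and the rows passed to it are made-up placeholders for the cell text pulled out of each table row.

```python
import csv
import io


def write_rows(rows, out):
    """Write every three-cell row to `out` as CSV; return how many were written."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for cells in rows:
        if len(cells) != 3:
            continue  # skip header fragments or malformed rows
        writer.writerow([cell.strip() for cell in cells])
        written += 1
    return written


# Placeholder rows (not real data); the middle one is skipped.
buf = io.StringIO()
count = write_rows(
    [[" Example A ", "1", "0.5"], ["not a data row"], ["Example B", "2", "0.4"]],
    buf,
)
print(buf.getvalue())
```

With a real file, out would be the object returned by open('output.csv', 'w', newline='') rather than a StringIO buffer.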

Fernando Cezar
0

The reason it's not working is that you're pointing to a file that is already a CSV. You can load that URL in your browser and it will download in CSV format. The table you're expecting, though, is not at that endpoint; it is at this URL:

http://www.americashealthrankings.org/explore/2016-annual-report

Also, I don't see a class called STATE-RANK-VALUE. I only see the headers called state, rank, and value.
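
Since the table is identified by its header cells rather than by a class, one option is to look for the row whose cells read state, rank, value and take the rows after it. The sketch below uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs (Python 3). It assumes the page serves a plain HTML table, which may not hold if the table is built by JavaScript. TableRows, rows_with_header, and SAMPLE_HTML are all hypothetical, and the sample rows are placeholders, not real rankings.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_with_header(html, header):
    """Return the rows that follow the first row matching `header`."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    wanted = [name.lower() for name in header]
    for i, row in enumerate(parser.rows):
        if [cell.lower() for cell in row] == wanted:
            # Keep only later rows with the same number of cells.
            return [r for r in parser.rows[i + 1:] if len(r) == len(wanted)]
    return []


# Placeholder markup (not copied from the real page).
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Rank</th><th>Value</th></tr>
  <tr><td>Example A</td><td>1</td><td>0.5</td></tr>
  <tr><td>Example B</td><td>2</td><td>0.4</td></tr>
</table>
"""

data_rows = rows_with_header(SAMPLE_HTML, ["state", "rank", "value"])
print(data_rows)
```

Matching on the header text avoids depending on a class name that the page does not define.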

Aurielle Perlmann