I would like to get the number within the nested tag. How would I do this?

My code outputs this, but I'd like to get the #40, not the whole two lines:

<span class="rankings-score">
<span>#40</span>

Here is my code:

from bs4 import BeautifulSoup
import requests
import csv

site = "http://www.usnews.com/education/best-high-schools/national-rankings/page+2"

fields = ['national_rank', 'school', 'address', 'school_page', 'medal', 'ratio', 'size_desc', 'students', 'teachers']

r = requests.get(site)
html_source = r.text
soup = BeautifulSoup(html_source)

table = soup.find('table')
rows_list = []

for row in table.find_all('tr'):

    d = dict()

    d['national_rank'] = row.find("span", 'rankings-score')
    print(d['national_rank'])

I get this error:

AttributeError: 'NoneType' object has no attribute 'span'

when I try this:

d['national_rank'] = row.find("span", 'rankings-score').span.text
goldisfine

1 Answer

Access the text of the nested span:

score_span = row.find("span", 'rankings-score')
if score_span is not None:
    print(score_span.span.text)

You need to make sure that row.find("span", 'rankings-score') actually found something; the code above tests that a <span> was indeed found before accessing it.

The .find() method returns None if no matching object was found. So in general, whenever you get an AttributeError: 'NoneType' object has no attribute ... exception involving an object you tried to load with Element.find(), you need to test for None before trying to access further information.
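
As a rough illustration of that None test applied to every row, here is a minimal sketch. The inline HTML and the extract_rank helper are assumptions made up for this example (they are not from the question's page); the header row stands in for the kind of row that has no rankings-score span.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a header row without a score span,
# followed by a data row that contains the nested span.
html = """
<table>
  <tr><th>Rank</th><th>School</th></tr>
  <tr>
    <td><span class="rankings-score"><span>#40</span></span></td>
    <td>Example High School</td>
  </tr>
</table>
"""


def extract_rank(row):
    """Return the rank text for a row, or None if the row has no score span."""
    score_span = row.find("span", "rankings-score")
    # Guard both lookups: the outer span may be missing, and so may the inner one.
    if score_span is None or score_span.span is None:
        return None
    return score_span.span.text.strip()


soup = BeautifulSoup(html, "html.parser")
ranks = [extract_rank(row) for row in soup.find("table").find_all("tr")]
print(ranks)
```

Rows without a score come back as None here, so the caller can skip them instead of hitting the AttributeError.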

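The same caution applies to object.find, object.find_all, object[...] tag attribute access, object.<tagname>, object.select, and so on: check that the lookup actually found something before using the result. Note that not all of these signal a miss with None; object.find_all and object.select return an empty list, and object[...] raises a KeyError.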
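
To make the different lookups concrete, the sketch below shows what each one gives back when nothing matches. The one-line HTML snippet and the variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <div> with a link but no <span> and no class attribute.
soup = BeautifulSoup('<div><a href="/x">link</a></div>', "html.parser")
div = soup.div

missing_find = div.find("span")                      # None when nothing matches
missing_tagname = div.span                           # shorthand for find(); also None
missing_find_all = div.find_all("span")              # empty list-like result
missing_select = div.select("span.rankings-score")   # empty list-like result

# Subscript access to a missing attribute raises KeyError instead of returning None.
try:
    div["class"]
    key_error_raised = False
except KeyError:
    key_error_raised = True

# .get() is the non-raising alternative for attributes.
missing_attr = div.get("class")

print(missing_find, missing_tagname, len(missing_find_all),
      len(missing_select), key_error_raised, missing_attr)
```

So the check to write depends on the call: compare against None for find() and tag-name access, test for emptiness with find_all() and select(), and either catch KeyError or use .get() for attributes.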

Martijn Pieters