
This is the HTML output from prettify. I am trying to get the list of college names from an online dataset table (a search result). The college names are between the <strong> and </strong> tags, and I am not sure how to remove those tags from the result.

geo_table = soup.find('table',{'id':'ctl00_cphCollegeNavBody_ucResultsMain_tblResults'})

Colleges=geo_table.findAll('strong')
Colleges

I think the problem is that I am extracting the wrong part, because <strong> only marks the line as bold. Where should I look for the college name?

This is a sample output:

href="?s=IL+MA+PA&p=14.0802+14.0801+14.3901&l=91+92+93+94&id=211440"

Asked by WY G

1 Answer


To fetch the href value, use find_all to get the <a> tags, iterate over them, and read each tag's href attribute. To fetch the college name, find the <strong> tag inside each <a> and take its text; .text returns only the text content, without the surrounding tags.

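geo_table = soup.find('table', {'id': 'ctl00_cphCollegeNavBody_ucResultsMain_tblResults'})

# Only consider <a> tags that actually have an href attribute
colleges = geo_table.find_all('a', href=True)
for college in colleges:
    strong = college.find('strong')
    if strong is None:  # skip links that do not wrap a <strong> name
        continue
    print('href : ' + college['href'])
    print('college Name : ' + strong.get_text(strip=True))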
Answered by KunduK
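As a self-contained sketch of the answer's approach, the snippet below runs it against a small hypothetical table and collects the names into a list, which is what the question asks for. The markup and college names are invented placeholders modeled on the sample href in the question, not the real search results; it assumes BeautifulSoup 4 with Python's built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the sample output: each result is an <a>
# whose href carries the query string and whose <strong> holds the name.
# The names below are placeholders, not real search results.
html = """
<table id="ctl00_cphCollegeNavBody_ucResultsMain_tblResults">
  <tr><td><a href="?s=IL+MA+PA&amp;id=211440"><strong>Example College A</strong></a></td></tr>
  <tr><td><a href="?s=IL+MA+PA&amp;id=211441"><strong>Example College B</strong></a></td></tr>
  <tr><td><a href="?page=2">Next page</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
geo_table = soup.find("table", {"id": "ctl00_cphCollegeNavBody_ucResultsMain_tblResults"})

# Keep only links that wrap a <strong>; get_text() drops the surrounding tags.
results = []
for link in geo_table.find_all("a", href=True):
    strong = link.find("strong")
    if strong is not None:
        results.append((strong.get_text(strip=True), link["href"]))

# Just the names, as a plain list of strings
college_names = [name for name, _ in results]
print(college_names)
```

Collecting (name, href) pairs in one pass keeps each name tied to its link, and skipping links without a <strong> avoids the AttributeError that college.find('strong').text raises when a row contains some other link.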