Trying to scrape data stored in table using BeautifulSoup Python

Question

I'm trying to scrape data from the table on the page linked in the code below. Here's the code I'm using:

## Scraping data for schools
from urllib.request import urlopen
from bs4 import BeautifulSoup

#List of schools
page=urlopen('https://mcss.knack.com/school-districts#all-school-contacts/')
soup = BeautifulSoup(page,'html.parser')

School=[]
Address=[]
Phone=[]
Principal=[]
Email=[]
District=[]

# Indexing rows and then identifying cells
for rows in soup.findAll('tr'):
    cells = rows.findAll('td')
    if len(cells)==7:
        School.append(soup.find("span", {'class':'col-0'}).text)
        Address.append(soup.find("span", {'class':'col-1'}).text)
        Phone.append(soup.find("span", {'class':'col-2'}).text)
        Principal.append(soup.find("span", {'class':'col-3'}).text)
        Email.append(soup.find("span", {'class':'col-4'}).text)
        District.append(soup.find("span", {'class':'col-5'}).text)

import pandas as pd
school_frame = pd.DataFrame({'School' : School,
                           'Address' : Address,
                           'Phone':Phone,
                           'Principal':Principal,
                           'Email':Email,
                           'District':District
                            })

school_frame.head()
#school_frame.to_csv('school_address.csv')

In return, I get only the column headers of the data frame, with no rows.

What am I doing wrong?

Please provide a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). Your post depends on a volatile, off-site image rather than being self-contained. Also, you've neglected to reduce this to a minimum: is the problem with the initial `findAll`, the filtering on each record, or the data frame formation? — Prune, Mar 09 '20 at 02:16

score 0 · Answer 1 · answered Mar 09 '20 at 03:25

If you inspect the actual value of `page`, you will see that it contains no table, only an empty div that JavaScript fills in later. `urllib.request` does not run JavaScript, so it returns just the initial HTML, with no table in it. You could use Selenium to drive a real browser (which does run JavaScript) and then fetch the resulting HTML of the page (see this Stack Overflow answer for an example).
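A minimal sketch of that approach, under some assumptions: Selenium and a Chrome driver are installed, the rendered table still uses the `col-0` ... `col-5` span classes from the question, and the table counts as loaded once any `<td>` exists. The helper names `fetch_rendered_html` and `parse_rows` are made up for illustration. Note one further difference from the question's code: the lookups use `row.find(...)` rather than `soup.find(...)`, because `soup.find` always returns the first match in the whole document, so every row would otherwise repeat the first row's values.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_seconds=10):
    """Load the page in a real browser so its JavaScript runs, then return the HTML."""
    # Imported here so parse_rows() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        # Assumption: the table has been rendered once at least one <td> is present.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "td"))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_rows(html, n_columns=6):
    """Return one list of cell texts per table row that has a span for every column."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        # Search inside the current row, not the whole document.
        spans = [row.find("span", class_=f"col-{i}") for i in range(n_columns)]
        if all(span is not None for span in spans):
            records.append([span.get_text(strip=True) for span in spans])
    return records


# Made-up sample markup standing in for the rendered page.
sample_html = """
<table>
  <tr><th>School</th><th>Address</th></tr>
  <tr><td><span class="col-0">Alpha School</span></td>
      <td><span class="col-1">1 Main St</span></td></tr>
  <tr><td><span class="col-0">Beta School</span></td>
      <td><span class="col-1">2 Oak Ave</span></td></tr>
</table>
"""
rows = parse_rows(sample_html, n_columns=2)
print(rows)
```

For the real page, something like `parse_rows(fetch_rendered_html(url))` would produce the row lists, which can then be passed to `pd.DataFrame(rows, columns=[...])` with the six column names from the question.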
