I am trying to scrape a table from ESPN and load the data into a pandas DataFrame so that I can export it to Excel. Most of the scraping works, but I am stuck on how to put each 'td' value into its own DataFrame cell inside my for loop (code below). Any thoughts? Thanks!

import requests
import urllib.request
from bs4 import BeautifulSoup
import re
import os
import csv
import pandas as pd

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("http://www.espn.com/nba/statistics/player/_/stat/scoring-per-game/sort/avgPoints/qualified/false")

regex = re.compile("^[e-o]")

for record in soup.findAll('tr', {"class":regex}):
    for data in record.findAll('td'):
        print(data)
johankent30

1 Answer

I was recently scraping sports websites while working on a daily fantasy sports algorithm for a class, and this is the script I wrote. A similar approach may work for you: build a dictionary keyed by column, then convert it to a DataFrame.

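    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    # yr={0} and mode={1} are placeholders to fill in with str.format()
    URL_TEMPLATE = "http://www.footballdb.com/stats/stats.html?lg=NFL&yr={0}&type=reg&mode={1}&limit=all"

    def scrape_stats(year, mode):
        result = requests.get(URL_TEMPLATE.format(year, mode))
        c = result.content

        # Set as Beautiful Soup Object (name the parser explicitly)
        soup = BeautifulSoup(c, "html.parser")

        # Go to the section of interest
        tables = soup.find("table", {'class': 'statistics'})

        # One dict per column, keyed by column index; headers holds the column labels
        data = {}
        headers = {}
        for i, header in enumerate(tables.findAll('th')):
            data[i] = {}
            headers[i] = header.get_text()

        # Store every cell as data[column][row]
        table = tables.find('tbody')
        for r, row in enumerate(table.select('tr')):
            for i, cell in enumerate(row.select('td')):
                # get_text() already returns str in Python 3, so no ASCII fallback is needed
                data.setdefault(i, {})[r] = cell.get_text()

        # Overwrite the first column with the names taken from the link text
        for i, name in enumerate(tables.select('tbody .left .hidden-xs a')):
            data[0][i] = name.get_text()

        return pd.DataFrame(data=data)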
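
Applied to the kind of table in the question, the same idea can be sketched without any network access. This is only a minimal sketch under assumptions: `SAMPLE_HTML`, the `colhead`/`oddrow`/`evenrow` class names, and the `table_to_dataframe` helper are hypothetical stand-ins, not the real ESPN markup. Each 'tr' becomes one list of 'td' texts, and the list of rows becomes the DataFrame, so every 'td' ends up in its own cell.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped page so the sketch runs offline.
# The class names are assumptions, not verified against the live site.
SAMPLE_HTML = """
<table>
  <tr class="colhead"><td>RK</td><td>PLAYER</td><td>PTS</td></tr>
  <tr class="oddrow"><td>1</td><td>Player A</td><td>30.1</td></tr>
  <tr class="evenrow"><td>2</td><td>Player B</td><td>28.4</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Turn each data row into a list of cell texts, then build a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")

    # Column labels come from the (assumed) header row.
    header_row = soup.find("tr", class_="colhead")
    columns = [td.get_text(strip=True) for td in header_row.find_all("td")]

    # Same class filter as in the question: classes starting with e through o.
    row_class = re.compile("^[e-o]")
    rows = []
    for record in soup.find_all("tr", class_=row_class):
        rows.append([td.get_text(strip=True) for td in record.find_all("td")])

    return pd.DataFrame(rows, columns=columns)


df = table_to_dataframe(SAMPLE_HTML)
```

From there, `df.to_excel("stats.xlsx", index=False)` would write the spreadsheet, provided an Excel writer such as openpyxl is installed; the file name here is just an example.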
Yale Newman