
For some reason I can't find or select a table by its id. I've been referring to the Beautiful Soup docs, and from what I can tell it should be working.

Below is the code I'm using to try to select the table with the id "per_game". content.find(id='per_game') doesn't work for me either.

I've been referring to the find and CSS selector part of the docs, here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find

import requests
import csv
import calendar
from datetime import date, datetime, timedelta
from collections import OrderedDict, defaultdict
from bs4 import BeautifulSoup as soup

season = str(date.today().year + 1)
month = calendar.month_name[date.today().month].lower()

teamUrl = "https://basketball-reference.com/teams/"

urls = [teamUrl + 'ATL/' + season + '.html']  # Atlanta Hawks
        # teamUrl + 'BOS/' + season + '.html',  # Boston Celtics
        # teamUrl + 'BKN/' + season + '.html',  # Brooklyn Nets
        # teamUrl + 'CHA/' + season + '.html',  # Charlotte Hornets

for url in urls:
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    content = soup(page.content, 'html.parser')
    table = content.select("#per_game")
    print(table)

Many thanks, OM.

Shaun
  • That content is built client-side using JavaScript, so your code won't see it. – ChrisGPT was on strike Nov 15 '18 at 01:46
  • Possible duplicate of https://stackoverflow.com/questions/43120445/scraping-a-webpage-that-has-javascript-with-beautifulsoup – ChrisGPT was on strike Nov 15 '18 at 01:47
  • Possible duplicate of https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – ChrisGPT was on strike Nov 15 '18 at 01:48
  • You need to use a module that can run JavaScript; then you can load all the data. By the way, I once answered a question like this one (same website, but the asker wanted to scrape each player's records). – KC. Nov 15 '18 at 02:59
  • Possible duplicate of [BS4 Not Locating Element in Python](https://stackoverflow.com/questions/53110585/bs4-not-locating-element-in-python) – KC. Nov 15 '18 at 03:03

1 Answer

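This is not Ajax. The table is in the HTML that requests returns, but it is wrapped in an HTML comment, so the parser treats it as comment text rather than as tags. Just remove the comment markers from the HTML before parsing: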

page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
html_doc = page.text.replace('<!--', '').replace('-->', '')
content = soup(html_doc, 'html.parser')
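
As an alternative to the string replacement, here is a minimal sketch that leaves the markup untouched and instead re-parses each comment node until it finds the table. The sample HTML and the helper name find_commented_table are made up for illustration; the real page layout is only assumed to resemble it.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the downloaded page: the table sits inside
# an HTML comment, so a normal find() on the document does not see it.
SAMPLE_HTML = """
<div id="all_per_game">
<!--
<table id="per_game"><tr><td>Player A</td><td>30.1</td></tr></table>
-->
</div>
"""


def find_commented_table(html, table_id):
    """Return the first table with table_id, also looking inside HTML comments."""
    doc = BeautifulSoup(html, "html.parser")
    table = doc.find("table", id=table_id)
    if table is not None:
        return table
    # Comments are parsed as Comment strings, not tags, so re-parse each one.
    for comment in doc.find_all(string=lambda text: isinstance(text, Comment)):
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


table = find_commented_table(SAMPLE_HTML, "per_game")
print(table is not None)
```

With the code from the question, the call would be find_commented_table(page.text, "per_game"). Unlike the replace() approach, this does not change any other comments in the page.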
ewwink