The table on the web page I'm trying to scrape has no <tr> tags; it is built entirely from <div> tags.

The page I'm trying to scrape looks as follows in the inspector: [inspector screenshot]

I'd like to grab the info from the elements with the table-row class, but the scrape never returns anything. With the code below, I am able to get data when I select .table-header or just .practiceDataTable.

import bs4
import requests

res = requests.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')

soup = bs4.BeautifulSoup(res.text, 'lxml')

soup.select('.nrwgt-lbh .practiceDataTable')

for i in soup.select('.nrwgt-lbh .practiceDataTable .table-row'):
    print(i.text)

I also noticed that in the inspector, the class "practiceDataTable" has a space after it and then "dataTable", but when I use that anywhere in the code, the code doesn't work.
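
On the class question: a space inside an HTML class attribute separates two class names, so class="practiceDataTable dataTable" means the element carries both practiceDataTable and dataTable. In a CSS selector a space means "descendant", which is probably why pasting the attribute value as-is matched nothing. A minimal sketch with a made-up HTML snippet (the markup below is hypothetical, not the real page):

```python
import bs4

# Hypothetical markup that mimics the class attribute seen in the inspector.
html = '''
<div class="practiceDataTable dataTable">
  <div class="table-row">row 1</div>
  <div class="table-row">row 2</div>
</div>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# Chain the class names with dots and no space: the element must have both.
both = soup.select('.practiceDataTable.dataTable .table-row')

# With a space, the selector looks for a .dataTable element nested inside
# a .practiceDataTable element, which does not exist in this snippet.
nested = soup.select('.practiceDataTable .dataTable')

print([d.text for d in both])
print(len(nested))
```

This only explains the selector syntax; it does not change the fact that the rows have to be present in the HTML that was downloaded.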

Drise
sbiondio

2 Answers

2

Inspecting the raw source returned by a urllib.urlopen request shows that the page is dynamic: the HTML contains no div with class table-row, so the rows must be filled in after the page loads. You therefore need a browser automation tool such as selenium:

import re
from bs4 import BeautifulSoup as soup
from selenium import webdriver

d = webdriver.Chrome()  # drives a real browser, so the page's scripts run
classes = ['position', 'chase', 'car-number', 'driver', 'manufacturer', 'start-position not-mobile', 'laps not-mobile', 'laps-led not-mobile', 'final-status', 'points not-mobile', 'bonus not-mobile']
d.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')
# One pattern that matches any of the cell class names listed above
cell_class = re.compile('|'.join(classes))
rows = soup(d.page_source, 'lxml').find_all('div', {'class': 'table-row'})
# Keep the non-empty cell texts of each row; list() is needed on Python 3, where filter() is lazy
new_data = [list(filter(None, [b.text for b in i.find_all('div', {'class': cell_class})])) for i in rows]

Output (captured under Python 2, hence the u prefixes):

[[u'00', u'JeffreyEarnhardt'], [u'1', u'JamieMcMurray'], [u'2', u'BradKeselowski'], [u'3', u'AustinDillon'], [u'4', u'KevinHarvick'], [u'6', u'TrevorBayne'], [u'9', u'ChaseElliott'], [u'10', u'AricAlmirola'], [u'11', u'DennyHamlin'], [u'12', u'RyanBlaney'], [u'13', u'TyDillon'], [u'14', u'ClintBowyer'], [u'15', u'RossChastain'], [u'17', u'RickyStenhouse Jr.'], [u'18', u'KyleBusch'], [u'19', u'DanielSuarez'], [u'20', u'ErikJones'], [u'21', u'PaulMenard'], [u'22', u'JoeyLogano'], [u'23', u'GrayGaulding'], [u'24', u'WilliamByron'], [u'31', u'RyanNewman'], [u'32', u'MattDiBenedetto'], [u'34', u'MichaelMcDowell'], [u'37', u'ChrisBuescher'], [u'38', u'DavidRagan'], [u'41', u'KurtBusch'], [u'42', u'KyleLarson'], [u'43', u'DarrellWallace Jr.'], [u'47', u'AJAllmendinger'], [u'48', u'JimmieJohnson'], [u'51', u'TimmyHill'], [u'55', u'ReedSorenson'], [u'72', u'ColeWhitt'], [u'78', u'MartinTruex Jr.'], [u'88', u'AlexBowman'], [u'95', u'KaseyKahne'], [u'1', u'4', u'KevinHarvick'], [u'2', u'14', u'ClintBowyer'], [u'3', u'10', u'AricAlmirola'], [u'4', u'31', u'RyanNewman'], [u'5', u'42', u'KyleLarson'], [u'6', u'11', u'DennyHamlin'], [u'7', u'78', u'MartinTruex Jr.'], [u'8', u'20', u'ErikJones'], [u'9', u'3', u'AustinDillon'], [u'10', u'88', u'AlexBowman'], [u'11', u'1', u'JamieMcMurray'], [u'12', u'18', u'KyleBusch'], [u'13', u'41', u'KurtBusch'], [u'14', u'48', u'JimmieJohnson'], [u'15', u'9', u'ChaseElliott'], [u'16', u'37', u'ChrisBuescher'], [u'17', u'22', u'JoeyLogano'], [u'18', u'43', u'DarrellWallace Jr.'], [u'19', u'21', u'PaulMenard'], [u'20', u'2', u'BradKeselowski'], [u'21', u'19', u'DanielSuarez'], [u'22', u'32', u'MattDiBenedetto'], [u'23', u'12', u'RyanBlaney'], [u'24', u'13', u'TyDillon'], [u'25', u'17', u'RickyStenhouse Jr.'], [u'26', u'24', u'WilliamByron'], [u'27', u'47', u'AJAllmendinger'], [u'28', u'6', u'TrevorBayne'], [u'29', u'34', u'MichaelMcDowell'], [u'30', u'38', u'DavidRagan'], [u'31', u'95', u'KaseyKahne'], [u'32', u'15', u'RossChastain'], [u'33', 
u'72', u'ColeWhitt'], [u'34', u'00', u'JeffreyEarnhardt'], [u'35', u'51', u'TimmyHill'], [u'36', u'*55', u'ReedSorenson'], [u'37', u'23', u'GrayGaulding'], [u'1', u'78', u'MartinTruex Jr.'], [u'2', u'18', u'KyleBusch'], [u'3', u'42', u'KyleLarson'], [u'4', u'20', u'ErikJones'], [u'5', u'3', u'AustinDillon'], [u'6', u'22', u'JoeyLogano'], [u'7', u'41', u'KurtBusch'], [u'8', u'12', u'RyanBlaney'], [u'9', u'31', u'RyanNewman'], [u'10', u'4', u'KevinHarvick'], [u'11', u'2', u'BradKeselowski'], [u'12', u'37', u'ChrisBuescher'], [u'13', u'6', u'TrevorBayne'], [u'14', u'21', u'PaulMenard'], [u'15', u'1', u'JamieMcMurray'], [u'16', u'17', u'RickyStenhouse Jr.'], [u'17', u'13', u'TyDillon'], [u'18', u'32', u'MattDiBenedetto'], [u'19', u'43', u'DarrellWallace Jr.'], [u'20', u'23', u'GrayGaulding'], [u'21', u'38', u'DavidRagan'], [u'22', u'34', u'MichaelMcDowell'], [u'23', u'00', u'JeffreyEarnhardt'], [u'24', u'55', u'ReedSorenson'], [u'25', u'11', u'DennyHamlin'], [u'26', u'14', u'ClintBowyer'], [u'27', u'10', u'AricAlmirola'], [u'28', u'88', u'AlexBowman'], [u'29', u'24', u'WilliamByron'], [u'30', u'19', u'DanielSuarez'], [u'31', u'9', u'ChaseElliott'], [u'32', u'47', u'AJAllmendinger'], [u'33', u'48', u'JimmieJohnson'], [u'34', u'95', u'KaseyKahne'], [u'35', u'51', u'TimmyHill'], [u'36', u'15', u'RossChastain'], [u'37', u'72', u'ColeWhitt'], [u'1', u'78', u'MartinTruex Jr.', u'1', u'200', u'125', u'Running', u'60', u'7'], [u'2', u'42', u'KyleLarson', u'3', u'200', u'0', u'Running', u'43', u'0'], [u'3', u'18', u'KyleBusch', u'2', u'200', u'62', u'Running', u'51', u'0'], [u'4', u'2', u'BradKeselowski', u'11', u'200', u'0', u'Running', u'49', u'0'], [u'5', u'22', u'JoeyLogano', u'6', u'200', u'9', u'Running', u'45', u'0'], [u'6', u'11', u'DennyHamlin', u'25', u'200', u'1', u'Running', u'39', u'0'], [u'7', u'20', u'ErikJones', u'4', u'200', u'0', u'Running', u'39', u'0'], [u'8', u'12', u'RyanBlaney', u'8', u'200', u'0', u'Running', u'29', u'0'], [u'9', u'48', u'JimmieJohnson', 
u'33', u'200', u'0', u'Running', u'38', u'0'], [u'10', u'3', u'AustinDillon', u'5', u'200', u'0', u'Running', u'27', u'0'], [u'11', u'14', u'ClintBowyer', u'26', u'199', u'0', u'Running', u'30', u'0'], [u'12', u'10', u'AricAlmirola', u'27', u'199', u'0', u'Running', u'25', u'0'], [u'13', u'88', u'AlexBowman', u'28', u'199', u'0', u'Running', u'24', u'0'], [u'14', u'41', u'KurtBusch', u'7', u'199', u'0', u'Running', u'27', u'0'], [u'15', u'24', u'WilliamByron', u'29', u'199', u'1', u'Running', u'23', u'0'], [u'16', u'9', u'ChaseElliott', u'31', u'199', u'0', u'Running', u'21', u'0'], [u'17', u'1', u'JamieMcMurray', u'15', u'199', u'1', u'Running', u'20', u'0'], [u'18', u'17', u'RickyStenhouse Jr.', u'16', u'199', u'0', u'Running', u'19', u'0'], [u'19', u'21', u'PaulMenard', u'14', u'199', u'0', u'Running', u'18', u'0'], [u'20', u'43', u'DarrellWallace Jr.', u'19', u'199', u'0', u'Running', u'17', u'0'], [u'21', u'31', u'RyanNewman', u'9', u'199', u'0', u'Running', u'16', u'0'], [u'22', u'47', u'AJAllmendinger', u'32', u'199', u'0', u'Running', u'15', u'0'], [u'23', u'19', u'DanielSuarez', u'30', u'199', u'0', u'Running', u'14', u'0'], [u'24', u'95', u'KaseyKahne', u'34', u'199', u'1', u'Running', u'13', u'0'], [u'25', u'38', u'DavidRagan', u'21', u'199', u'0', u'Running', u'12', u'0'], [u'26', u'34', u'MichaelMcDowell', u'22', u'199', u'0', u'Running', u'11', u'0'], [u'27', u'13', u'TyDillon', u'17', u'198', u'0', u'Running', u'10', u'0'], [u'28', u'72', u'ColeWhitt', u'37', u'198', u'0', u'Running', u'9', u'0'], [u'29', u'15', u'RossChastain', u'36', u'198', u'0', u'Running', u'0', u'0'], [u'30', u'37', u'ChrisBuescher', u'12', u'197', u'0', u'Running', u'7', u'0'], [u'31', u'32', u'MattDiBenedetto', u'18', u'196', u'0', u'Running', u'6', u'0'], [u'32', u'23', u'GrayGaulding', u'20', u'194', u'0', u'Running', u'5', u'0'], [u'33', u'51', u'TimmyHill', u'35', u'193', u'0', u'Running', u'0', u'0'], [u'34', u'55', u'ReedSorenson', u'24', u'193', u'0', u'Running', u'3', 
u'0'], [u'35', u'4', u'KevinHarvick', u'10', u'191', u'0', u'Running', u'2', u'0'], [u'36', u'00', u'JeffreyEarnhardt', u'23', u'189', u'0', u'Running', u'1', u'0'], [u'37', u'6', u'TrevorBayne', u'13', u'108', u'0', u'Accident', u'1', u'0'], [u'1', u'78', u'MartinTruex Jr.', u'60'], [u'2', u'18', u'KyleBusch', u'60'], [u'3', u'22', u'JoeyLogano', u'60'], [u'4', u'2', u'BradKeselowski', u'60'], [u'5', u'48', u'JimmieJohnson', u'60'], [u'6', u'42', u'KyleLarson', u'60'], [u'7', u'41', u'KurtBusch', u'60'], [u'8', u'20', u'ErikJones', u'60'], [u'9', u'14', u'ClintBowyer', u'60'], [u'10', u'11', u'DennyHamlin', u'60'], [u'11', u'3', u'AustinDillon', u'60'], [u'12', u'1', u'JamieMcMurray', u'60'], [u'13', u'10', u'AricAlmirola', u'60'], [u'14', u'9', u'ChaseElliott', u'60'], [u'15', u'24', u'WilliamByron', u'60'], [u'16', u'19', u'DanielSuarez', u'60'], [u'17', u'21', u'PaulMenard', u'60'], [u'18', u'88', u'AlexBowman', u'60'], [u'19', u'6', u'TrevorBayne', u'60'], [u'20', u'37', u'ChrisBuescher', u'60'], [u'21', u'31', u'RyanNewman', u'60'], [u'22', u'17', u'RickyStenhouse Jr.', u'60'], [u'23', u'95', u'KaseyKahne', u'60'], [u'24', u'38', u'DavidRagan', u'59'], [u'25', u'34', u'MichaelMcDowell', u'59'], [u'26', u'43', u'DarrellWallace Jr.', u'59'], [u'27', u'32', u'MattDiBenedetto', u'59'], [u'28', u'47', u'AJAllmendinger', u'59'], [u'29', u'15', u'RossChastain', u'59'], [u'30', u'72', u'ColeWhitt', u'59'], [u'31', u'13', u'TyDillon', u'59'], [u'32', u'12', u'RyanBlaney', u'59'], [u'33', u'23', u'GrayGaulding', u'58'], [u'34', u'55', u'ReedSorenson', u'58'], [u'35', u'51', u'TimmyHill', u'58'], [u'36', u'4', u'KevinHarvick', u'57'], [u'37', u'00', u'JeffreyEarnhardt', u'56'], [u'1', u'78', u'MartinTruex Jr.', u'120'], [u'2', u'2', u'BradKeselowski', u'120'], [u'3', u'18', u'KyleBusch', u'120'], [u'4', u'11', u'DennyHamlin', u'120'], [u'5', u'20', u'ErikJones', u'120'], [u'6', u'22', u'JoeyLogano', u'120'], [u'7', u'48', u'JimmieJohnson', u'120'], [u'8', u'42', 
u'KyleLarson', u'120'], [u'9', u'14', u'ClintBowyer', u'120'], [u'10', u'24', u'WilliamByron', u'120'], [u'11', u'41', u'KurtBusch', u'120'], [u'12', u'10', u'AricAlmirola', u'120'], [u'13', u'31', u'RyanNewman', u'120'], [u'14', u'9', u'ChaseElliott', u'120'], [u'15', u'88', u'AlexBowman', u'120'], [u'16', u'1', u'JamieMcMurray', u'120'], [u'17', u'19', u'DanielSuarez', u'120'], [u'18', u'3', u'AustinDillon', u'120'], [u'19', u'12', u'RyanBlaney', u'120'], [u'20', u'17', u'RickyStenhouse Jr.', u'120'], [u'21', u'37', u'ChrisBuescher', u'120'], [u'22', u'95', u'KaseyKahne', u'120'], [u'23', u'38', u'DavidRagan', u'120'], [u'24', u'47', u'AJAllmendinger', u'120'], [u'25', u'43', u'DarrellWallace Jr.', u'120'], [u'26', u'34', u'MichaelMcDowell', u'120'], [u'27', u'32', u'MattDiBenedetto', u'120'], [u'28', u'15', u'RossChastain', u'119'], [u'29', u'21', u'PaulMenard', u'119'], [u'30', u'72', u'ColeWhitt', u'119'], [u'31', u'13', u'TyDillon', u'118'], [u'32', u'23', u'GrayGaulding', u'117'], [u'33', u'55', u'ReedSorenson', u'116'], [u'34', u'51', u'TimmyHill', u'115'], [u'35', u'00', u'JeffreyEarnhardt', u'114'], [u'36', u'4', u'KevinHarvick', u'113'], [u'37', u'6', u'TrevorBayne', u'108']]
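
The list above mixes rows from several tables on the page, which shows in the differing row lengths. One way to pull out a single table is to filter by the number of fields. A minimal sketch, assuming that the nine-field rows are the full race results (an inference from the output above, not something the page confirms) and using a shortened, hand-copied sample in place of new_data; rows_with_length is a hypothetical helper name:

```python
# Shortened sample copied from the output above; the real list is new_data.
sample = [
    ['00', 'JeffreyEarnhardt'],
    ['1', '4', 'KevinHarvick'],
    ['1', '78', 'MartinTruex Jr.', '1', '200', '125', 'Running', '60', '7'],
    ['2', '42', 'KyleLarson', '3', '200', '0', 'Running', '43', '0'],
    ['1', '78', 'MartinTruex Jr.', '60'],
]


def rows_with_length(rows, n):
    """Return only the rows that have exactly n fields."""
    # Materialize each row first so lazy iterables (e.g. filter objects)
    # are consumed only once.
    materialized = [list(r) for r in rows]
    return [r for r in materialized if len(r) == n]


race_results = rows_with_length(sample, 9)
for row in race_results:
    print(row)
```

Tables that happen to share a row length cannot be told apart this way; for those, selecting each table's container element before collecting its rows would be the more reliable route.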

Edit: to install selenium, run pip install selenium, and then install the appropriate driver for your browser:

Chrome driver: https://sites.google.com/a/chromium.org/chromedriver/downloads

Firefox driver: https://github.com/mozilla/geckodriver/releases

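Then, to run the code, create a driver object with the class corresponding to your browser of choice, passing the path to the driver by keyword (this is the Selenium 3 calling convention; Selenium 4 deprecates it in favor of a Service object):

d = webdriver.Firefox(executable_path="/path/to/driver")

or

d = webdriver.Chrome(executable_path="/path/to/driver")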

Edit

Writing data to csv file:

import csv
# new_data is the list of lists containing the table data
# newline='' keeps the csv module from writing blank lines between rows (Python 3)
with open('nascarDrivers.csv', 'w', newline='') as f:
    csv.writer(f).writerows(new_data)
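
If the file should carry column names, a header row can be written first. A minimal sketch, assuming nine-field race-result rows and column names borrowed from the classes list above (the mapping of names to columns is a guess from the output, and the chase and manufacturer cells are assumed to have been dropped as empty). It writes to an in-memory buffer so it can be run without touching the disk:

```python
import csv
import io

# Assumed column names for the nine-field rows; not confirmed by the page.
header = ['position', 'car-number', 'driver', 'start-position', 'laps',
          'laps-led', 'final-status', 'points', 'bonus']

# Two rows hand-copied from the output above.
rows = [
    ['1', '78', 'MartinTruex Jr.', '1', '200', '125', 'Running', '60', '7'],
    ['2', '42', 'KyleLarson', '3', '200', '0', 'Running', '43', '0'],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(header)   # header first
writer.writerows(rows)    # then the data rows

csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, the buffer can be swapped for a file opened with open('nascarDrivers.csv', 'w', newline='').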
Ajax1234
  • Thanks!! Wow... not sure how i'd have ever figured that out! I'll give it a shot once i can get Selenium installed. (having troubles, but finding answers within this site) – sbiondio Mar 21 '18 at 19:38
  • @sbiondio Glad to help! Please see my recent edit, as I provided instructions to install `selenium`. – Ajax1234 Mar 21 '18 at 19:41
  • You are fast!! Thank you again!! i'm running into the error of where selenium is installed, but still getting the "ModuleNotFoundError: No module named 'selenium'". Went into the python library folder and everything for selenium is there. Odd. – sbiondio Mar 21 '18 at 19:55
  • @sbiondio Ensure that you are running the same Python version that pip installed into. For instance, if `pip` on your system belongs to Python 2, then `pip install selenium` installs for Python 2 only, and you would have to run `python filename.py`. If using Python 3, run `pip3 install selenium`, and then run the file as `python3 filename.py`. If that does not work, try `python -m pip install --upgrade selenium` for Python 2, or `python3 -m pip install --upgrade selenium` for Python 3. – Ajax1234 Mar 21 '18 at 20:01
  • AWESOME!! It took a bit of finagling, but i got it to work and not error. My issue now comes with getting the output. Nothing outputs from it when i run in the console, and when i try to .write to a file, it's not reading the classes, or outputting from the scrape. Is that built into what you have up there, or something else needed? I'm searching in the mean time, but i want to understand your code as well. and of course, Thanks Again!!! – sbiondio Mar 22 '18 at 14:45
  • @sbiondio Glad you have it installed! Are you able to open the webpage itself with Selenium? Also, if possible, could you post your new code with `selenium`? That will help with debugging. – Ajax1234 Mar 22 '18 at 14:48
  • from bs4 import BeautifulSoup as soup import re import urllib from selenium import webdriver chromedriver = '/Library/Frameworks/Python.framework/Versions/3.6/selenium/webdriver/chromedriver' d = webdriver.Chrome(chromedriver) classes = ['position', 'chase', 'car-number', 'driver', 'manufacturer', 'start-position not-mobile', 'laps not-mobile', 'laps-led not-mobile', 'final-status', 'points not-mobile', 'bonus not-mobile'] d.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/') Ugh, sorry, i don't know how to code format – sbiondio Mar 22 '18 at 15:04
  • @sbiondio You do not need to save the chrome driver in the Python library. Perhaps move it to a different folder, such as downloads? That may be the issue. Also, is the webpage loading when you run the code? – Ajax1234 Mar 22 '18 at 15:06
  • Sorry, yes, the page is popping up and loading when the script is run. I ran the Chrome driver from the downloads folder this time as well. How should this be output? If I try to add a .write function, nothing comes out: `filename = 'nascarDrivers.csv' f = open(filename, "w") f.write(driver)` Chances are I'm doing something wrong, but this is where my search has led me so far. – sbiondio Mar 22 '18 at 15:16
  • @sbiondio Instead of writing the driver object, write the data accumulated from running the script. Please see my recent edit, as I added how to write the rows to the csv file. – Ajax1234 Mar 22 '18 at 15:18
  • YESS!!! that did it!! it's a little sloppy, and i think it's the results from the race (a different div), but this gives me a good base on how to play with it, and i can prob figure the rest... can't thank you enough, you're a magician! Thank you! – sbiondio Mar 22 '18 at 17:06
  • @sbiondio Glad to help! – Ajax1234 Mar 22 '18 at 17:07
1

If you want the text from every table-row you can do this:

import bs4
import requests

res = requests.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')

soup = bs4.BeautifulSoup(res.text, 'lxml')
tds = soup.find_all('div', class_='table-row')
for td in tds:
    print(td.text)
paul41
  • Thank you!! 2 things 1) is it just to be assumed that "td" for the rows is always there? 2) when i adjusted the code you sent, i still didn't get a return. Did it work for you and i'm missing something? – sbiondio Mar 21 '18 at 18:14
  • Not sure how i can paste a screen shot of what it looks like inside the .table-row that might help? – sbiondio Mar 21 '18 at 18:22
  • As @Ajax1234 noticed the page is being generated dynamically and that is why the code I used isn't working. Check out Ajax1234's answer and it should fix it. – paul41 Mar 21 '18 at 18:53