
This seems similar to my previous post (I'll link it at the bottom), but this is a different URL and it uses tables. When I run the following code, I can extract all of the data within that element:

import requests
from bs4 import BeautifulSoup

url = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"
r = requests.get(url)

soup = BeautifulSoup(r.text, "lxml")

try:
    data = soup.find('div', class_='div-col1')
    print(data)
except:
    print("You Get Nothing!")

I then change the try block to:

try:
    data = soup.find_all('td', class_='car')
    print(data)
except:
    print("You Get Nothing!")

and I am only getting the info pulled from the thead, not the tbody.

Is there something I'm missing, or doing wrong? The further in I try to drill down, I either error out or just get back an empty [].

Also, this webpage is dynamic. I tried what was given to me in my previous thread (Old Post), and I understand the layout and coding between the two pages are different, but my concern with that approach is that loading Chrome every time I run the script will be a lot of overhead, since it will probably need to be refreshed every 30 sec to 1 min, 300-400 times.

sbiondio

2 Answers


Why don't you just go directly to the source? If you look at the page source of that link, it is getting its data from https://www.nascar.com/live/feeds/live-feed.json. With that you can easily get the data in JSON format and parse it as you like:

import requests
import json

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
print(res.json())
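
To pull individual values out of the response, you can index into the parsed JSON. Here is a rough sketch of printing the running order and driver names; the `vehicles`, `running_position`, `driver` and `full_name` keys are assumptions about what the feed returns, so print the top-level keys first and adjust to the actual structure:

import requests

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
data = res.json()  # requests parses the JSON itself; no separate json module needed

# NOTE: the key names below are assumptions -- inspect data.keys() and adjust.
for vehicle in data.get("vehicles", []):
    position = vehicle.get("running_position")
    name = vehicle.get("driver", {}).get("full_name")
    print(position, name)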
johnII
  • I should have mentioned that I'm very new at this. I didn't know I could do that, but this helps as well. Thank you!! – sbiondio Mar 26 '18 at 16:50
  • @johnll, this is the perfect solution for the question. But, I guess it'll help the OP to understand a bit more if you showed how to use the JSON and print something, like, all the names. Also, remove the `import json` line, it is not needed for `response.json()` and may confuse others. – Keyur Potdar Mar 26 '18 at 17:18
  • @sbiondio, as you said, the page is updating the data continuously (about every 5 secs to be precise) by fetching the data from the link johnll has shown. You can get all the table items from this JSON. Also, `requests.json()` is way faster than any other approach that uses `bs4`. – Keyur Potdar Mar 26 '18 at 17:23
  • @KeyurPotdar Thank you for the clarification, this helps a lot!!! I'm playing around with what this is outputting now! – sbiondio Mar 26 '18 at 17:29
  • @sbiondio, have a look at [this question](https://stackoverflow.com/questions/16675849/python-parsing-json-data-set). Maybe it'll help you to understand it better. (Just remember that you don't have to use the separate `json` module while using `requests`, which has its own built-in `response.json()` parser.) – Keyur Potdar Mar 26 '18 at 17:33

The data you wish to fetch from that page is generated dynamically, so when you make an HTTP request using the requests library, it can't handle that. However, you can try the new library from the same author, requests-html, which is capable of handling dynamically generated content. This is how you can go about it with that new library:

import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

with requests_html.HTMLSession() as session:
    r = session.get(URL)
    # render() runs the page's JavaScript in a bundled Chromium so the table gets populated
    r.html.render(sleep=5)
    for items in r.html.find('#pqrStatistic tr'):
        data = [item.text for item in items.find("th,td")]
        print(data)

Partial results:

['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap']
['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8']
['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157']
['3', '4', 'Todd Gilliland #', '', '1.407', '161', '36.359', '20.142', '94.013', '158']
['4', '8', 'John H. Nemechek(i)', '', '2.177', '161', '36.304', '20.234', '93.585', '31']
['5', '16', 'Brett Moffitt', '', '3.268', '161', '36.145', '20.359', '93.010', '8']
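
If you end up polling the page repeatedly (your 30 sec to 1 min concern), one option is to keep a single HTMLSession open and re-render inside a loop rather than starting everything fresh each run. A rough sketch, where the poll count and the 30-second interval are just placeholders:

import time
import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

with requests_html.HTMLSession() as session:
    for _ in range(5):  # placeholder poll count; adjust as needed
        r = session.get(URL)
        r.html.render(sleep=5)  # the session's Chromium instance should be reused across renders
        for items in r.html.find('#pqrStatistic tr'):
            print([item.text for item in items.find("th,td")])
        time.sleep(30)  # placeholder interval between polls

That said, hitting the JSON feed directly (as in the other answer) avoids the browser entirely and will be much lighter for that kind of polling.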
SIM
  • This may be just what I'm looking for! But when I try to run it, I get all kinds of errors. I installed requests_html, but the slew of errors started with: Traceback (most recent call last): File "/Users/salbiondio4/Documents/App Creation/PythonScripts/NASCAR/livefeed.py", line 68, in r.html.render(sleep=5) ... it probably doesn't help, but I'll do some digging – sbiondio Mar 26 '18 at 16:48
  • It requires python 3.6. – SIM Mar 26 '18 at 16:48
  • I thought that might be the problem, but I'm running in PyCharm with Python 3.6.2. Tried in terminal with python3, same errors. The start of it looks like it's trying to download Chromium: "[W:pyppeteer.chromium_downloader] start chromium download. Download may take a few minutes. Traceback (most recent call last):" – sbiondio Mar 26 '18 at 16:53
  • Yes, it downloads Chromium on the first run. However, on the second or third run (when you experiment for the first time), it should work. Did it fetch you the data along with the errors, or only the errors you have got so far? – SIM Mar 26 '18 at 17:00
  • I have only gotten errors, no data. Could it be because I already have Chromium installed from my previous project? (Just trying to come up with thoughts to help.) – sbiondio Mar 26 '18 at 17:09