Web Scraping a Dynamic website that uses JavaScript with Beautiful Soup and RegEx

Question

I am trying to make an app that gives fantasy football scores for the XFL as a personal project. I was able to use Beautiful Soup to get the source and String.split() to separate all the stats of the players. But when I try to get the rosters, I get something like this:

>**1**</fagtd><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:89px">**Jazz**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:100px">**Ferguson**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:61px">**WR**

Out of this I need to get the information 1, Jazz, Ferguson, and WR. String.split() will not work for something this complex. I was thinking about using regular expressions, but I am not sure how. Can anyone come up with a regex for this or, if there is a much easier way, point me in the right direction? Thank you.

EDIT: This is the portion of the code I use to get the HTML data above. It prints out the whole thing; the part above is only a section.

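from bs4 import BeautifulSoup
from requests_html import HTMLSession

PARSER = 'html.parser'  # assumption: the original post does not show how PARSER is defined

session = HTMLSession()
page = session.get('https://www.xfl.com/en-US/teams/dallas/renegades-articles/dallas-renegades-roster')

soup2 = BeautifulSoup(page.content, PARSER)
scripts = soup2.find_all('script')

for tag in scripts:
    # The roster table is embedded as a string inside one of the page's <script> tags.
    if '"title":"Dallas Renegades roster"' in tag.text:
        # Keep everything from the 'College' column header onward.
        rosterData = tag.text[tag.text.find('College'):]
        rosterData = rosterData.replace('</td>', '').replace('\\', '')
        print(rosterData)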

Instead of using regex, maybe consider an html parser? Here's one for Python: https://docs.python.org/3/library/html.parser.html — Josh Noe, Feb 27 '20 at 19:03
Have you done **any** research? (see: https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users) Stack Overflow is not a substitute for guides, tutorials or documentation. — AMC, Feb 27 '20 at 19:19
Does this answer your question? [Get page generated with Javascript in Python](https://stackoverflow.com/questions/8960288/get-page-generated-with-javascript-in-python) — AMC, Feb 27 '20 at 19:19
@AMC yes I have! I will try to do some more but most of what I found has been very confusing. I will continue and look into selenium more. If you have any more suggested reading send it my way! — Mike Grim, Feb 27 '20 at 19:45

score 2 · Answer 1 · answered Feb 28 '20 at 04:48

The code below loads the full table as a DataFrame; you can filter the required data from it:

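from io import StringIO

import pandas as pd
import requests

url = 'https://www.xfl.com/en-US/teams/dallas/renegades-articles/dallas-renegades-roster'
html = requests.get(url).text
# read_html returns one DataFrame per <table> found in the page.
# Wrapping the text in StringIO avoids passing raw HTML directly, which newer pandas versions deprecate.
df_list = pd.read_html(StringIO(html))
df = df_list[-1]  # take the last table found on the page
print(df)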
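
If you would rather work on the HTML fragment from the question directly, the standard-library html.parser mentioned in the comments can pull the text out of each cell without any regex. This is only a rough sketch: the class name CellTextParser and the shortened sample_row string are made up for illustration, and the real markup has much longer style attributes.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> cell (hypothetical helper, not from the post)."""

    def __init__(self):
        super().__init__()
        self.cells = []        # finished cell strings
        self._in_cell = False  # True while the parser is inside a <td>
        self._buffer = []      # text pieces of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.cells.append("".join(self._buffer).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


# Assumed, shortened stand-in for one roster row from the question.
sample_row = (
    '<tr><td style="text-align:center">1</td>'
    '<td style="text-align:center">Jazz</td>'
    '<td style="text-align:center">Ferguson</td>'
    '<td style="text-align:center">WR</td></tr>'
)

parser = CellTextParser()
parser.feed(sample_row)
parser.close()

number, first_name, last_name, position = parser.cells
print(number, first_name, last_name, position)
```

The style attributes are ignored entirely, so the same parser should cope with the long inline styles in the real page as long as the cells are well-formed td elements.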
