
I've been trying to scrape this site:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    r = requests.get("https://www.nbcsports.com/edge/basketball/nba/injury-report")
    soup = BeautifulSoup(r.content,"lxml")
    st1 = soup.find("div", attrs={"class":"page-wrapper--sidebar page-wrapper--sidebar-initial container clearfix page-wrapper"})
    st2 = st1.find("div",attrs={"class":"content content--main cols-8"})
    st3 = st2.find("div", attrs={"class":"block__content"})
    st4 = st3.find("div",attrs={"id":"injury-report-page-wrapper"})
    st4.find("div",attrs={"class":"injury-report-wall"})

Nothing is returned.

I am trying to get the injury data, but it doesn't work at all. I've tried BeautifulSoup and pandas but couldn't make it work. It looks like the data comes from an API, but I'm stuck. Open to advice.

zinT0
  • Then you are dealing with a JavaScript website whose content is loaded dynamically after the page has fully loaded. Neither BS4 nor pandas can render that content for you. Check the XHR requests or use Selenium. – αԋɱҽԃ αмєяιcαη Apr 20 '21 at 23:23
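
One rough way to check the diagnosis in the comment above is to see whether the target container exists but is empty in the HTML that `requests` receives. The sketch below runs on a made-up HTML snippet, not the real page, and the `has_static_content` helper is a hypothetical name introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML a server sends before any JavaScript runs.
SAMPLE_HTML = """
<html><body>
  <div id="injury-report-page-wrapper"></div>
  <script src="app.js"></script>
</body></html>
"""


def has_static_content(html, element_id):
    """Return True if the element with this id exists and already holds text or child tags."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(id=element_id)
    if node is None:
        return False
    # An empty wrapper usually means a script fills it in later.
    return bool(node.get_text(strip=True)) or node.find(True) is not None


print(has_static_content(SAMPLE_HTML, "injury-report-page-wrapper"))
```

If a check like this reports an empty wrapper on the real response, the data is probably fetched by a separate request, which is what the browser's network tab (XHR filter) would show.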

1 Answer

    import requests
    import pandas as pd


    def main(url):
        # Query parameters sent to the API endpoint behind the injury report page.
        params = {
            "sort": "-start_date",
            "filter[player.team.meta.drupal_internal__id]": 176,
            "filter[player.status.active]": 1,
            "filter[active]": 1,
            "include": "injury_type,player,player.status,player.position"
        }
        r = requests.get(url, params=params)
        r.raise_for_status()  # stop early on an HTTP error instead of parsing an error body
        # Each item under 'included' keeps its fields in an 'attributes' dict.
        data = [item['attributes'] for item in r.json()['included']]
        df = pd.DataFrame().from_dict(data)
        print(df)
        # df.to_csv('data.csv', index=False)


    main('https://www.nbcsports.com/edge/api/injury')
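
The request above asks for several related resources at once (`injury_type`, `player`, `player.status`, `player.position`), so the rows under `included` may mix different kinds of records in one frame. Here is a minimal sketch of splitting them apart, assuming each item follows the JSON:API convention of carrying `type` and `id` keys next to `attributes`. That shape is an assumption, the sample payload is invented, and `frames_by_type` is a hypothetical helper, not part of the answer.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical payload shaped like a JSON:API response; the field values are invented.
SAMPLE_PAYLOAD = {
    "data": [],
    "included": [
        {"type": "player", "id": "p1", "attributes": {"name": "Player A"}},
        {"type": "player", "id": "p2", "attributes": {"name": "Player B"}},
        {"type": "injury_type", "id": "i1", "attributes": {"name": "Ankle"}},
    ],
}


def frames_by_type(payload):
    """Group the `included` resources by their `type` and build one DataFrame per type."""
    rows = defaultdict(list)
    for item in payload.get("included", []):
        # Keep the resource id next to its attributes so rows can be joined later.
        rows[item["type"]].append({"id": item["id"], **item["attributes"]})
    return {kind: pd.DataFrame(records) for kind, records in rows.items()}


frames = frames_by_type(SAMPLE_PAYLOAD)
print(sorted(frames))
print(frames["player"])
```

With the real response you would pass `r.json()` instead of `SAMPLE_PAYLOAD`, after checking that the items actually have the assumed keys.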
  • Great answer, as usual. I just wanted to point this out because it caught my interest. When I construct a list of dictionaries like you did here, I usually just use `pd.DataFrame(data)` rather than `.from_dict()`. I looked into the difference and found a nice explanation [here](https://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-a-pandas-dataframe) – chitown88 Apr 21 '21 at 07:42
  • @chitown88 Thank you :P The reason for using `pd.DataFrame().from_dict` is that it helps when you're not sure whether the data is nested. I got used to it from my time as a pandas core contributor :P – αԋɱҽԃ αмєяιcαη Apr 21 '21 at 14:29
  • So if it's nested, does it flatten it out? – chitown88 Apr 21 '21 at 14:31
  • @chitown88 It will treat the keys/values as flattened out. You might have noticed that I used a list of dicts but passed it as `.from_dict(list)`, so the constructor will treat it as nested if it is. [Check the docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html#pandas-dataframe-from-dict): _Construct DataFrame from dict of array-like or dicts._ – αԋɱҽԃ αмєяιcαη Apr 21 '21 at 14:32
  • ah yes! Ok I see now what you mean. Awesome! – chitown88 Apr 21 '21 at 14:36
  • @chitown88 You're welcome. Just for the record: you can use [`pd.DataFrame(data)`](https://github.com/pandas-dev/pandas/blob/v1.2.4/pandas/core/frame.py#L394-L9477) directly, which guesses the structure automatically. If you already know the type of the data, you can use the specific constructor instead. – αԋɱҽԃ αмєяιcαη Apr 21 '21 at 14:48
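
For anyone who wants to try the constructors discussed in these comments, here is a small check on invented records with one nested field. It is only a sketch: as far as I can tell, `pd.DataFrame(data)` and `pd.DataFrame.from_dict(data)` build the same frame from a list of dicts and neither expands a nested dict, while `pd.json_normalize` is the pandas function that flattens nested keys into separate columns.

```python
import pandas as pd

# Invented records with one nested field, for illustration only.
data = [
    {"name": "Player A", "status": {"active": True, "note": "day-to-day"}},
    {"name": "Player B", "status": {"active": False, "note": "out"}},
]

direct = pd.DataFrame(data)
via_from_dict = pd.DataFrame.from_dict(data)
flattened = pd.json_normalize(data)

# Compare the two constructors row by row; the nested dict stays in one column.
print(direct.to_dict("records") == via_from_dict.to_dict("records"))
print(list(direct.columns))
# json_normalize expands nested keys into dotted column names.
print(list(flattened.columns))
```

If the API attributes turn out to contain nested objects, `pd.json_normalize(data)` may be the more convenient starting point.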