When I try to use "read_html" to get the values from a table on Stathead.com, the first 10 rows are output as NaN, and the rest of the data is output normally. I've tried a bunch of different things but can't get it to work. (I do have a paid Stathead subscription, so maybe it has something to do with that?)
import pandas as pd
import requests
url = 'https://stathead.com/basketball/player-game-finder.cgi?request=1&player_game_min=1&team_game_min=1&comp_type=reg&order_by=pts&match=player_game&season_start=1&player_game_max=9999&year_max=2023&team_game_max=84&season_end=-1&order_by_asc=0&positions%5B%5D=G&positions%5B%5D=GF&comp_id=NBA&year_min=2023&cstat%5B1%5D=mp&ccomp%5B1%5D=gt&cval%5B1%5D=1&offset=0'
page = requests.get(url)
dfs = pd.read_html(page.text)
print(dfs)
I expected actual values in every cell, because the table shows values in every row when I open the Stathead URL in my browser.
EDIT
Okay, I tried the URL in a different browser where I wasn't logged into Stathead, and here's what I see: [image: Not logged in table]
So the issue is almost definitely not being logged in. Is there a way to show that I'm logged in when using read_html, or something else I could do?
EDIT 2
Thank you for the suggestions @Driftr95 !! I tried both for the last 2 hours and couldn't figure it out... here was my first try using the requests module [I put "MyUsername" and "MyPassword" in the code below, but in my real code I put my actual username and password (: ]
url_to_open = 'https://stathead.com/basketball/player-game-finder.cgi?request=1&player_game_min=1&team_game_min=1&comp_type=reg&order_by=pts&match=player_game&season_start=1&player_game_max=9999&year_max=2023&team_game_max=84&season_end=-1&order_by_asc=0&positions[]=G&positions[]=GF&comp_id=NBA&year_min=2023&cstat[1]=mp&ccomp[1]=gt&cval[1]=1&offset=0'
# Fill in your details here to be posted to the login form.
payload = {
    'username': 'MyUsername',
    'password': 'MyPassword'
}
# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('https://stathead.com/users/login.cgi', data=payload)
page = requests.get(url_to_open)
dfs = pd.read_html(page.text)
print(dfs)
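A note on the attempt above: the login is posted through the session `s`, but the table page is then fetched with plain `requests.get`, which starts with an empty cookie jar, so any cookies set by the login are never sent. Below is a minimal sketch of keeping both requests on the same session. The helper names `fetch_html_logged_in` and `read_tables` are hypothetical, and it is only an assumption that Stathead's login form accepts just `username` and `password`; it may also need hidden fields such as a CSRF token.

```python
import io

import pandas as pd
import requests

# Login URL taken from the attempt above.
LOGIN_URL = "https://stathead.com/users/login.cgi"


def fetch_html_logged_in(session, login_url, payload, target_url):
    """Log in, then fetch target_url through the SAME session object."""
    # Cookies set by the login response are stored on the session.
    login = session.post(login_url, data=payload)
    login.raise_for_status()
    # session.get (not requests.get) sends those cookies back to the site.
    page = session.get(target_url)
    page.raise_for_status()
    return page.text


def read_tables(target_url, payload):
    """Return the tables found on target_url as a list of DataFrames."""
    with requests.Session() as s:
        html = fetch_html_logged_in(s, LOGIN_URL, payload, target_url)
    # Wrapping in StringIO avoids passing raw HTML text to read_html directly.
    return pd.read_html(io.StringIO(html))
```

Calling `read_tables(url_to_open, payload)` would then replace the last three lines of the attempt. If the rows still come back as NaN, the login itself probably did not succeed, which is worth checking by inspecting the login response.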
And here was my second try using convert curl:
import requests
cookies = {
    'srcssfull': 'yes',
    '_gid': 'GA1.2.550164619.1676068467',
    '_gcl_au': '1.1.935969025.1676068468',
    'ln_or': 'eyIzNTM4NTk2IjoiZCJ9',
    'hubspotutk': 'e35045f7f0a6c7771ecfd1aa0e8d4276',
    '_fbp': 'fb.1.1676068478646.152421943',
    '__hssrc': '1',
    '__hstc': '205977932.e35045f7f0a6c7771ecfd1aa0e8d4276.1676068478416.1676086198834.1676088324184.4',
    '_gat_gtag_UA_1890630_24': '1',
    '_gat_gtag_UA_1890630_9': '1',
    'csrf_token': 'cff1c9cb5b8fdc60fad64d9fae3494fd',
    '_ga': 'GA1.2.1796201960.1676068467',
    '__hssc': '205977932.15.1676088324184',
    '_ga_2M1H4N076C': 'GS1.1.1676088445.4.1.1676091530.0.0.0',
}
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    # 'Cookie': 'srcssfull=yes; _gid=GA1.2.550164619.1676068467; _gcl_au=1.1.935969025.1676068468; ln_or=eyIzNTM4NTk2IjoiZCJ9; hubspotutk=e35045f7f0a6c7771ecfd1aa0e8d4276; _fbp=fb.1.1676068478646.152421943; __hssrc=1; __hstc=205977932.e35045f7f0a6c7771ecfd1aa0e8d4276.1676068478416.1676086198834.1676088324184.4; _gat_gtag_UA_1890630_24=1; _gat_gtag_UA_1890630_9=1; csrf_token=cff1c9cb5b8fdc60fad64d9fae3494fd; _ga=GA1.2.1796201960.1676068467; __hssc=205977932.15.1676088324184; _ga_2M1H4N076C=GS1.1.1676088445.4.1.1676091530.0.0.0',
    'Origin': 'https://stathead.com',
    'Referer': 'https://stathead.com/users/login.cgi',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Not_A Brand";v="99", "Google Chrome";v="109", "Chromium";v="109"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}
data = {
    'username': 'MyUsername',
    'password': 'MyPassword',
    'remember': '1',
    'referrer': 'https%3A%2F%2Fstathead.com%2Fprofile%2F',
    'token': '0',
    'csrf_token': '612ccbc4f75b46114af2f23a618c9668',
}
response = requests.post('https://stathead.com/users/login.cgi', cookies=cookies, headers=headers, data=data)
page = requests.get(url_to_open)
dfs = pd.read_html(page.text)
print(dfs)
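One possible weakness of this second attempt is that the `csrf_token` and cookies are copied from an earlier browser session and may no longer be valid by the time the script runs. Here is a rough sketch of building the login payload from a freshly fetched login page instead, using only the standard library's `html.parser`. It assumes, without confirmation, that Stathead's login form carries these values as hidden `<input>` elements. The names `HiddenInputCollector` and `build_login_payload` are hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only hidden inputs that have a name; a missing value becomes "".
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_page_html, username, password):
    """Merge the form's hidden fields with the credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_page_html)
    payload = dict(collector.fields)
    # Field names 'username' and 'password' are taken from the attempt above.
    payload["username"] = username
    payload["password"] = password
    return payload
```

The intended use would be, inside a single `requests.Session()` named `s`: fetch the login page with `s.get('https://stathead.com/users/login.cgi').text`, pass that text to `build_login_payload`, post the result with `s.post(..., data=payload)`, and then fetch the table with `s.get(url_to_open)` rather than `requests.get(url_to_open)`, so the session's cookies go along with the request.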