I have a unique situation while trying to scrape a website. I'm searching hundreds of names through the search bar and then scraping tables. However, some names are spelled differently on my list than on the site. I looked up a couple of those names manually, and the search still takes me directly to the player's individual page. Other times it goes to a list of names when there are multiple players with the same or similar name (in that case I want the one who played in the NBA; I've already accounted for this, but I think it's worth mentioning).

How do I still get into those players' individual pages instead of having to run the script, hit the error, and check which player has a slightly different spelling? Again, a name in the array will take you either directly to the individual page, even if spelled slightly differently, or to a list of names (where I need the NBA player). Some examples: Georgios Papagiannis (listed as George Papagiannis on the site), Ognjen Kuzmic (listed as Ognen Kuzmic), and Nene (listed as Maybyner Nene, but the search takes you to a list of names -- https://basketball.realgm.com/search?q=nene). This seems pretty tough, but I feel like it might be possible.

Also, it seems like rather than appending all the scraped data to the CSV, the file gets overwritten each time with the next player. Thanks a ton.
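For what it's worth, here's the rough approach I've been sketching for the two cases. The `League` column header and the table layout of the /search results page are my assumptions from eyeballing it in a browser, so the selectors may well need adjusting:

```python
import requests
from bs4 import BeautifulSoup

def pick_nba_href(soup):
    """From a search-results page, return the href of the first player row
    whose League column reads 'NBA', or None if nothing matches."""
    table = soup.find('table')
    if table is None:
        return None
    headers = [th.get_text(strip=True) for th in table.find_all('th')]
    if 'League' not in headers:
        return None
    league_idx = headers.index('League')
    for row in table.find_all('tr')[1:]:  # skip the header row
        cells = row.find_all('td')
        if len(cells) > league_idx and cells[league_idx].get_text(strip=True) == 'NBA':
            link = cells[0].find('a')
            if link is not None:
                return link['href']
    return None

def resolve_player_url(name):
    """Search for a name; requests follows redirects, so when the search
    matches a single player the final URL is already the profile page."""
    resp = requests.get('https://basketball.realgm.com/search', params={'q': name})
    if '/search' not in resp.url:
        return resp.url  # redirected straight to the individual page
    href = pick_nba_href(BeautifulSoup(resp.content, 'html.parser'))
    return 'https://basketball.realgm.com' + href if href else None
```

The idea is that checking `resp.url` after the request tells you which case you landed in, without needing the spelling on your list to match the site exactly.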
The error I get:
AttributeError: 'NoneType' object has no attribute 'text'
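For context, the error happens because `soup.find('a', text=name)` returns None when no anchor's text matches the name exactly, and `.text` then gets called on None. A small guard like this (my own wrapper, not part of the script) would turn the crash into something loggable:

```python
from bs4 import BeautifulSoup

def find_profile_href(soup, name):
    """Return the href of an anchor whose text is exactly `name`,
    or None instead of raising AttributeError on a spelling mismatch."""
    link = soup.find('a', text=name)
    return link['href'] if link is not None else None
```

With that, the loop could collect the names that came back as None and retry them with a looser match, instead of dying mid-run.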
import requests
from bs4 import BeautifulSoup
import pandas as pd

playernames = ['Carlos Delfino', 'Nene', 'Yao Ming', 'Marcus Vinicius', 'Raul Neto', 'Timothe Luwawu-Cabarrot']
result = pd.DataFrame()

for name in playernames:
    fname = name.split(" ")[0]
    lname = name.split(" ")[1]
    url = "https://basketball.realgm.com/search?q={}+{}".format(fname, lname)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    if soup.find('a', text=name).text == name:  # raises AttributeError when find() returns None
        url = "https://basketball.realgm.com" + soup.find('a', text=name)['href']
        print(url)
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'lxml')

    try:
        table1 = soup.find('h2', text='International Regular Season Stats - Per Game').findNext('table')
        table2 = soup.find('h2', text='International Regular Season Stats - Advanced Stats').findNext('table')
        df1 = pd.read_html(str(table1))[0]
        df2 = pd.read_html(str(table2))[0]
        commonCols = list(set(df1.columns) & set(df2.columns))
        df = df1.merge(df2, how='left', on=commonCols)
        df['Player'] = name
        print(df)
    except:
        print('No international table for %s.' % name)
        df = pd.DataFrame([name], columns=['Player'])

    result = result.append(df, sort=False).reset_index(drop=True)

cols = list(result.columns)
cols = [cols[-1]] + cols[:-1]
result = result[cols]
result.to_csv('international players.csv', index=False)
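On the overwriting: `df.to_csv(path)` replaces the whole file on every call, so if the write ends up inside the loop (or the script gets re-run per player), only the last player survives. One option I've considered, sketched here as a hypothetical `append_player` helper, is to append and only write the header when the file doesn't exist yet:

```python
import os
import pandas as pd

def append_player(df, path='international players.csv'):
    """Append one player's rows to the CSV, writing the header only once."""
    df.to_csv(path, mode='a', header=not os.path.exists(path), index=False)
```

The caveat is that appending only makes sense if the columns line up across players; otherwise accumulating everything in one DataFrame and calling `to_csv` once after the loop, as the code above tries to do, is the cleaner route.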