I am trying to learn how to scrape data from a webpage in Python and am having trouble structuring my nested loops. I got some help with the scraping itself in this question (How to pull links from within an 'a' tag). Now I want that code to iterate through different weeks (and eventually years) of scoreboard pages. My current code is below, but it does not iterate through the two weeks I want and save the results.
```python
import requests, re, json
import pandas as pd
from bs4 import BeautifulSoup

weeks = ['1', '2']
data = pd.DataFrame(columns=['Teams', 'Link'])
# Collect the <script> tags from the page's <head>
scripts_head = soup.find('head').find_all('script')
all_links = {}
for i in weeks:
    r = requests.get(r'https://www.espn.com/college-football/scoreboard/_/year/2018/seasontype/2/week/' + i)
    soup = BeautifulSoup(r.text, 'html.parser')
    for script in scripts_head:
        if 'window.espn.scoreboardData' in script.text:
            json_scoreboard = json.loads(re.search(r'({.*?});', script.text).group(1))
            for event in json_scoreboard['events']:
                name = event['name']
                for link in event['links']:
                    if link['text'] == 'Gamecast':
                        gamecast = link['href']
                all_links[name] = gamecast
    # Save data to dataframe
    data2 = pd.DataFrame(list(all_links.items()), columns=['Teams', 'Link'])
    # Append new data to existing data (pd.concat replaces DataFrame.append, removed in pandas 2.0)
    data = pd.concat([data, data2], ignore_index=True)
# Save dataframe with all links to csv for future use
data.to_csv(r'game_id_data.csv')
```
Edit: To clarify, the code creates duplicates of one week's data and repeatedly appends them to the end. I also edited the code to include the proper libraries, so it should be possible to copy, paste, and run it in Python.
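A likely cause of the duplicates, offered as a sketch rather than a definitive fix: `scripts_head` is computed once, before the loop, from a `soup` that was created earlier. In the code as posted, `soup` is not defined yet at that point, so a fresh run would raise `NameError`. Because of this, every pass through the week loop re-parses the same page. In addition, `all_links` is never reset, so each append re-adds every game collected so far. The minimal restructuring below recomputes everything page-dependent inside the week loop and builds the DataFrame once at the end. It assumes the pages still embed `window.espn.scoreboardData = {...};` on a single line, as in the question; ESPN may have changed this markup since 2018. It also swaps BeautifulSoup for a plain regular expression over the raw HTML, so the parsing has no third-party dependencies. The function names (`parse_gamecast_links`, `fetch_html`, `collect_links`, `collect_seasons`, `save_links`) and the extra Year/Week columns are hypothetical.

```python
import json
import re

# Assumption: each scoreboard page embeds "window.espn.scoreboardData = {...};"
# on one line, as in the question. The markup may have changed since then.
SCOREBOARD_RE = re.compile(r"window\.espn\.scoreboardData\s*=\s*({.*?});")

BASE_URL = ("https://www.espn.com/college-football/scoreboard/_/year/{year}"
            "/seasontype/2/week/{week}")


def parse_gamecast_links(html):
    """Return {game name: Gamecast href} for a single scoreboard page."""
    match = SCOREBOARD_RE.search(html)
    if match is None:
        return {}
    scoreboard = json.loads(match.group(1))
    links = {}
    for event in scoreboard.get("events", []):
        for link in event.get("links", []):
            # Only record a game when it actually has a Gamecast link,
            # so a previous game's href is never reused.
            if link.get("text") == "Gamecast":
                links[event["name"]] = link["href"]
    return links


def fetch_html(url):
    import requests  # imported lazily; only needed for real network fetches
    response = requests.get(url)
    response.raise_for_status()
    return response.text


def collect_links(weeks, year=2018, fetch=fetch_html):
    """Return one (year, week, teams, link) row per game across all weeks."""
    rows = []
    for week in weeks:
        # Fetch and parse inside the loop, with a fresh dict per page,
        # so each week contributes its own games exactly once.
        html = fetch(BASE_URL.format(year=year, week=week))
        for teams, link in parse_gamecast_links(html).items():
            rows.append((year, week, teams, link))
    return rows


def collect_seasons(years, weeks, fetch=fetch_html):
    """Outer loop over years, inner loop (in collect_links) over weeks."""
    rows = []
    for year in years:
        rows.extend(collect_links(weeks, year=year, fetch=fetch))
    return rows


def save_links(rows, path="game_id_data.csv"):
    import pandas as pd  # imported lazily; only needed for saving
    # Build the DataFrame once instead of appending inside the loop.
    data = pd.DataFrame(rows, columns=["Year", "Week", "Teams", "Link"])
    data.to_csv(path, index=False)
    return data
```

For example, `save_links(collect_seasons([2018], ['1', '2']))` would fetch both weeks of 2018 and write one row per game. Collecting plain rows and building the DataFrame once also sidesteps `DataFrame.append`, which was removed in pandas 2.0.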