I'm trying to scrape data from this site: https://www.koreabaseball.com/Record/Team/Hitter/Basic1.aspx
The website defaults to 2018 (the most recent year), and I want to scrape every available year.
A very similar question, "scraping a response from a selected option in dropdown list", was asked 4 years ago, but the approach from it doesn't seem to work for me.
When I run it, all it does is print the table for the default year, regardless of the parameter I assign.
I can't reach the other years through the URL, since the URL doesn't change when I select an option in the drop-down box. So I tried using webdriver and XPath.
Here is the code I tried:
from bs4 import BeautifulSoup as BSoup
from selenium import webdriver

url = "https://www.koreabaseball.com/Record/Team/Hitter/Basic1.aspx"
driver = webdriver.Chrome("/Applications/chromedriver")
driver.get(url)

# Select the season in the drop-down by clicking its <option>
year = 2017
driver.find_element_by_xpath("//select[@name='ctl00$ctl00$ctl00$cphContents$cphContents$cphContents$ddlSeason$ddlSeason']/option[@value='"+str(year)+"']").click()

# Parse the page source as it is right after the click
page = driver.page_source
bs_obj = BSoup(page, 'html.parser')

# First table on the page: header cells, body rows, footer cells
header_row = bs_obj.find_all('table')[0].find('thead').find('tr').find_all('th')
body_rows = bs_obj.find_all('table')[0].find('tbody').find_all('tr')
footer_row = bs_obj.find_all('table')[0].find('tfoot').find('tr').find_all('td')

headings = []
footings = []
for heading in header_row:
    headings.append(heading.get_text())
for footing in footer_row:
    footings.append(footing.get_text())

# One list of cell texts per body row
body = []
for row in body_rows:
    cells = row.find_all('td')
    row_temp = []
    for i in range(len(cells)):
        row_temp.append(cells[i].get_text())
    body.append(row_temp)

driver.quit()

print(headings)
print(body)
print(footings)
I expected the output to be the table for 2017, as I specified, but the actual output is the table for 2018 (the default year). Can anyone give me ideas on how to solve this?
Edit: I just found out that what I see with "Inspect" is different from what I get from "Page Source". Specifically, the page source still has 2018 as the selected option (which is not what I want), whereas Inspect shows 2017 as selected. I'm still stuck on how to read what "Inspect" shows rather than the page source.
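One thing I have not verified: the .aspx URL and the ctl00$... control name suggest an ASP.NET WebForms page, where changing a drop-down usually submits a form POST back to the same URL instead of changing the URL. If that is what happens here, the year could in principle be requested without a browser. Below is a minimal sketch of that idea using only the standard library. The names `HiddenInputCollector`, `build_postback_payload` and `fetch_season_html` are made up for illustration, and the choice of `__EVENTTARGET` is an assumption about how the page triggers the postback. The site may also need cookies or extra form fields that this sketch does not send.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

URL = "https://www.koreabaseball.com/Record/Team/Hitter/Basic1.aspx"
# Name of the season <select>, taken from the XPath in the code above
SEASON_FIELD = "ctl00$ctl00$ctl00$cphContents$cphContents$cphContents$ddlSeason$ddlSeason"


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_payload(html, year):
    """Build the form fields a WebForms page would post when the year changes."""
    collector = HiddenInputCollector()
    collector.feed(html)
    # Hidden state fields such as __VIEWSTATE / __EVENTVALIDATION, if present
    payload = dict(collector.fields)
    # Assumption: the season <select> itself is the postback target
    payload["__EVENTTARGET"] = SEASON_FIELD
    payload["__EVENTARGUMENT"] = ""
    payload[SEASON_FIELD] = str(year)
    return payload


def fetch_season_html(year):
    """GET the page, then POST the payload to the same URL.

    Not tested against the live site.
    """
    with urlopen(URL) as resp:
        first_page = resp.read().decode("utf-8", errors="replace")
    data = urlencode(build_postback_payload(first_page, year)).encode("utf-8")
    with urlopen(Request(URL, data=data)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the assumption holds, `fetch_season_html(2017)` would return HTML that can be passed to `BSoup(..., 'html.parser')` in place of `driver.page_source`. If it returns the 2018 table again, the page probably needs something this sketch leaves out.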