"...but unfortunately it only works for the latest season option (2018/2019)"
The website uses JavaScript to load the older tables (1992-2017), so when you fetch the page with Python you only get the latest table. If you want to scrape the table filtered by year/season, I provide a hard-coded version below (because I could not find the rule that maps a year to its season ID). If you want a more elegant solution, selenium or requests_html might suit you.
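Because the season IDs are hard-coded, `years.get(...)` in the script below quietly returns `None` for a year that isn't listed, and requests omits parameters whose value is `None`, so the request would go out without `compSeasons` at all. A minimal sketch of a stricter lookup, reusing the IDs from this answer and assuming the key is the year a season starts (the names `SEASON_IDS` and `season_id` are hypothetical, not part of any API):

```python
# Hard-coded season IDs from this answer. Assumption: the key is the year the
# season starts, e.g. "2018" for 2018/2019. 1992-2013 map to IDs 1-22; the
# later seasons do not follow that pattern, so they are listed by hand.
SEASON_IDS = {str(1991 + i): str(i) for i in range(1, 23)}
SEASON_IDS.update({"2018": "210", "2017": "79", "2016": "54", "2015": "42", "2014": "27"})


def season_id(start_year):
    """Return the compSeasons ID for a season, failing loudly if it is unknown."""
    key = str(start_year)
    try:
        return SEASON_IDS[key]
    except KeyError:
        known = f"{min(SEASON_IDS)}-{max(SEASON_IDS)}"
        raise ValueError(f"no season ID recorded for {key!r} (known: {known})") from None


print(season_id(1992))    # 1
print(season_id("2017"))  # 79
```

Raising an error here is a design choice: a typo in the year fails immediately instead of silently producing a request for some other table.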
Note: I'm imitating the JavaScript request that fetches the data from the server, so the response content is JSON. It can only fetch Premier League tables for different years; filtering by competition/matchweek/home_or_away is not supported in my example. If you want to add those options to the script, you need to analyse the rules behind the URL parameters (use the approach @pguardiario described, or a tool such as Fiddler).
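One way to start that analysis is to copy two request URLs from the browser's network tab (for example, the table page for two different seasons) and compare their query strings. A small sketch using only the standard library; the two URLs are assumptions built from the `tables?co=1&se=...&ha=-1` pattern used in this answer, and `query_params`/`diff_params` are hypothetical helper names:

```python
from urllib.parse import parse_qs, urlsplit

# Two captured URLs that differ only in the selected season (assumed examples)
URL_2017 = "https://www.premierleague.com/tables?co=1&se=79&ha=-1"
URL_2016 = "https://www.premierleague.com/tables?co=1&se=54&ha=-1"


def query_params(url):
    """Map each query parameter to its first value."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}


def diff_params(url_a, url_b):
    """Return only the parameters whose values differ between the two URLs."""
    a, b = query_params(url_a), query_params(url_b)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


print(query_params(URL_2017))          # {'co': '1', 'se': '79', 'ha': '-1'}
print(diff_params(URL_2017, URL_2016))  # {'se': ('79', '54')}
```

Repeating this while changing one filter at a time in the browser shows which parameter controls which option.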
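import requests
from pprint import pprint

# Hard-coded mapping from season start year to the site's compSeasons ID.
# 1992-2013 map to IDs 1-22; later seasons use the IDs listed below.
years = {str(1991 + i): str(i) for i in range(1, 23)}
years.update({
    "2018": "210",
    "2017": "79",
    "2016": "54",
    "2015": "42",
    "2014": "27"
})
specific = years.get("2017")  # season to fetch

# Query parameters copied from the request the page's JavaScript sends
param = {
    "altIds": "true",
    "compSeasons": specific,
    "detail": 2,
    "FOOTBALL_COMPETITION": 1
}
# Headers copied from the browser so the request looks like the site's own call
headers = {
    "Origin": "https://www.premierleague.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
    "Referer": "https://www.premierleague.com/tables?co=1&se={}&ha=-1".format(specific),
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
}
page = requests.get('https://footballapi.pulselive.com/football/standings',
                    params=param,
                    headers=headers)
page.raise_for_status()  # stop on an HTTP error instead of failing inside .json()
print(page.url)
pprint(page.json())  # the standings come back as JSON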
How to get different tables from one page
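I feel your question title is different from your description. If the title is what you actually want, the other issue is that you combine all the tables into one. You should also be careful with `//` in XPath: even when you call `.xpath()` on one table element, a path starting with `//` searches the whole document, whereas `.//` searches only inside that element (see "What is the meaning of .// in XPath?").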
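A small offline demonstration of that difference with lxml. The HTML snippet is invented for illustration; it only mimics the `tableBodyContainer` class and `data-filtered-table-row-name` attribute used in the code below:

```python
from lxml import html

# Made-up page with two separate standings tables
DOC = """
<html><body>
  <table><tbody class="tableBodyContainer">
    <tr data-filtered-table-row-name="Arsenal"><td>1</td></tr>
    <tr data-filtered-table-row-name="Chelsea"><td>2</td></tr>
  </tbody></table>
  <table><tbody class="tableBodyContainer">
    <tr data-filtered-table-row-name="Everton"><td>1</td></tr>
  </tbody></table>
</body></html>
"""

tree = html.fromstring(DOC)
tables = tree.xpath('//tbody[contains(@class,"tableBodyContainer")]')

# '//tr' starts from the document root, so every table "sees" all three rows
absolute = [len(t.xpath('//tr')) for t in tables]
# './/tr' only looks at descendants of the current <tbody>
relative = [len(t.xpath('.//tr')) for t in tables]
# Keeping one list per table instead of merging everything into one list
names = [[row.get('data-filtered-table-row-name') for row in t.xpath('.//tr')] for t in tables]

print(absolute)  # [3, 3]
print(relative)  # [2, 1]
print(names)     # [['Arsenal', 'Chelsea'], ['Everton']]
```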
Note: If you want older Premier League tables, use the code in the first part, because that data can only be fetched that way.
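from lxml import html
import requests
from pprint import pprint

# Same hard-coded season mapping as in the first part
years = {str(1991 + i): str(i) for i in range(1, 23)}
years.update({
    "2018": "210",
    "2017": "79",
    "2016": "54",
    "2015": "42",
    "2014": "27"
})
# Query parameters as they appear in the site's own table URLs.
# The static HTML only contains the current tables (older seasons are loaded
# by JavaScript), so "se" does not give you old data here; see the note above.
param = {
    "co": "1",
    "se": years.get("2017"),
    "ha": "-1"
}
page = requests.get('https://www.premierleague.com/tables', params=param)
page.raise_for_status()
tree = html.fromstring(page.content)

# Each table on the page has its own <tbody class="tableBodyContainer ...">
tables = tree.xpath('//tbody[contains(@class,"tableBodyContainer")]')
# Relative path (no leading //): only the <tr> children of *this* table
each_table_team_rows = [table.xpath('tr[@data-filtered-table-row-name]') for table in tables]
# One list of team names per table, instead of one combined list
team_names = [[row.attrib['data-filtered-table-row-name'] for row in team_rows]
              for team_rows in each_table_team_rows]
pprint(team_names)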