This page uses JavaScript to add data, but BeautifulSoup can't run JavaScript.
You can use Selenium to control a web browser, which can run JavaScript. Or you can check in DevTools in Firefox/Chrome (tab: Network) which URL JavaScript uses to get data from the server, and then use that URL with urllib to get the data.
I chose this method (manually searching in DevTools).
I found that JavaScript gets the data in JSON format from
https://fantasy.premierleague.com/api/bootstrap-static/
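Fetching that URL can be sketched as below. This is only an illustration: the helper names `build_request` and `fetch_json`, the browser-like `User-Agent` header, and the 10-second timeout are my assumptions and not something the page requires. A plain `urlopen(url)` may work just as well.

```python
from urllib.request import Request, urlopen
import json

URL = 'https://fantasy.premierleague.com/api/bootstrap-static/'


def build_request(url):
    """Build a GET request (hypothetical helper).

    Assumption: a browser-like User-Agent is sent because some servers
    reject urllib's default one; this endpoint may not need it.
    """
    return Request(url, headers={'User-Agent': 'Mozilla/5.0'})


def fetch_json(url, timeout=10):
    """Download the URL and decode the JSON body into Python objects."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Usage (needs network access, so it is left commented out):
# data = fetch_json(URL)
# print(list(data.keys()))
```

The timeout only stops the script from hanging forever if the server doesn't answer.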
Because the data comes as JSON, I can convert it to a Python list/dictionary using the json module, and I don't need BeautifulSoup.
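To show what that conversion gives, here is a minimal sketch that uses a tiny hand-written sample instead of the real response. The sample keeps only a few fields (values taken from the data shown in this answer); the real response is much bigger.

```python
import json

# Hand-written sample that mimics a small part of the response.
# Assumption: the real data has the same shape, only with many more keys.
sample_text = '''
{
  "element_types": [{"id": 1, "plural_name": "Goalkeepers"}],
  "teams": [{"id": 10, "name": "Liverpool"}],
  "elements": [
    {"first_name": "Alisson", "element_type": 1, "team": 10, "now_cost": 62}
  ]
}
'''

data = json.loads(sample_text)

# JSON objects become dictionaries and JSON arrays become lists.
print(type(data).__name__)              # dict
print(type(data['elements']).__name__)  # list

player = data['elements'][0]
print(player['first_name'], player['now_cost'] / 10)  # Alisson 6.2
```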
It takes more manual work to understand the structure of the data, but it gives more data than the table on the page.
Here is all the data about the first player on the list, Alisson:
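One way to reduce that manual work is to print a short summary of the top-level keys first. This is a rough sketch: `summarize` is a hypothetical helper of mine, and the sample dictionary only imitates the shape of the real data.

```python
def summarize(data):
    """Return one line per top-level key: its type and, for containers, its size."""
    lines = []
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            lines.append(f'{key}: {type(value).__name__} with {len(value)} items')
        else:
            lines.append(f'{key}: {type(value).__name__} = {value!r}')
    return lines


# Hand-written sample; with the real response you would pass `data` instead.
sample = {
    'element_types': [{'id': 1, 'plural_name': 'Goalkeepers'}],
    'teams': [{'id': 10, 'name': 'Liverpool'}],
    'elements': [{'first_name': 'Alisson', 'now_cost': 62}],
}

for line in summarize(sample):
    print(line)

# The keys of the first record show which fields every player has.
print(sorted(sample['elements'][0]))
```

After that you can look closer at a single list, for example `data['elements'][0]`.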
chance_of_playing_next_round = 100
chance_of_playing_this_round = 100
code = 116535
cost_change_event = 0
cost_change_event_fall = 0
cost_change_start = 2
cost_change_start_fall = -2
dreamteam_count = 1
element_type = 1
ep_next = 11.0
ep_this = 11.0
event_points = 10
first_name = Alisson
form = 10.0
id = 189
in_dreamteam = False
news =
news_added = 2020-03-06T14:00:17.901193Z
now_cost = 62
photo = 116535.jpg
points_per_game = 4.7
second_name = Ramses Becker
selected_by_percent = 9.2
special = False
squad_number = None
status = a
team = 10
team_code = 14
total_points = 99
transfers_in = 767780
transfers_in_event = 9339
transfers_out = 2033680
transfers_out_event = 2757
value_form = 1.6
value_season = 16.0
web_name = Alisson
minutes = 1823
goals_scored = 0
assists = 1
clean_sheets = 11
goals_conceded = 12
own_goals = 0
penalties_saved = 0
penalties_missed = 0
yellow_cards = 0
red_cards = 1
saves = 48
bonus = 9
bps = 439
influence = 406.2
creativity = 10.0
threat = 0.0
ict_index = 41.7
influence_rank = 135
influence_rank_type = 18
creativity_rank = 411
creativity_rank_type = 8
threat_rank = 630
threat_rank_type = 71
ict_index_rank = 294
ict_index_rank_type = 18
There is also information about teams, etc.
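Every team and position record has its own `id`, so you can build small lookup dictionaries keyed by that `id` to turn the numbers in a player record into names. A minimal sketch with hand-written sample records (values taken from this answer); the dictionary names are mine:

```python
# Sample records in the same shape as data['teams'] and data['element_types'].
teams = [
    {'id': 1, 'name': 'Arsenal'},
    {'id': 2, 'name': 'Aston Villa'},
    {'id': 10, 'name': 'Liverpool'},
]
element_types = [
    {'id': 1, 'plural_name': 'Goalkeepers'},
    {'id': 2, 'plural_name': 'Defenders'},
]

# Lookup tables keyed by id; they keep working even if ids have gaps
# or the lists are not ordered by id.
team_names = {team['id']: team['name'] for team in teams}
type_names = {item['id']: item['plural_name'] for item in element_types}

player = {'first_name': 'Alisson', 'team': 10, 'element_type': 1}
print(player['first_name'], team_names[player['team']], type_names[player['element_type']])
# Alisson Liverpool Goalkeepers
```

Indexing the list by position with `id - 1` gives the same result only as long as the ids are consecutive and start at 1.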
Code:
from urllib.request import urlopen
import json
#url = 'https://fantasy.premierleague.com/player-list'
url = 'https://fantasy.premierleague.com/api/bootstrap-static/'
text = urlopen(url).read().decode()
data = json.loads(text)
print('\n--- element type ---\n')
#print(data['element_types'][0])
for item in data['element_types']:
    print(item['id'], item['plural_name'])
print('\n--- Goalkeepers ---\n')
number = 0
for item in data['elements']:
    if item['element_type'] == 1: # Goalkeepers
        number += 1
        print('---', number, '---')
        print('type :', data['element_types'][item['element_type']-1]['plural_name'])
        print('first_name :', item['first_name'])
        print('second_name :', item['second_name'])
        print('total_points:', item['total_points'])
        print('team :', data['teams'][item['team']-1]['name'])
        print('cost :', item['now_cost']/10)
        # show every field for one selected player
        if item['first_name'] == 'Alisson':
            for key, value in item.items():
                print(' ', key, '=',value)
Result:
--- element type ---
1 Goalkeepers
2 Defenders
3 Midfielders
4 Forwards
--- Goalkeepers ---
--- 1 ---
type : Goalkeepers
first_name : Bernd
second_name : Leno
total_points: 114
team : Arsenal
cost : 5.0
--- 2 ---
type : Goalkeepers
first_name : Emiliano
second_name : Martínez
total_points: 1
team : Arsenal
cost : 4.2
--- 3 ---
type : Goalkeepers
first_name : Ørjan
second_name : Nyland
total_points: 11
team : Aston Villa
cost : 4.3
--- 4 ---
type : Goalkeepers
first_name : Tom
second_name : Heaton
total_points: 59
team : Aston Villa
cost : 4.3
The code gives data in a different order than the table, but if you put it all in a list, or better in a pandas DataFrame, then you can sort it in different orders.
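For a plain list, sorting can be sketched with the built-in `sorted()`. The sample records below use values from the result above and keep only a few fields:

```python
from operator import itemgetter

# A few goalkeepers in the same shape as items of data['elements'].
goalkeepers = [
    {'first_name': 'Bernd', 'total_points': 114, 'now_cost': 50},
    {'first_name': 'Emiliano', 'total_points': 1, 'now_cost': 42},
    {'first_name': 'Tom', 'total_points': 59, 'now_cost': 43},
]

# Highest total_points first.
by_points = sorted(goalkeepers, key=itemgetter('total_points'), reverse=True)

# Cheapest first.
by_cost = sorted(goalkeepers, key=itemgetter('now_cost'))

print([p['first_name'] for p in by_points])  # ['Bernd', 'Tom', 'Emiliano']
print([p['first_name'] for p in by_cost])    # ['Emiliano', 'Tom', 'Bernd']
```

`sorted()` returns a new list, so the original order stays untouched.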
EDIT:
You can use pandas to get data from JSON:
from urllib.request import urlopen
import json
import pandas as pd
#url = 'https://fantasy.premierleague.com/player-list'
url = 'https://fantasy.premierleague.com/api/bootstrap-static/'
# read data from url and convert to Python's list/dictionary
text = urlopen(url).read().decode()
data = json.loads(text)
# create DataFrames
players = pd.DataFrame.from_dict(data['elements'])
teams = pd.DataFrame.from_dict(data['teams'])
# divide by 10 to get `6.2` instead of `62`
players['now_cost'] = players['now_cost'] / 10
# convert team's number to its name (look up by team `id`, not by row position)
players['team'] = players['team'].map(teams.set_index('id')['name'])
# filter players
goalkeepers = players[ players['element_type'] == 1 ]
defenders = players[ players['element_type'] == 2 ]
# etc.
# some information
print('\n--- goalkeepers columns ---\n')
print(goalkeepers.columns)
print('\n--- goalkeepers sorted by name ---\n')
sorted_data = goalkeepers.sort_values(['first_name'])
print(sorted_data[['first_name', 'team', 'now_cost']].head())
print('\n--- goalkeepers sorted by cost ---\n')
sorted_data = goalkeepers.sort_values(['now_cost'], ascending=False)
print(sorted_data[['first_name', 'team', 'now_cost']].head())
print('\n--- teams columns ---\n')
print(teams.columns)
print('\n--- teams ---\n')
print(teams['name'].head())
# etc.
Result:
--- goalkeepers columns ---
Index(['chance_of_playing_next_round', 'chance_of_playing_this_round', 'code',
'cost_change_event', 'cost_change_event_fall', 'cost_change_start',
'cost_change_start_fall', 'dreamteam_count', 'element_type', 'ep_next',
'ep_this', 'event_points', 'first_name', 'form', 'id', 'in_dreamteam',
'news', 'news_added', 'now_cost', 'photo', 'points_per_game',
'second_name', 'selected_by_percent', 'special', 'squad_number',
'status', 'team', 'team_code', 'total_points', 'transfers_in',
'transfers_in_event', 'transfers_out', 'transfers_out_event',
'value_form', 'value_season', 'web_name', 'minutes', 'goals_scored',
'assists', 'clean_sheets', 'goals_conceded', 'own_goals',
'penalties_saved', 'penalties_missed', 'yellow_cards', 'red_cards',
'saves', 'bonus', 'bps', 'influence', 'creativity', 'threat',
'ict_index', 'influence_rank', 'influence_rank_type', 'creativity_rank',
'creativity_rank_type', 'threat_rank', 'threat_rank_type',
'ict_index_rank', 'ict_index_rank_type'],
dtype='object')
--- goalkeepers sorted by name ---
first_name team now_cost
94 Aaron Bournemouth 4.5
305 Adrián Liverpool 4.0
485 Alex Southampton 4.5
533 Alfie Spurs 4.0
291 Alisson Liverpool 6.2
--- goalkeepers sorted by cost ---
first_name team now_cost
291 Alisson Liverpool 6.2
323 Ederson Man City 6.0
263 Kasper Leicester 5.4
169 Kepa Chelsea 5.4
515 Hugo Spurs 5.3
--- teams columns ---
Index(['code', 'draw', 'form', 'id', 'loss', 'name', 'played', 'points',
'position', 'short_name', 'strength', 'team_division', 'unavailable',
'win', 'strength_overall_home', 'strength_overall_away',
'strength_attack_home', 'strength_attack_away', 'strength_defence_home',
'strength_defence_away', 'pulse_id'],
dtype='object')
--- teams ---
0 Arsenal
1 Aston Villa
2 Bournemouth
3 Brighton
4 Burnley
Name: name, dtype: object