-1

I am using Python 3.6 Selenium with Chrome web driver

Trying to scrape data from nhl stats. I can get the data but I am looking to format it to mirror the website's table in excel. I can scrape the data but only able to get one column of the entire data collected.

Below is my code -

#Load Lib
import csv
from selenium import webdriver

#Browser load and player stats
driver = webdriver.Chrome(executable_path=r"ENTER PATH")
driver.get("http://www.nhl.com/stats/player?aggregate=0&gameType=2&report=skatersummary&pos=S&reportType=season&seasonFrom=20162017&seasonTo=20162017&filter=gamesPlayed,gte,1&sort=points,goals,assists")
PlayerStats = driver.find_elements_by_class_name("rt-tr-group")
for post in PlayerStats:
        print(post.text)

driver.close()

output
1
Connor McDavid
2016-17
EDM
C
82
30
70
100
27
26
1.22
3
27
1
2
6
1
251
11.9
21:07
24.37
43.22
Sidney Crosby
2016-17
PIT
C
75
44
45
89
17
24
1.19
14
25
0
0
5
1
255
17.3
19:52
24.69
48.23

Bodom
  • 3
  • 4
  • Can you show an example of the output you get? `find_elements_by_class_name` returns a list. – cddt Aug 29 '17 at 01:43

3 Answers3

0

There are newlines in the scraped data, you can replace newlines with something else e.g. tabs

for post in PlayerStats:
    print(post.text.replace('\n', '\t'))
Amit
  • 19,780
  • 6
  • 46
  • 54
  • When I replace it with tab or comma it now prints out on one row in excel. I would like to be able to replicate the table into excel. – Bodom Aug 29 '17 at 03:16
0

You just need to split by the newline character.

print(post.text.split('\n')) # this is only print. How to split and save a list of lists, I will leave it as an exercise for you.

Output:

['1', 'Connor McDavid', '2016-17', 'EDM', 'C', '82', '30', '70', '100', '27', '26', '1.22', '3', '27', '1', '2', '6', '1', '251', '11.9', '21:07', '24.37', '43.2']
['2', 'Sidney Crosby', '2016-17', 'PIT', 'C', '75', '44', '45', '89', '17', '24', '1.19', '14', '25', '0', '0', '5', '1', '255', '17.3', '19:52', '24.69', '48.2']

To convert list of lists to excel, you can use the pandas library.

df = pandas.DataFrame(PlayerStats) # after you save the list of lists
df = df.T # Transpose. rows become columns.
df = df.T # Transpose. change it once again from column to rows
# I know the above is like a hack. Would appreciate if someone came up with
# a neater solution.
# To add column names:
df.columns = ['Heading1', 'Heading2'] # -> get the headings from the site

# To save as excel
df.to_excel("filename.xlsx") # -> has arguments, please check Pandas documentation

A neater one-liner to transpose:

df = pd.DataFrame(PlayerStats).T.T

Output:

0               1        2    3  4   5   6   7    8   9   ...   13 14 15 16  \
0  1  Connor McDavid  2016-17  EDM  C  82  30  70  100  27  ...   27  1  2  6   
1  2   Sidney Crosby  2016-17  PIT  C  75  44  45   89  17  ...   25  0  0  5   

  17   18    19     20     21    22  
0  1  251  11.9  21:07  24.37  43.2  
1  1  255  17.3  19:52  24.69  48.2  

[2 rows x 23 columns]
SajidSalim
  • 359
  • 6
  • 19
0

You cant do it directly. You should place data about player in array. At the and you have something like

[Player1 Data, Player2 Data, ...]

Where Player1 Data is list player data. After this you can need Matrix Transpose in Python.

How you can place Player Data in array. Example.

players_data = []
for post in PlayerStats:
    player_data = []
    for i in range(23):  # where 23 is column count
        player_data.append(post.text)
        next(post)  # iter item
    players_data.append(player_data)
players_data = list(zip(*players_data))  # Here you gote "Python split csv column into rows"

print(players_data[0]) #Output players names

Zheka Koval
  • 525
  • 4
  • 10