0

I have parsed a table and would like to convert two of those variables to a Pandas Dataframe to print to excel.

FYI: I did ask a similar question, however, it was not answered thoroughly. There was no suggestion on how to create a Pandas DataFrame. This was the whole point of my question.

Caution: There is small issue with the data that I parsed. The data contains "TEAM" and "SA/G" multiple times in the output.

The 1st variable that I would like in the DataFrame is 'TEAM'.

The 2nd variable that I would like in the DataFrame is 'SA/G'.

Here is my code so far:

# imports
from selenium import webdriver      
from bs4 import BeautifulSoup

# make a webdriver object
driver = webdriver.Chrome('C:\webdrivers\chromedriver.exe')

# open some page using get method       - url -- > parameters
driver.get('http://www.espn.com/nhl/statistics/team/_/stat/scoring/sort/avgGoals')

# driver.page_source
soup = BeautifulSoup(driver.page_source,'lxml')
#close driver
driver.close()
#find table
table = soup.find('table')
#find_all table rows
t_rows = table.find_all('tr')
#loop through tr to find_all td
for tr in t_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    # print(row)
    # print(row[9])
    # print(row[1], row[9])

    team = row[1]
    sag = row[9]

    # print(team, sag)
    data = [(team, sag)]
    print(data)

Here is the final output that I would like printed to excel using the Pandas DataFrame option:

Team           SA/G
Nashville      30.1
Colorado       33.6
Washington     31.0
...             ...

Thanks in advance for any help that you may offer. I am still learning and appreciate any feedback that I can get.

Able Archer
  • 579
  • 5
  • 19

2 Answers2

1

First inside the "for loop" append tuples into a list (instead of doing data=[(x,y)] declare the data variable before the loop as a list data = list() and append the tuples to list in the loop data.append((x,y))) and do the following

import pandas as pd
data=[("t1","sag1"),("t2","sag2"),("t3","sag3")]
df = pd.DataFrame(data,columns=['Team','SA/G'])
print(df)
cerofrais
  • 1,117
  • 1
  • 12
  • 32
  • Forgive me. I am not sure where to add this code into my code. Ty so much for your response. I did try it but I did not get the desired output. I am sure I did something wrong. – Able Archer Oct 30 '19 at 04:23
1

Looks like you want to create a DataFrame from a list of tuples, which has been answered here. I would change your code like this:

# Initial empty list
data = []
#loop through tr to find_all td
for tr in t_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    team = row[1]
    sag = row[9]
    # Add tuple containing one row of data
    data.append((team, sag))
# Create df from list of tuples
df = pd.DataFrame(data, columns=['Team', 'SA/G'])
# Remove lines where Team value is "TEAM"
df = df[df["Team"] != "TEAM"]

EDIT: Add line to remove ("TEAM", "SA/G") rows in df

d_d
  • 46
  • 7
  • TY @d_d!! The output looks great!! I did notice that the output contains "TEAM" and "SAG" multiple times. Do you know by chance how to remove this from the output? I want to thank you very much for your help. I will learn a lot from this. =) I would like to keep the header, just not the duplicated "TEAM" and "SAG" – Able Archer Oct 30 '19 at 04:20
  • 1
    You can use df = df[df["Team"] != "TEAM"] – d_d Oct 30 '19 at 04:28
  • I tried to add this line of code @d_d but the extra data is still in the output. Any suggestions? – Able Archer Oct 30 '19 at 07:07
  • 1
    @AbleArcher I edited the answer above to include the extra line - when I tested I was getting the right results. Are you adding the "df =" at the beginning? – d_d Oct 30 '19 at 07:15
  • Your code works perfectly just like you said @d_d! I made the mistake of not capitalizing "TEAM". You are amazing, thank you sir! – Able Archer Oct 30 '19 at 07:19