I'm getting `AssertionError: 20 columns passed, passed data had 50 columns`. I think I know what is causing it, but I'm not sure how to fix it: there really are 20 column headers, and the 50 comes from the number of rows on each page. I suspect the loop is involved as well. I imagine the fix is simple, so any help would be appreciated.

from bs4 import BeautifulSoup
import requests
import pandas as pd
import time

playerData = []

for i in range(6):
    initialURL = 'https://www.fangraphs.com/leaders.aspx?pos=all&stats=sta&lg=all&qual=0&type=8&season=2017&month=0&season1=2017&ind=0&team=0&rost=0&age=0&filter=&players=0&sort=7,d&page=' + str(i) +'_50'
    r = requests.get(initialURL)
    soup = BeautifulSoup(r.text, 'html.parser')
    statistics = soup.find("table", {"class" : "rgMasterTable"})
    statistics.findAll('th')
    column_headers = [th.getText() for th in soup.findAll('th')]
    data = statistics.findAll('tr')[3:]
    pitcherStatistics = [[td.text.strip() for td in data[a].findAll('td')]
                          for a in range(len(data))]
    playerData.append(pitcherStatistics)


print(playerData)

df = pd.DataFrame(playerData, columns=column_headers)
df.to_csv("Starting Pitchers.csv", index=False)
sophros
Shawn Schreier
  • Does this answer your question? [AssertionError: 22 columns passed, passed data had 21 columns](https://stackoverflow.com/questions/40855030/assertionerror-22-columns-passed-passed-data-had-21-columns) – sophros Feb 06 '20 at 13:12

1 Answer

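It looks like `playerData` is 3D, but a DataFrame is only 2D. The problem seems to be that each pass through the loop appends the whole `pitcherStatistics` list (one page of `tr` rows) as a single element, so `playerData` ends up as a list of pages rather than a list of rows, and pandas reads each page's 50 rows as 50 columns. You need to keep `playerData` 2D, perhaps by adding each page's rows individually (for example with `playerData.extend(pitcherStatistics)`) and not as a sublist.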
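To make the shape difference concrete, here is a minimal sketch that needs no network access. The `pages` data and the `depth` helper are made up for illustration (they are not part of the original script); only the `append` versus `extend` contrast is the point.

```python
# Hypothetical stand-in for the scraped data: 2 pages, 2 rows per page, 3 cells per row.
pages = [
    [["Pitcher A", "TEAM1", "10"], ["Pitcher B", "TEAM2", "9"]],
    [["Pitcher C", "TEAM3", "8"], ["Pitcher D", "TEAM1", "7"]],
]


def depth(obj):
    """Count how many list levels are nested inside obj (illustrative helper)."""
    return 1 + depth(obj[0]) if isinstance(obj, list) and obj else 0


nested = []  # what the loop in the question builds
flat = []    # what a 2D DataFrame needs

for page_rows in pages:
    nested.append(page_rows)  # adds the whole page as ONE element
    flat.extend(page_rows)    # adds each row of the page separately

print(depth(nested), len(nested))  # pages -> rows -> cells
print(depth(flat), len(flat))      # rows -> cells
```

Here `flat` is a plain list of rows, which is the shape `pd.DataFrame(flat, columns=column_headers)` expects, while `nested` has one entry per page.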

John Zwinck
  • I'm not quite sure I'm following the issue or how to resolve it. I am not trying to make playerData 3D and I am not trying to use 3 tr elements in each row. – Shawn Schreier Mar 24 '18 at 03:35
  • Do you see how `playerData` is a list of list of lists? That's 3D. It should be a list of lists, i.e. 2D. – John Zwinck Mar 24 '18 at 03:41
  • I'm still pretty new, so could you help with what I need to do to fix it? I essentially just want a DataFrame with the column headers and the player data for the first 5 pages of the website. – Shawn Schreier Mar 24 '18 at 03:50
  • I fixed part of the issue but created a new one. I scrapped the `playerData` list and passed the `pitcherStatistics` data to the DataFrame instead. However, it now outputs only the 5th page of players rather than all 5. – Shawn Schreier Mar 24 '18 at 03:53
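
On the last comment: passing `pitcherStatistics` straight to the DataFrame keeps only the final page, because that variable is overwritten on every pass through the loop. One possible way around it, sketched below, is to build one DataFrame per page and join them with `pd.concat`. The `fetch_page_rows` function and the three headers are hypothetical placeholders for the requests/BeautifulSoup step, and the 1-to-5 page numbering is an assumption.

```python
import pandas as pd

column_headers = ["Name", "Team", "W"]  # hypothetical; the real table has 20 headers


def fetch_page_rows(page):
    """Hypothetical stand-in for the scraping step: returns a list of rows for one page."""
    return [[f"Pitcher {page}-{n}", "TEAM", str(n)] for n in range(2)]


frames = []
for page in range(1, 6):  # assumed page numbering 1..5
    rows = fetch_page_rows(page)
    # Keep every page's DataFrame instead of overwriting a single variable
    frames.append(pd.DataFrame(rows, columns=column_headers))

# Stack all pages into one 2D table with a fresh 0..n-1 index
df = pd.concat(frames, ignore_index=True)
print(df.shape)
```

Extending a single list of rows and building the DataFrame once after the loop would work just as well; the per-page version simply makes it easier to inspect one page at a time.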