I'm getting an AssertionError saying "20 columns passed, passed data had 50 columns". I have a rough idea of the cause but I'm not sure how to fix it: there really are 20 column headers, and the 50 comes from the number of rows on each page. I suspect the loop is involved as well. I imagine the fix is simple, so any help would be appreciated.
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time
playerData = []
for i in range(6):
    initialURL = 'https://www.fangraphs.com/leaders.aspx?pos=all&stats=sta&lg=all&qual=0&type=8&season=2017&month=0&season1=2017&ind=0&team=0&rost=0&age=0&filter=&players=0&sort=7,d&page=' + str(i) + '_50'
    r = requests.get(initialURL)
    soup = BeautifulSoup(r.text, 'html.parser')
    statistics = soup.find("table", {"class" : "rgMasterTable"})
    statistics.findAll('th')
    column_headers = [th.getText() for th in soup.findAll('th')]
    data = statistics.findAll('tr')[3:]
    pitcherStatistics = [[td.text.strip() for td in data[a].findAll('td')]
                         for a in range(len(data))]
    playerData.append(pitcherStatistics)
print(playerData)
df = pd.DataFrame(playerData, columns=column_headers)
df.to_csv("Starting Pitchers.csv", index=False)
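
A minimal sketch of what the loop seems to be doing to the shape of playerData, assuming each page yields 50 rows of 20 cells. The fake_page helper and its placeholder values are hypothetical stand-ins for the scraped table, not real FanGraphs data. It compares list.append (one element per page) with list.extend (one element per row), which looks like the likely source of the 20 versus 50 mismatch:

```python
# Hypothetical dimensions: 6 pages, 50 rows per page, 20 cells per row.
N_PAGES, N_ROWS, N_COLS = 6, 50, 20


def fake_page(page):
    """Stand-in for one scraped page: a list of rows, each a list of cell strings."""
    return [[f"p{page}r{r}c{c}" for c in range(N_COLS)] for r in range(N_ROWS)]


appended = []  # built the way the loop above builds playerData
extended = []  # built with extend instead

for page in range(N_PAGES):
    rows = fake_page(page)
    appended.append(rows)  # adds the whole page as ONE element
    extended.extend(rows)  # adds each row of the page as its own element

# With append, the outer list has one entry per page and each entry holds 50 rows,
# so a table built from it would be read as 6 rows of 50 "columns".
print(len(appended), len(appended[0]))

# With extend, the outer list has one entry per player row and each entry holds 20 cells.
print(len(extended), len(extended[0]))
```

If that assumption holds, passing a list shaped like extended to pd.DataFrame with the 20 column headers should line up, since every element is then a single row of 20 values.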