I have the following code:
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(open("2002/102002.html", 'rb'), "lxml")
soup = soup.select('body > table > tr:nth-child(2) > td:nth-child(2) > table:nth-child(2) > thead')[0]
data = []
for i, tr in enumerate(reversed(soup.findAll("tr"))):
    if i == len(soup.findAll("tr")) - 1:  # skip the header row
        continue
    date = str(tr.select('td:nth-child(1) > a')[0].string)
    time = str(tr.select('td:nth-child(2) > a')[0].string)
    # ... 20 rows
    data.append({'date': date, ...})
df = pd.DataFrame(data, columns=col_names)
The problem is that parsing a single HTML table with 240 rows takes more than 30 minutes and still has not finished.
How can I speed up the parsing?
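One likely cause of the slowdown is that `soup.findAll("tr")` is re-run on every loop iteration just to get the row count, and each row then goes through the CSS selector engine several times. A sketch of a faster structure: call `find_all` once, slice off the header row, and read the `<td>` cells directly. The inline `html` string here is a hypothetical stand-in for the real `102002.html` table, and the two columns shown are just examples of the ~20 fields:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical minimal table standing in for the real 102002.html markup.
html = """
<table>
  <tr><th>Date</th><th>Time</th></tr>
  <tr><td><a>2002-10-01</a></td><td><a>09:00</a></td></tr>
  <tr><td><a>2002-10-02</a></td><td><a>10:30</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")       # call find_all once and reuse the result
data = []
for tr in reversed(rows[1:]):    # slice off the header instead of testing i each pass
    cells = tr.find_all("td")    # one traversal per row, no CSS selector engine
    data.append({"date": cells[0].get_text(strip=True),
                 "time": cells[1].get_text(strip=True)})

df = pd.DataFrame(data, columns=["date", "time"])
```

If the table is plain enough, `pd.read_html` may also be worth trying, since it parses the whole table in one call instead of cell by cell.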