
I have the following code:

from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(open("2002/102002.html", 'rb'), "lxml")
soup = soup.select('body > table > tr:nth-child(2) > td:nth-child(2) > table:nth-child(2) > thead')[0]

for i, tr in enumerate(reversed(soup.findAll("tr"))):
    if i == len(soup.findAll("tr")) - 1:
        continue
    date = str(tr.select('td:nth-child(1) > a')[0].string)
    time = str(tr.select('td:nth-child(2) > a')[0].string)
    # ... 20 similar rows ...
    data.append({'date': date,  ...})

df = pd.DataFrame(data, columns = col_names)

The problem is that parsing a single HTML table with 240 rows has been running for 30 minutes and still has not finished.

How can I speed up the parsing?
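A likely cause of the slowdown (an assumption, not confirmed in the question) is that soup.findAll("tr") is re-evaluated inside the loop on every iteration and each column is fetched with its own CSS select. A minimal sketch of the same loop restructured under that assumption, reusing the file path and selector from the question:

from bs4 import BeautifulSoup
import pandas as pd

with open("2002/102002.html", "rb") as f:
    soup = BeautifulSoup(f, "lxml")

thead = soup.select_one(
    "body > table > tr:nth-child(2) > td:nth-child(2) > table:nth-child(2) > thead"
)

# Collect the rows once, instead of re-running findAll("tr") on every iteration.
rows = thead.find_all("tr")

data = []
for tr in reversed(rows[1:]):        # rows[1:] skips the first row, as the original loop did
    cells = tr.find_all("td")        # read the cells positionally, one find_all per row
    date = str(cells[0].a.string)
    time = str(cells[1].a.string)
    # ... remaining columns read the same way ...
    data.append({'date': date, 'time': time})

df = pd.DataFrame(data)

If the table is regular, pandas.read_html("2002/102002.html") might also load it in a single call, though the resulting columns would then need to be renamed and cleaned afterwards.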

Dmitry Sokolov

0 Answers