I have the following code:
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(open("2002/102002.html", 'rb'), "lxml")
soup = soup.select('body > table > tr:nth-child(2) > td:nth-child(2) > table:nth-child(2) > thead')[0]
data = []
for i, tr in enumerate(reversed(soup.findAll("tr"))):
    if i == len(soup.findAll("tr")) - 1:  # skip the header row
        continue
    date = str(tr.select('td:nth-child(1) > a')[0].string)
    time = str(tr.select('td:nth-child(2) > a')[0].string)
    # ... 20 rows
    data.append({'date': date, ...})
df = pd.DataFrame(data, columns=col_names)
The problem is that parsing a single HTML table with 240 rows takes more than 30 minutes and still has not finished.
How can I speed up the parsing?
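One likely cause of the slowdown is that `soup.findAll("tr")` is re-run on every loop iteration just to get the row count, and each row then goes through the CSS selector engine several times. A sketch of a faster structure: call `find_all` once, slice off the header row, and read the `<td>` cells directly. The inline `html` string here is a hypothetical stand-in for the real `102002.html` table, and the two columns shown are just examples of the ~20 fields:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical minimal table standing in for the real 102002.html markup.
html = """
<table>
  <tr><th>Date</th><th>Time</th></tr>
  <tr><td><a>2002-10-01</a></td><td><a>09:00</a></td></tr>
  <tr><td><a>2002-10-02</a></td><td><a>10:30</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")       # call find_all once and reuse the result
data = []
for tr in reversed(rows[1:]):    # slice off the header instead of testing i each pass
    cells = tr.find_all("td")    # one traversal per row, no CSS selector engine
    data.append({"date": cells[0].get_text(strip=True),
                 "time": cells[1].get_text(strip=True)})

df = pd.DataFrame(data, columns=["date", "time"])
```

If the table is plain enough, `pd.read_html` may also be worth trying, since it parses the whole table in one call instead of cell by cell.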