I need to parse HTML tables to do things like get all the cells in a column above/below a certain cell, or all the cells in a row to its left/right. Is there a Python library that can do this easily?
Viewed 562 times
4 Answers
1
You can use lxml ("XML and HTML with Python") to parse a table. A simple use is to load the table and iterate through its rows.
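As a rough sketch of how that could look for the question's use case: the snippet below loads a table with lxml, turns it into a list of rows, and slices out the neighbours of one cell. The sample HTML and the helper names `table_to_grid` and `neighbours` are made up for illustration, and it assumes the table has no `rowspan`/`colspan` cells.

```python
from lxml import html

# Made-up sample document, used only for illustration.
HTML = """
<html><body>
<table>
  <tr><td>a1</td><td>b1</td><td>c1</td></tr>
  <tr><td>a2</td><td>b2</td><td>c2</td></tr>
  <tr><td>a3</td><td>b3</td><td>c3</td></tr>
</table>
</body></html>
"""


def table_to_grid(table):
    """Return the table as a list of rows, each a list of cell texts."""
    return [
        [cell.text_content().strip() for cell in row.xpath("./th|./td")]
        for row in table.xpath(".//tr")
    ]


def neighbours(grid, r, c):
    """Cells left/right/above/below grid[r][c] (assumes no merged cells)."""
    return {
        "left": grid[r][:c],
        "right": grid[r][c + 1:],
        "above": [row[c] for row in grid[:r]],
        "below": [row[c] for row in grid[r + 1:]],
    }


doc = html.fromstring(HTML)
table = doc.xpath("//table")[0]   # first table in the document
grid = table_to_grid(table)
result = neighbours(grid, 1, 1)   # neighbours of the middle cell, "b2"
print(result)
```

Converting to a plain grid first keeps the neighbour lookups simple list slicing; merged cells would need extra handling.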

Sergei Danielian
0
Take a look at pyquery. It lets you run jQuery-style queries on XML documents. From a quick look at the API, prevAll and nextAll should find the cells to the left/right. I think getting the cells above/below will not be that difficult either.

Can't Tell
0
This code converts every table on the page into a pandas DataFrame and returns them as a list.
import pandas as pd
url = r'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url) # Returns list of all tables on page
sp500_table = tables[0] # Select table of interest
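Once a table is a DataFrame, the cells around a given cell can be picked out with positional indexing. The following is only a sketch: the inline HTML and its values are invented so the example runs without network access, and `read_html` still needs a parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Made-up table contents, used only for illustration.
HTML = """
<table>
  <thead><tr><th>Symbol</th><th>Name</th><th>Sector</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>Alpha Corp</td><td>Tech</td></tr>
    <tr><td>BBB</td><td>Beta Inc</td><td>Energy</td></tr>
    <tr><td>CCC</td><td>Gamma Ltd</td><td>Retail</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(StringIO(HTML))[0]

# Position of the cell of interest: second data row, "Name" column.
row = 1
col = df.columns.get_loc("Name")

above = df.iloc[:row, col].tolist()       # same column, earlier rows
below = df.iloc[row + 1:, col].tolist()   # same column, later rows
left = df.iloc[row, :col].tolist()        # same row, earlier columns
right = df.iloc[row, col + 1:].tolist()   # same row, later columns

print(above, below, left, right)
```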

Alexandr Ovdienko