4

I need to parse HTML tables to do things like get all the cells in a column above or below a certain cell, or all the cells in a row to its left or right. Is there a Python library that can do this easily?

myahya

4 Answers

2

BeautifulSoup
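
As a rough sketch of how that could look (not a definitive recipe), the table can be flattened into a list of rows and then sliced around a cell. The sample table and the helper names `table_to_grid` and `neighbours` are made up for illustration, and the code assumes the table has no `rowspan`/`colspan` cells.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table, used only for illustration.
HTML = """
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>1.20</td></tr>
  <tr><td>pear</td><td>5</td><td>0.80</td></tr>
</table>
"""

def table_to_grid(html):
    """Return the first table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]

def neighbours(grid, r, c):
    """Cells above, below, left and right of grid[r][c].

    Assumes a rectangular grid (no rowspan/colspan).
    """
    return {
        "above": [row[c] for row in grid[:r]],
        "below": [row[c] for row in grid[r + 1:]],
        "left": grid[r][:c],
        "right": grid[r][c + 1:],
    }

grid = table_to_grid(HTML)
result = neighbours(grid, 1, 1)  # the cell containing "3"
```

Here `result["above"]` holds the header cell of that column and `result["right"]` holds the rest of the row. Tables with merged cells would need extra handling before the indices line up.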

KurzedMetal
1

You can use lxml - XML and HTML with Python - to parse a table. Here is a simple example of what you can do with a table (load & iterate through rows).
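
A minimal sketch of loading a table and iterating through its rows with `lxml.html` might look like this. The sample markup is invented for illustration, and the XPath expressions assume a simple table without nested tables or merged cells.

```python
import lxml.html

# Hypothetical sample table, used only for illustration.
HTML = """
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>1.20</td></tr>
  <tr><td>pear</td><td>5</td><td>0.80</td></tr>
</table>
"""

root = lxml.html.fromstring(HTML)

# Iterate through the rows and collect the text of each header/data cell.
rows = [
    [cell.text_content().strip() for cell in tr.xpath("./th|./td")]
    for tr in root.xpath(".//tr")
]

# XPath can also select a whole column directly: the second <td> of every row.
qty_cells = root.xpath(".//tr/td[2]/text()")
```

Once the rows are in a list of lists, cells above or below a given cell are just the same index in earlier or later rows.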

Community
Sergei Danielian
0

Take a look at pyquery. It lets you run jQuery-style queries on XML documents. From a quick look at the API, it seems that prevAll and nextAll can find the cells to the left and right. I think it will not be that difficult to get the ones above and below as well.

Can't Tell
0

This code converts every table on the page into a pandas DataFrame and returns them as a list.

import pandas as pd
url = r'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url) # Returns list of all tables on page
sp500_table = tables[0] # Select table of interest
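
Once a table is a DataFrame, the cells around a given position can be picked out with `iloc`. The following is only a sketch on an invented inline table (passed through `StringIO` so it does not need network access); it assumes an HTML parser backend such as lxml is installed for `read_html`.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table, used only for illustration.
HTML = """
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>1.20</td></tr>
  <tr><td>pear</td><td>5</td><td>0.80</td></tr>
</table>
"""

df = pd.read_html(StringIO(HTML))[0]  # the all-<th> row becomes the header

r, c = 1, 1  # position of the "pear" row, "Qty" column

above = df.iloc[:r, c].tolist()       # cells above in the same column
below = df.iloc[r + 1:, c].tolist()   # cells below in the same column
left = df.iloc[r, :c].tolist()        # cells to the left in the same row
right = df.iloc[r, c + 1:].tolist()   # cells to the right in the same row
```

Note that `read_html` converts numeric-looking cells to numbers, so the values come back as ints and floats rather than the original strings.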