If the HTML is not well-formed XML, you can't parse it with etree. Even so, you don't need an external library to parse an HTML table. In Python 3 you can do it with HTMLParser from html.parser. I've put the code for a simple class derived from HTMLParser in a GitHub repo.
You can use that class (named HTMLTableParser here) as follows:
import urllib.request
from html_table_parser import HTMLTableParser
target = 'http://www.twitter.com'
# fetch the page content
req = urllib.request.Request(url=target)
with urllib.request.urlopen(req) as f:
    # assumes the page is UTF-8 encoded
    xhtml = f.read().decode('utf-8')
# instantiate the parser and feed it
p = HTMLTableParser()
p.feed(xhtml)
print(p.tables)
The output is a list of 2D lists, one per table. It might look like this:
[[[' ', ' Anmelden ']],
[['Land', 'Code', 'Für Kunden von'],
['Vereinigte Staaten', '40404', '(beliebig)'],
['Kanada', '21212', '(beliebig)'],
...
['3424486444', 'Vodafone'],
[' Zeige SMS-Kurzwahlen für andere Länder ']]]
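To give an idea of how a derived parser like this can work, here is a minimal sketch that uses only the standard library. It is not the code from the repo: the class name SimpleTableParser and its private attributes are hypothetical, and it assumes that cells and rows are explicitly closed and that tables are not nested.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings.

    Hypothetical illustration: assumes explicit closing tags and no nested tables.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []     # finished tables
        self._table = None   # rows of the table currently being parsed
        self._row = None     # cells of the row currently being parsed
        self._cell = None    # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside an open cell is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None and self._table is not None:
                self._table.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table":
            if self._table is not None:
                self.tables.append(self._table)
            self._table = None


sample = """
<table>
  <tr><th>Country</th><th>Code</th></tr>
  <tr><td>Canada</td><td>21212</td></tr>
</table>
"""

parser = SimpleTableParser()
parser.feed(sample)
parser.close()
print(parser.tables)
# [[['Country', 'Code'], ['Canada', '21212']]]
```

Unlike the output shown above, this sketch strips surrounding whitespace from each cell; drop the .strip() call if you want the raw text.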