I'm trying to figure out the Python lxml API, but I'm running into a peculiar problem. I have the following library versions installed:
- libxml2 : 2.7.8
- libxslt : 1.1.26
When I run the following code:
from StringIO import StringIO
from lxml import etree

html = open('file.html', 'r')
context = etree.iterparse(StringIO(html), events=("start", "end"), html=True)
for event, element in context:
    pass  # do stuff
EDIT:
It turns out that it is a parsing error. I moved the HTML to a file (shown below):
<html>
<head></head>
<body>
<table>
<tr>
<td>image</td>
<a href="relative.phtml?with=querystring&blah=blah">blah\n(blah)</a></td>
<td> 35 </td>
<td> 28 </td>
<td><b>-7</b></td>
<td>
23,000 </td>
<td> 373,000 </td>
<td> 644,000 </td>
<td>+72.65%</td>
</tr>
<tr>
<td>image</td>
<td><a href="relative.phtml?with=querystring&blah=blah">blah\n(blah)</a></td>
<td> 35 </td>
<td> 28 </td>
<td><b>-7</b></td>
<td>
23,000 </td>
<td> 373,000 </td>
<td> 644,000 </td>
<td>+72.65%</td>
</tr>
</table>
</body>
</html>
I'm now getting this error:
    for event, element in context:
  File "iterparse.pxi", line 515, in lxml.etree.iterparse.next (src/lxml/lxml.etree.c:86484)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
lxml.etree.XMLSyntaxError: error parsing attribute name, line 1, column 12
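One thing that may be relevant to the "line 1, column 12" message: open() returns a file object rather than the file's text, so StringIO(html) is not wrapping the markup itself. Below is a minimal sketch, assuming the intent was to wrap the file's contents. It uses Python 3's io.StringIO, a throwaway temporary file standing in for file.html, and a hypothetical helper name load_markup; none of these come from the lxml tutorial.

```python
import os
import tempfile
from io import StringIO


def load_markup(path):
    """Read the whole file and return its text (hypothetical helper)."""
    with open(path, "r") as handle:
        return handle.read()


# Demo with a throwaway file standing in for file.html.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "file.html")
    with open(sample_path, "w") as handle:
        handle.write("<html><body><p>hi</p></body></html>")

    markup = load_markup(sample_path)
    source = StringIO(markup)  # wraps the text, not the file object
    first_chunk = source.read(6)
```

If this is the cause, the parser would then see the actual markup starting at line 1 instead of whatever the file object turns into.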
ORIGINAL ERROR:
    for event, element in context:
  File "iterparse.pxi", line 515, in lxml.etree.iterparse.next (src/lxml/lxml.etree.c:86484)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
lxml.etree.XMLSyntaxError: htmlParseEntityRef: expecting ';', line 7, column 71
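Line 7 of the file is the row containing the href, and its query string has an "&" that is not followed by an entity name ending in ";", which looks like what "htmlParseEntityRef: expecting ';'" is complaining about. Below is a minimal sketch, assuming that bare ampersand is the trigger, of escaping such ampersands with the standard re module before the markup reaches the parser. The helper name escape_bare_ampersands is hypothetical and is not part of lxml.

```python
import re

# Matches "&" that does not start a named (&amp;), decimal (&#38;),
# or hexadecimal (&#x26;) character reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(markup):
    """Replace bare ampersands with &amp; and leave existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", markup)


link = '<a href="relative.phtml?with=querystring&blah=blah">blah</a>'
fixed = escape_bare_ampersands(link)
```

A regex pass over raw markup is a blunt tool, so this is only meant to check whether the ampersand is really what the parser trips over.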
I thought I followed the tutorial on lxml's site pretty closely, so I'm very confused. Could it be an installation problem?