I'm trying to parse a local HTML file with lxml, but I'm getting an error and I don't know why (sorry in advance for the bad code, I'm new to this).

from lxml import etree, html
from StringIO import StringIO

parser = etree.HTMLParser()
doc = etree.parse(StringIO("test1.html"), parser)
tree = html.fromstring(doc)
CCE = tree.xpath('//div[@data-reactid]/div[@class="browse-summary"]/h1')
URL = tree.xpath('//a[@class="rc-OfferingCard"]/@href')

print 'CCE:', CCE
print 'URL:', URL

And here's the error:

  File "test.py", line 8, in <module>
    tree = html.fromstring(doc)
  File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 703, in fromstring
    is_full_html = _looks_like_full_html_unicode(html)
TypeError: expected string or buffer
Lara M.

1 Answer

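I think you need

tree = etree.parse("test1.html", parser)

without StringIO and fromstring. StringIO("test1.html") wraps the literal text "test1.html" as if it were the file's contents rather than opening the file, and html.fromstring expects a string, not the already-parsed tree that etree.parse returns, which is why it raises the TypeError.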
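
For reference, here is a minimal sketch of parsing straight from a file path. It is written for Python 3 rather than the Python 2 used in the question, and it makes a few assumptions: the sample markup is made up to match the XPath expressions in the question (the real test1.html is not shown), the extract helper is a hypothetical name, and /text() is appended to the first XPath so that it returns the heading text instead of element objects.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample markup standing in for test1.html; the attributes and
# class names mirror the XPath expressions used in the question.
SAMPLE_HTML = """<html><body>
<div data-reactid="1"><div class="browse-summary"><h1>Courses</h1></div></div>
<a class="rc-OfferingCard" href="/learn/example">Example</a>
</body></html>"""


def extract(path):
    """Parse the HTML file at `path` and return (heading texts, hrefs)."""
    parser = etree.HTMLParser()
    # etree.parse accepts a filename directly, so no StringIO wrapper is needed.
    tree = etree.parse(path, parser)
    headings = tree.xpath(
        '//div[@data-reactid]/div[@class="browse-summary"]/h1/text()'
    )
    urls = tree.xpath('//a[@class="rc-OfferingCard"]/@href')
    return headings, urls


# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
    f.write(SAMPLE_HTML)
    sample_path = f.name

try:
    headings, urls = extract(sample_path)
finally:
    os.remove(sample_path)

print("CCE:", headings)
print("URL:", urls)
```

If you already have the HTML as a string rather than a file, html.fromstring(the_string) is the call to use instead; it should not be given the result of etree.parse.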

furas
    YES! It works perfectly! Thank you, I cannot give a feedback for your answer due to my low reputation, but it works fine!! – Lara M. Jan 25 '16 at 12:48