1

I'm using Python and lxml to get the content of div.article from a number of links. I want the actual HTML markup of the div, but so far I've only been able to get the div's text_content(), which strips out the markup.

from lxml import html

doc = html.fromstring(doc_text)

article = doc.cssselect("div.article")

if len(article) > 0:
    text = article[0].text_content()

    data = {
        'product':product,
        'content': text,
    }

Can anyone help me to get the markup of article[0]?

Thanks

iamjonesy

1 Answer

4

You can iterate over the node's child elements, serialize each one with html.tostring(), and concatenate the results.

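from lxml import html

def innerHTML(node):
    # Start with any text that precedes the first child element.
    buildString = node.text or ''
    for child in node:
        # encoding='unicode' returns str, not bytes; tail text is included.
        buildString += html.tostring(child, encoding='unicode')
    return buildString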
Spen-ZAR