I am using lxml to parse a web document. I want to get all the text in a <p> element, so I use the following code:
from lxml import etree
page = etree.HTML("<html><p>test1 <br /> test2</p></html>")
# .text returns only the text before the first child element
print(page.xpath("//p")[0].text)  # prints just "test1 ", not "test1 <br/> test2"
The problem is that I want to get all text in <p>, which is test1 <br /> test2 in the example, but lxml gives me only test1.
How can I get all the text in the <p> element?
Check my answer for some possible ways – har07 Apr 10 '15 at 07:55
*"... get all text in `<p>` which is `test1 <br /> test2`"*. This is not correct. The actual text content is `test1 test2`. The `<br>` element is a child of `<p>`, but it is not text.
– mzjn Apr 10 '15 at 15:10
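
To illustrate the point in the comment above, here is a minimal sketch of how lxml seems to split the content of `<p>` in this example: the text before `<br>` is the `.text` of `<p>`, and the text after it is the `.tail` of the `<br>` child. The variable names (`p`, `br`, `all_text`, `xpath_text`, `inner`) are only illustrative, and which result you want depends on whether "all text" means the text alone or the markup as well.

```python
from lxml import etree

page = etree.HTML("<html><p>test1 <br /> test2</p></html>")
p = page.xpath("//p")[0]

# .text holds only the text that comes before the first child element.
print(repr(p.text))

# The text after <br> is stored as the .tail of the <br> element.
br = p[0]
print(repr(br.tail))

# itertext() walks every text node under <p>, so joining its pieces
# gives the text content without any tags.
all_text = "".join(p.itertext())
print(repr(all_text))

# The XPath string() function returns the same string value.
xpath_text = p.xpath("string()")
print(repr(xpath_text))

# If the markup is wanted too, serialize the children and prepend .text.
# tostring() includes each child's tail by default.
inner = (p.text or "") + "".join(
    etree.tostring(child, encoding="unicode") for child in p
)
print(inner)
```

Note that the whitespace on both sides of `<br />` is preserved, so the joined text has two spaces between `test1` and `test2`; something like `" ".join(all_text.split())` would normalize it if that matters.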