How can I get the parent node of a node containing a given piece of text?
Also, can I use a regexp as the match criterion when searching/filtering, for example re.compile("th[ei]s? .ne")?
Say the text is "this one" in this document:
html = '''<html>
<head><title></title></head>
<body>
<table>
<tr><td>1a</td><td>2a</td><td>3a</td><td>4a</td><td>5a</td><td>6a</td></tr>
<tr><td>1b</td><td>2b</td><td>3b</td><td>4b</td><td>5b</td><td>6b</td></tr>
<tr><td>1c</td><td>2c</td><td>3c</td><td>4c</td><td>5c</td><td>6c this one</td></tr>
</table>
<div><div>
<table>
<tr><td>1A</td><td>2A</td><td>3A</td><td>4A</td><td>5A</td><td>6A</td></tr>
<tr><td>1B</td><td>2B</td><td>3B</td><td>4B</td><td>5B</td><td>6B</td></tr>
<tr><td>1C</td><td>2C</td><td>3C</td><td>4C</td><td>5C</td><td>6C</td></tr>
</table>this one
</div></div>
</body>
</html>'''
I would like an iterator that returns:
<td>6c this one</td>
and then:
<div>
<table>
<tr><td>1A</td><td>2A</td><td>3A</td><td>4A</td><td>5A</td><td>6A</td></tr>
<tr><td>1B</td><td>2B</td><td>3B</td><td>4B</td><td>5B</td><td>6B</td></tr>
<tr><td>1C</td><td>2C</td><td>3C</td><td>4C</td><td>5C</td><td>6C</td></tr>
</table>this one
</div>
I tried:
import lxml.html
root = lxml.html.document_fromstring(html)
root.xpath("//text()[contains(., 'one')]")
and
import xml.etree.ElementTree as ET
for e in ET.fromstring(html).getiterator():
    if e.text and e.text.find('one') != -1:
        print "Found string %r, element = %r" % (e.text, e)
but the best I can get is the node containing "this one" itself, while I am looking for the parent containing this text. Note that div and table are only examples: I really need to go back up to the parent after finding "this one", rather than filter on elements containing "this one",
because I will not know whether it is a div, a table or anything else before finding what it contains.
(Note also that this is HTML, not necessarily well-formed XML; I suppose the second "this one"
should have been wrapped in a tag of its own.)
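One way to sketch the "find the text, then go up to its parent" idea with only the standard library, assuming the input parses as XML (the sample does; real-world HTML often will not, and lxml would be the safer parser there). The helper name parents_of_matching_text and the shortened html are made up for illustration. ElementTree elements have no parent pointer, so the sketch builds a child-to-parent map first, and it checks both .text and .tail because the second "this one" sits in the tail of the inner table. The pattern is the re.compile("th[ei]s? .ne") mentioned above. Written for Python 3:

```python
import re
import xml.etree.ElementTree as ET

# Shortened stand-in for the document above (assumption: input is well-formed XML).
html = """<html><body>
<table><tr><td>5c</td><td>6c this one</td></tr></table>
<div><div>
<table><tr><td>6C</td></tr></table>this one
</div></div>
</body></html>"""


def parents_of_matching_text(root, pattern):
    """Yield the parent element of every text node that matches pattern."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for elem in root.iter():
        # elem.text follows elem's start tag, so its parent is elem itself.
        if elem.text and pattern.search(elem.text):
            yield elem
        # elem.tail follows elem's end tag, so its parent is elem's parent.
        if elem.tail and pattern.search(elem.tail) and elem in parent_of:
            yield parent_of[elem]


root = ET.fromstring(html)
found = list(parents_of_matching_text(root, re.compile(r"th[ei]s? .ne")))
tags = [e.tag for e in found]
print(tags)  # ['td', 'div']
```

An element is yielded once per matching text node, so a parent with two matching pieces of text would show up twice.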
EDIT:
>>> root.xpath("//*[contains(child::*/text(), 'one')]") # why empty parent?
[]
>>> root.xpath("//*[contains(text(), 'one')]") # i expected to have a list with two elements td and div
[<Element td at 0x280b600>]
>>> root.xpath("//*[child::*[contains(text(), 'one')]]") # if parent: expected tr and div, if not parent expected table or div, still missing one
[<Element tr at 0x2821f30>]
BTW, selecting the matching text nodes and then stepping up with parent::* works:
import xml.etree.ElementTree as ET
import lxml.html
#[... here add html = """...]
root = lxml.html.document_fromstring(html)
for i, x in enumerate(root.xpath("//text()[contains(., 'one')]/parent::*")):
    print "%s => \n\t" % i, ET.tostring(x).replace("\n", "\n\t")
which produces:
0 =>
<td>6c this one</td>
1 =>
<div>
<table>
<tr><td>1A</td><td>2A</td><td>3A</td><td>4A</td><td>5A</td><td>6A</td></tr>
<tr><td>1B</td><td>2B</td><td>3B</td><td>4B</td><td>5B</td><td>6B</td></tr>
<tr><td>1C</td><td>2C</td><td>3C</td><td>4C</td><td>5C</td><td>6C</td></tr>
</table>this one
</div>
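On the three queries in the EDIT: in XPath 1.0, contains(text(), 'one') turns the node-set text() into a string by taking only its first node in document order. For the td that first text node is "6c this one", but for the inner div it is the newline before the table, and "this one" is the second text node. That would explain why only td (or its tr) comes back; testing every text node, as in //*[text()[contains(., 'one')]], avoids it. A small standard-library sketch of where the two pieces of text live, assuming a cut-down snippet of the inner div (the variable names are made up for illustration, Python 3):

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the inner <div> from the document above.
snippet = "<div>\n<table><tr><td>6C</td></tr></table>this one\n</div>"
div = ET.fromstring(snippet)
table = div[0]

first_text = div.text      # text before <table>: only a newline
second_text = table.tail   # text after </table>: holds "this one"
print(repr(first_text), repr(second_text))
```

In ElementTree terms the matching text is the tail of the table rather than the text of the div, which is the same reason a loop over e.text alone misses it.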