I am looking for some advice on how this could be done. I'm trying to find a solution using only XPath:
An HTML example:
<div>
  <div>
    <div>text div (leaf)</div>
    <p>text paragraph (leaf)</p>
  </div>
</div>
<p>text paragraph 2 (leaf)</p>
Code:
require 'nokogiri'
doc = Nokogiri::HTML.fragment("- the html above -")
result = doc.xpath("*[not(child::*)]")
# => [#<Nokogiri::XML::Element:0x3febf50f9328 name="p" children=[#<Nokogiri::XML::Text:0x3febf519b718 "text paragraph 2 (leaf)">]>]
But this XPath only gives me the last "p". What I want is a flatten-like behavior that returns only the leaf nodes.
Here are some related answers on Stack Overflow:
How to select all leaf nodes using XPath expression?
XPath - Get node with no child of specific type
Thanks
`text paragraph (leaf)`? And if you want just the text, do you want all the text nodes separately, or do you simply want all the text concatenated as a single string? – Borodin Jul 26 '13 at 22:55