Given markup like:
<p>
<code>foo</code><code>bar</code>
<code>jim</code> and then <code>jam</code>
</p>
I need to select the first three <code> elements, but not the last. The logic is: "Select all code elements that have a preceding-or-following-sibling-element that is also a code, unless there exist one or more text nodes with non-whitespace content between them."
Given that I am using Nokogiri (which uses libxml2) I can only use XPath 1.0 expressions.
Although a tricky XPath expression is desired, Ruby code/iterations to perform the same on a Nokogiri document are also acceptable.
Note that the CSS adjacent sibling selector ignores non-element nodes, and so selecting nokodoc.css('code + code') will incorrectly select the last <code> block.
Nokogiri.XML('<r><a/><b/> and <c/></r>').css('* + *').map(&:name)
#=> ["b", "c"]
Edit: More test cases, for clarity:
<section><ul>
<li>Go to <code>N</code> and
then <code>Y</code><code>Y</code><code>Y</code>.
</li>
<li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>
All the Y above should be selected. None of the N should be selected. The contents of the <code> elements are used only to indicate which should be selected: you may not use the content to determine whether or not to select an element.
The context elements in which the <code> elements appear are irrelevant. They may appear in <li>, they may appear in <p>, they may appear in something else.
I want to select all the consecutive runs of <code> at once. It is not a mistake that there is a space character in the middle of one of the sets of Y.