
I am trying to get some nodes from the XML below.

<SalesStart Value="1412899200">10.10.2014</SalesStart>
<SalesEnd Value="4102358400">31.12.2099</SalesEnd>
<Price Value="4.9900">4,99</Price>
<SalesStartEst Value="1411516800">24.09.2014</SalesStartEst>
<SalesEndEst Value="1697500800">17.10.2023</SalesEndEst>

I can access nodes like doc.text_at('SalesStart'). Is it possible to access nodes with a regular expression, something like

doc.text_at('Sales'[/Start/]) or doc.css('Sales'[/Start/])

so that I can get two nodes (SalesStart and SalesStartEst) in a single query?

Phrogz
rubyist
    Possible duplicate of http://stackoverflow.com/questions/1556028/how-do-i-do-a-regex-search-in-nokogiri – karlingen Jan 23 '15 at 11:25
    The proposed duplicate is a title match, but not a good duplicate. It shows how to find attributes whose value matches a pattern; this question is about elements whose name matches a pattern. – Phrogz Jan 23 '15 at 20:50

1 Answer


You cannot use a generic regular expression in a Nokogiri XPath query, because Nokogiri relies on libxml2, which supports only XPath 1.0. In your case, though, you just want elements whose name starts with SalesStart, and XPath 1.0 can do that with the starts-with() function:

# Find all elements, ensuring the correct prefix on the name
doc.xpath("//*[starts-with(name(),'SalesStart')]")

Demo:

require 'nokogiri'
doc = Nokogiri.XML '
  <r>
    <SalesStart Value="1412899200">10.10.2014</SalesStart>
    <SalesEnd Value="4102358400">31.12.2099</SalesEnd>
    <Price Value="4.9900">4,99</Price>
    <SalesStartEst Value="1411516800">24.09.2014</SalesStartEst>
    <SalesEndEst Value="1697500800">17.10.2023</SalesEndEst>
  </r>
'

starts = doc.xpath("//*[starts-with(name(),'SalesStart')]").map(&:text)
p starts #=> ["10.10.2014", "24.09.2014"]

However, if you did need a regular expression, then you can over-find the elements using Nokogiri and then use Ruby to pare down the set. For example:

# memory-heavy approach; pulls all elements and then pares them down
starts = doc.xpath('//*').select{ |e| e.name =~ /^SalesStart/ }

# lightweight approach, accessing one node at a time
starts = []
doc.traverse do |node|
  starts<<node if node.element? && node.name =~ /^SalesStart/
end
p starts.map(&:text) #=> ["10.10.2014", "24.09.2014"]

You can even wrap this up as a convenience method:

# monkeypatching time!
class Nokogiri::XML::Node
  def elements_with_name_matching( regex )
    [].tap{ |result| traverse{ |n| result<<n if n.element? && n.name=~regex } }
  end
end

p doc.elements_with_name_matching( /^SalesStart/ ).map(&:text)
#=> ["10.10.2014", "24.09.2014"]
Phrogz