75

Is there an easy way to convert a Nokogiri XML document to a Hash?

Something like Rails' Hash.from_xml.

Ivan
  • 1
    Actually, Rails' Hash.from_xml is neatly wrapped up in the MiniXML section of the Rails code. I've been meaning to extract it since I wrote it. Give me a nudge if you don't hear about it soon. – Joseph Holsten Aug 12 '09 at 12:18
  • Is there something inadequate with `Hash.from_xml(nokogiri_doc.to_xml)`? – JellicleCat Feb 03 '14 at 20:56
  • http://amolnpujari.wordpress.com/2012/03/31/reading_huge_xml-rb/ I found ox 5 times faster than nokogiri, hence here one example in ox - https://gist.github.com/amolpujari/5966431, search for any element and get it in hash form – Amol Pujari Mar 20 '14 at 11:21
  • I posted a modified version of Ashan Ali's code, which [works with attributes and uses Nokogiri](http://gist.github.com/335286) – dimus Mar 17 '10 at 14:34
  • @JellicleCat, yes. Don't waste CPU parsing XML using Nokogiri just to have Nokogiri output it to XML to be parsed by something else. Just pass the raw XML and be done with it. – the Tin Man Sep 24 '15 at 13:20

8 Answers

108

If you want to convert a Nokogiri XML document to a hash, just do the following:

require 'active_support/core_ext/hash/conversions'
hash = Hash.from_xml(nokogiri_document.to_s)
the Tin Man
Guillaume Roderick
  • 1
    Please explain where `from_xml` comes from. It isn't a standard Ruby method. – the Tin Man Jan 04 '12 at 22:56
  • 4
    @theTinMan from_xml comes from ActiveSupport – ScottJShea Jan 29 '12 at 09:15
  • 1
    It comes from here: http://api.rubyonrails.org/classes/Hash.html#method-c-from_xml, and the code is: `typecast_xml_value(unrename_keys(ActiveSupport::XmlMini.parse(xml)))` – Dorian Aug 07 '12 at 10:21
  • 1
    This should be the cleanest answer, +1 to this sire – Alexis Rabago Carvajal Jul 29 '15 at 16:29
  • 9
    _NOTE:_ The OP is aware of `from_xml` and mentions the need for something similar to it. Using `from_xml` doesn't answer the question. Also, if the document is already a Nokogiri document then don't convert it to a string just to parse it using some other XML parser. Instead, pass the raw XML and skip parsing with Nokogiri. Anything else is a waste of CPU time. – the Tin Man Sep 25 '15 at 16:18
  • To add to @theTinMan's comment. It's unnecessary to use Nokogiri to parse the xml, then convert it to a string, then convert it to a hash. If using `active_support` you can go straight to using `Hash::from_xml`. For example: `Hash.from_xml(File.read('some.xml'))` would work – mbigras Feb 20 '17 at 07:31
22

Here's a far simpler version that creates a robust Hash that includes namespace information, both for elements and attributes:

require 'nokogiri'
class Nokogiri::XML::Node
  TYPENAMES = {1=>'element',2=>'attribute',3=>'text',4=>'cdata',8=>'comment'}
  def to_hash
    {kind:TYPENAMES[node_type],name:name}.tap do |h|
      h.merge! nshref:namespace.href, nsprefix:namespace.prefix if namespace
      h.merge! text:text
      h.merge! attr:attribute_nodes.map(&:to_hash) if element?
      h.merge! kids:children.map(&:to_hash) if element?
    end
  end
end
class Nokogiri::XML::Document
  def to_hash; root.to_hash; end
end

Seen in action:

xml = '<r a="b" xmlns:z="foo"><z:a>Hello <b z:m="n" x="y">World</b>!</z:a></r>'
doc = Nokogiri::XML(xml)
p doc.to_hash
#=> {
#=>   :kind=>"element",
#=>   :name=>"r",
#=>   :text=>"Hello World!",
#=>   :attr=>[
#=>     {
#=>       :kind=>"attribute",
#=>       :name=>"a", 
#=>       :text=>"b"
#=>     }
#=>   ], 
#=>   :kids=>[
#=>     {
#=>       :kind=>"element", 
#=>       :name=>"a", 
#=>       :nshref=>"foo", 
#=>       :nsprefix=>"z", 
#=>       :text=>"Hello World!", 
#=>       :attr=>[], 
#=>       :kids=>[
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"Hello "
#=>         },
#=>         {
#=>           :kind=>"element", 
#=>           :name=>"b", 
#=>           :text=>"World", 
#=>           :attr=>[
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"m", 
#=>               :nshref=>"foo", 
#=>               :nsprefix=>"z", 
#=>               :text=>"n"
#=>             },
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"x", 
#=>               :text=>"y"
#=>             }
#=>           ], 
#=>           :kids=>[
#=>             {
#=>               :kind=>"text", 
#=>               :name=>"text", 
#=>               :text=>"World"
#=>             }
#=>           ]
#=>         },
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"!"
#=>         }
#=>       ]
#=>     }
#=>   ]
#=> }
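
Because every node comes out with the same keys, the result is easy to walk recursively. As a rough sketch (the `element_names` helper is hypothetical, not part of Nokogiri, and the tree literal is a trimmed copy of the output above):

```ruby
# Hypothetical helper: collect the names of all element nodes, depth first.
# Text, attribute and comment nodes contribute nothing.
def element_names(node)
  return [] unless node[:kind] == 'element'

  [node[:name]] + node.fetch(:kids, []).flat_map { |kid| element_names(kid) }
end

# Trimmed-down version of the hash shown above.
tree = {
  kind: 'element', name: 'r', kids: [
    { kind: 'element', name: 'a', kids: [
      { kind: 'text', name: 'text', text: 'Hello ' },
      { kind: 'element', name: 'b', kids: [
        { kind: 'text', name: 'text', text: 'World' }
      ] }
    ] }
  ]
}

names = element_names(tree)
p names #=> ["r", "a", "b"]
```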
Phrogz
14

I use this code with libxml-ruby (1.1.3). I have not used Nokogiri myself, but I understand that it is built on libxml2, the same underlying C library that libxml-ruby wraps. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree), which maps XML elements to Ruby objects; it is built atop libxml.

# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from 
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0

class Hash
  class << self
    def from_libxml(xml, strict=true)
      begin
        XML.default_load_external_dtd = false
        XML.default_pedantic_parser = strict
        result = XML::Parser.string(xml).parse
        return { result.root.name.to_s => xml_node_to_hash(result.root) }
      rescue Exception => e
        # raise your custom exception here
        # (as written, any error is swallowed and nil is returned)
      end
    end

    def xml_node_to_hash(node)
      # If we are at the root of the document, start the hash
      if node.element?
        if node.children?
          result_hash = {}

          node.each_child do |child|
            result = xml_node_to_hash(child)

            if child.name == "text"
              # A lone text child means the element holds only text: return it.
              # Text that sits next to sibling nodes is ignored.
              if !child.next? and !child.prev?
                return result
              end
            elsif result_hash[child.name.to_sym]
              # A repeated sibling name: collect the values into an array
              if result_hash[child.name.to_sym].is_a?(Object::Array)
                result_hash[child.name.to_sym] << result
              else
                result_hash[child.name.to_sym] = [result_hash[child.name.to_sym]] << result
              end
            else
              result_hash[child.name.to_sym] = result
            end
          end

          return result_hash
        else
          return nil
        end
      else
        return node.content.to_s
      end
    end
  end
end
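
The least obvious part of the code above is how repeated sibling elements are folded into an array. Here is that step on its own as a small sketch. The `store_child` helper is hypothetical and works on plain name/value pairs, so it runs without libxml:

```ruby
# Hypothetical helper mirroring the elsif/else branch of xml_node_to_hash.
# Note: it checks key? rather than truthiness, which differs slightly from
# the original when a stored value is nil.
def store_child(hash, name, value)
  key = name.to_sym
  if hash.key?(key)
    # Second occurrence: wrap the existing value in an array first.
    hash[key] = [hash[key]] unless hash[key].is_a?(Array)
    hash[key] << value
  else
    # First occurrence: store the value directly.
    hash[key] = value
  end
  hash
end

result = {}
store_child(result, 'title', 'Ruby')
store_child(result, 'author', 'Ann')
store_child(result, 'author', 'Bob')
store_child(result, 'author', 'Cy')

p result #=> {:title=>"Ruby", :author=>["Ann", "Bob", "Cy"]}
```

One consequence of this design is that the shape of a value depends on the data: a single `<author>` yields a string, while several yield an array, so callers may want to normalize with `Array(...)`.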
A.Ali
14

Use Nokogiri to parse the XML response, then convert it to a Ruby hash. It's pretty fast.

doc = Nokogiri::XML(response_body) 
Hash.from_xml(doc.to_s)
the Tin Man
PythonDev
  • 13
    `doc.to_s` returns what you already have in `response_body`, so nokogiri is useless in your example – Alesya Huzik Jan 22 '15 at 10:47
  • 1
    @alesguzik is right: in that statement you are basically parsing the XML twice. Hash.from_xml will use REXML by default, not Nokogiri. I'm also not sure if you can change this. – Jesse Whitham Aug 11 '15 at 03:26
  • 2
    Nokogiri is sometimes more resilient to parse poorly formed or encoded XMLs. I have examples where Hash.from_xml(xml_str) would fail, but this would still work. So it can be a fallback for Hash.from_xml(xml_str) – user4887419 Jul 01 '16 at 13:13
  • Be aware that the `Hash.from_xml` function should not be used if accuracy is important. This function starts to fall flat on more complex xml documents completely omitting certain values. – pyRabbit Sep 25 '18 at 17:31
14

I found this while trying to simply convert XML to Hash (not in Rails). I was thinking I would use Nokogiri, but ended up going with Nori.

Then my code was trivial:

response_hash = Nori.parse(response)

Other users have pointed out that this does not work. I have not verified, but it seems that the parse method has been moved from the class to the instance. My code above worked at some point. New (unverified) code would be:

response_hash = Nori.new.parse(response)
the Tin Man
John Hinnegan
  • I think this is the best solution for apps that do not use Rails. – B Seven Jun 04 '15 at 19:58
  • The *un*verified line works. However, if you have a `Nokogiri::XML` document, you must call its `to_s` method first. E.g. `xml = Nokogiri::XML(File.open('file.xml'))` and then `hash = Nori.new.parse(xml.to_s)`, but the fields appear to be returned as an `Array` without the field names. – code_dredd Feb 25 '16 at 20:25
  • After banging my head against the wall trying to use Nokogiri I finally came across this. This is BY FAR the best solution! Thanks for the post. – Albert Rannetsperger Sep 22 '16 at 09:39
  • I like that its output attributes are prepended with `@`. – konsolebox Dec 26 '21 at 10:41
4

If you define something like this in your configuration:

ActiveSupport::XmlMini.backend = 'Nokogiri'

it includes a module in Nokogiri and you gain the to_hash method.

the Tin Man
0

Have a look at the simple mix-in I made for the Nokogiri XML Node.

http://github.com/kuroir/Nokogiri-to-Hash

Here's a usage example:

require 'rubygems'
require 'nokogiri'
require 'nokogiri_to_hash'
html = '
  <div id="hello" class="container">
    <p>Hello! visit my site <a href="http://kuroir.com">Kuroir.com</a></p>
  </div>
'
p Nokogiri.HTML(html).to_hash
#=> [{:div=>{:class=>["container"], :children=>[{:p=>{:children=>[{:a=>{:href=>["http://kuroir.com"], :children=>[]}}]}}], :id=>["hello"]}}]
MarioRicalde
0

If the node you've selected in Nokogiri is a single tag whose attributes you want, you can extract the attribute names (keys) and values and zip them into one hash, like so:

  @doc ||= Nokogiri::XML(File.read("myxmldoc.xml"))
  @node = @doc.at('#uniqueID') # this works if this selects only one node
  node_hash = @node.keys.zip(@node.values).to_h # pairs each attribute name with its value

See http://www.ruby-forum.com/topic/125944 for more info on Ruby array merging.

the Tin Man
juanfe