7

How do I convert an XML body to a hash in Ruby?

I have an XML body which I'd like to parse into a hash

<soap:Body>
    <TimesInMyDAY>
        <TIME_DATA>
            <StartTime>2010-11-10T09:00:00</StartTime>
            <EndTime>2010-11-10T09:20:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T09:20:00</StartTime>
            <EndTime>2010-11-10T09:40:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T09:40:00</StartTime>
            <EndTime>2010-11-10T10:00:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T10:00:00</StartTime>
            <EndTime>2010-11-10T10:20:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T10:40:00</StartTime>
            <EndTime>2010-11-10T11:00:00</EndTime>
        </TIME_DATA>
    </TimesInMyDAY>
</soap:Body>

I'd like to convert it into a hash like this:

{ :times_in_my_day => { 
    :time_data => [
        {:start_time=>"2010-11-10T09:00:00", :end_time => "2010-11-10T09:20:00" },
        {:start_time=>"2010-11-10T09:20:00", :end_time => "2010-11-10T09:40:00" },
        {:start_time=>"2010-11-10T09:40:00", :end_time => "2010-11-10T10:00:00" },
        {:start_time=>"2010-11-10T10:00:00", :end_time => "2010-11-10T10:20:00" },
        {:start_time=>"2010-11-10T10:40:00", :end_time => "2010-11-10T11:00:00" }
        ]
    } 
}

Ideally, the tags would convert to snake_case symbols and become keys within the hash.

Also, the datetimes are missing their time zone offsets. They are in the local time zone (not UTC), so I'd like to parse them with the local offset and convert the XML datetime strings into Rails DateTime objects. The resulting hash would be something like:

{ :times_in_my_day => { 
    :time_data => [
        {:start_time=>Wed Nov 10 09:00:00 -0800 2010, :end_time => Wed Nov 10 9:20:00 -0800 2010 },
        {:start_time=>Wed Nov 10 09:20:00 -0800 2010, :end_time => Wed Nov 10 9:40:00 -0800 2010 },
        {:start_time=>Wed Nov 10 09:40:00 -0800 2010, :end_time => Wed Nov 10 10:00:00 -0800 2010 },
        {:start_time=>Wed Nov 10 10:00:00 -0800 2010, :end_time => Wed Nov 10 10:20:00 -0800 2010 },
        {:start_time=>Wed Nov 10 10:40:00 -0800 2010, :end_time => Wed Nov 10 11:00:00 -0800 2010 }
        ]
    } 
}

I was able to convert a single datetime with the parse and in_time_zone methods this way:

Time.parse(xml_datetime).in_time_zone(current_user.time_zone)

But I'm not quite sure of the best way to parse the times while converting the XML into a hash.

I'd appreciate any advice. Thanks!

Edit

The code for converting the datetime string into a Rails DateTime object is wrong. That will parse the xml datetime string to the system's timezone offset and then convert that time to the user's timezone. The correct code is:

Time.zone.parse(xml_datetime)

If the user's time zone differs from the system's, this applies the user's time zone offset to the original datetime string. There's a Railscast on how to enable user time zone preferences here: http://railscasts.com/episodes/106-time-zones-in-rails-2-1.
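The shift described in this edit can be reproduced outside Rails. A minimal sketch in plain Ruby, assuming a fixed -08:00 offset stands in for the user's zone and that the "wrong" parse reads the string as UTC. A fixed offset ignores daylight saving, which Time.zone accounts for, so this only illustrates the idea:

```ruby
require 'time'

xml_datetime = "2010-11-10T09:00:00"
local_offset = "-08:00" # assumed fixed offset for the user's zone

# Reading the string as UTC and then converting moves the wall-clock time.
shifted = Time.iso8601(xml_datetime + "Z").getlocal(local_offset)

# Attaching the offset before parsing keeps the wall-clock time as written.
kept = Time.iso8601(xml_datetime + local_offset)

puts shifted # 2010-11-10 01:00:00 -0800
puts kept    # 2010-11-10 09:00:00 -0800
```

The second form is the behaviour wanted here: 09:00 in the XML stays 09:00 and only gains an offset.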

Chanpory
  • 3,015
  • 6
  • 37
  • 49

5 Answers

15

Hash.from_xml(xml) is a simple way to solve this. It's an ActiveSupport method.
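Hash.from_xml returns string keys that keep the original tag names, so the snake_case symbols asked for in the question need a second pass. A minimal sketch in plain Ruby: snake_key and deep_snake_keys are hypothetical helpers, and the parsed hash is a hand-written stand-in for the kind of nested hash Hash.from_xml produces, not captured output:

```ruby
# Hypothetical helper: CamelCase or UPPER_CASE tag name -> snake_case symbol.
def snake_key(name)
  name.to_s
      .gsub(/([A-Z\d]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .downcase
      .to_sym
end

# Hypothetical helper: rebuild nested hashes and arrays with converted keys.
def deep_snake_keys(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), out| out[snake_key(k)] = deep_snake_keys(v) }
  when Array
    value.map { |v| deep_snake_keys(v) }
  else
    value
  end
end

# Stand-in for a parsed XML hash (assumed shape, trimmed to two entries).
parsed = {
  "TimesInMyDAY" => {
    "TIME_DATA" => [
      { "StartTime" => "2010-11-10T09:00:00", "EndTime" => "2010-11-10T09:20:00" },
      { "StartTime" => "2010-11-10T09:20:00", "EndTime" => "2010-11-10T09:40:00" }
    ]
  }
}

result = deep_snake_keys(parsed)
p result.keys                              # [:times_in_my_day]
p result[:times_in_my_day][:time_data][0]
```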

Taimoor Changaiz
  • 10,250
  • 4
  • 49
  • 53
6

I used to use XML::Simple in Perl because parsing XML using Perl was a PITA.

When I switched to Ruby I ended up using Nokogiri, and found it to be very easy to use for parsing HTML and XML. It's so easy that I think in terms of CSS or XPath selectors and don't miss an XML-to-hash converter.

require 'ap'
require 'date'
require 'time'
require 'nokogiri'

xml = %{
<soap:Body>
    <TimesInMyDAY>
        <TIME_DATA>
            <StartTime>2010-11-10T09:00:00</StartTime>
            <EndTime>2010-11-10T09:20:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T09:20:00</StartTime>
            <EndTime>2010-11-10T09:40:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T09:40:00</StartTime>
            <EndTime>2010-11-10T10:00:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T10:00:00</StartTime>
            <EndTime>2010-11-10T10:20:00</EndTime>
        </TIME_DATA>
        <TIME_DATA>
            <StartTime>2010-11-10T10:40:00</StartTime>
            <EndTime>2010-11-10T11:00:00</EndTime>
        </TIME_DATA>
    </TimesInMyDAY>
</soap:Body>
}

time_data = []

doc = Nokogiri::XML(xml)
doc.search('//TIME_DATA').each do |t|
  start_time = t.at('StartTime').inner_text
  end_time = t.at('EndTime').inner_text
  time_data << {
    :start_time => DateTime.parse(start_time),
    :end_time   => Time.parse(end_time)
  }
end

puts time_data.first[:start_time].class
puts time_data.first[:end_time].class
ap time_data[0, 2]

with the output looking like:

DateTime
Time
[
    [0] {
        :start_time => #<DateTime: 2010-11-10T09:00:00+00:00 (19644087/8,0/1,2299161)>,
          :end_time => 2010-11-10 09:20:00 -0700
    },
    [1] {
        :start_time => #<DateTime: 2010-11-10T09:20:00+00:00 (22099598/9,0/1,2299161)>,
          :end_time => 2010-11-10 09:40:00 -0700
    }
]

The time values are deliberately parsed into DateTime and Time objects to show that either could be used.

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
  • Cool, trying this out now. Is there a way to convert the Nokogiri xml doc into a hash? Something like `doc.to_hash`?. I have a case where the XML source is deeply nested, so wondered if there's an elegant way to do it without writing a lot of iterators for each level. – Chanpory Nov 11 '10 at 04:54
  • Looks like I can do `result = Hash.from_xml(xml_source)`, but doesn't convert the tags to snake_case symbols :-( – Chanpory Nov 11 '10 at 05:19
  • The whole idea is to avoid converting the whole XML file to a hash. It works on small files but falls apart with large ones. The XPATH accessors are very powerful and can offload some of the search and iteration to the XML parser, which is very fast. See Nokogiri's [Searching an HTML / XML Document](http://nokogiri.org/tutorials/searching_a_xml_html_document.html) doc for more info. – the Tin Man Nov 11 '10 at 05:21
  • Makes sense, I have a few other levels and elements within the document that I'm trying to map into a database, so I thought having them as a hash to iterate over would be the way to go. But that might be an unnecessary step with Nokogiri's search features! – Chanpory Nov 11 '10 at 05:42
  • It's just a different way to iterate. Get used to doing it with Nokogiri and you'll find it similarly easy to grab data from HTML pages, assuming the HTML isn't pathological. – the Tin Man Nov 11 '10 at 06:14
  • Just noticed that you have `:start_time => Time.parse(start_time)` and `:end_time => DateTime.parse(end_time)`... did you mean to use Time and DateTime differently? Just checking if there's a reason for the difference. – Chanpory Nov 11 '10 at 09:00
  • I just tried the code from your first example, and am getting the same time data for each iteration, when they should be different. Your example also shows the same result. Not exactly sure why... – Chanpory Nov 11 '10 at 10:29
  • Looks like I needed to remove the `//` so that it becomes `start_time = t.at('StartTime').inner_text`... I gotta get used to xpath selectors! Not yet intuitive for me. – Chanpory Nov 11 '10 at 10:33
  • I deliberately used Time and DateTime, to show you could use either. The use of '//' in the inner loop was accidental and closely tied to NyQuil kicking in. – the Tin Man Nov 11 '10 at 17:41
  • I updated the example to reflect the right accessors for the start and end times. – the Tin Man Nov 11 '10 at 17:53
  • Great thanks, I was going crazy trying to figure out why it wasn't working, then finally got it at 2:30am! Hope you feel better soon! – Chanpory Nov 11 '10 at 17:59
3

ActiveSupport adds a Hash.from_xml, which does the conversion in a single call. Described in another question: https://stackoverflow.com/a/7488299/937595

Example:

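require 'open-uri'
require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml
remote_xml_file = "https://www.example.com/some_file.xml"
# URI.open replaces the bare open(url) form, which no longer works in Ruby 3.0+
data = Hash.from_xml(URI.open(remote_xml_file).read)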
Community
  • 1
  • 1
Erik
  • 431
  • 4
  • 5
2

The original question was asked some time ago, but I found a simpler solution than using Nokogiri and searching for specific names in the XML.

Nori.parse(your_xml) will parse the XML into a hash, and the keys will have the same names as your XML items. (Nori 2.x and later use an instance instead: Nori.new.parse(your_xml).)

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
Tõnis M
  • 470
  • 4
  • 18
0

If you don't mind using a gem, crack does a pretty good job at this.

Crack does the XML to hash processing, then you can loop over the resulting hash to normalize the datetimes.
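The "loop over the resulting hash" step could look roughly like this. It is a sketch in plain Ruby rather than Crack's own API: normalize_times is a hypothetical helper, the data hash is a hand-written stand-in for a parsed result, and the fixed "-08:00" offset is an assumption (in Rails, Time.zone.parse would replace the Time.iso8601 call):

```ruby
require 'time'

# Matches datetime strings that carry no time zone offset.
ISO_LOCAL = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\z/

# Hypothetical helper: walk nested hashes and arrays, turning offset-less
# ISO 8601 strings into Time objects at the given UTC offset.
def normalize_times(value, offset)
  case value
  when Hash      then value.transform_values { |v| normalize_times(v, offset) }
  when Array     then value.map { |v| normalize_times(v, offset) }
  when ISO_LOCAL then Time.iso8601(value + offset)
  else value
  end
end

# Stand-in for the parsed hash (assumed shape, trimmed to one entry).
data = { :times_in_my_day => { :time_data => [
  { :start_time => "2010-11-10T09:00:00", :end_time => "2010-11-10T09:20:00" }
] } }

result = normalize_times(data, "-08:00")
first = result[:times_in_my_day][:time_data].first
puts first[:start_time] # 2010-11-10 09:00:00 -0800
puts first[:end_time]   # 2010-11-10 09:20:00 -0800
```

Values that do not match the pattern pass through unchanged, so the walk is safe to run over the whole hash.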

edit Using REXML, you could try the following (should be close to working, but I do not have access to a terminal so it may need some tweaking):

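require 'rexml/document'

# Assumes xml is the full SOAP envelope, so the soap: prefix is declared;
# REXML raises UndefinedNamespaceException when the prefix is undeclared.
doc = REXML::Document.new(xml)
arr = []
# Matching TIME_DATA anywhere avoids naming the soap: prefix in the XPath.
REXML::XPath.each(doc, "//TIME_DATA") do |el|
  # Relative lookups stay inside the current TIME_DATA element.
  start_time = el.elements["StartTime"].text
  end_time   = el.elements["EndTime"].text # "end" is a reserved word in Ruby
  # Time.zone.parse follows the question's edit; assumes Time.zone is the user's zone.
  arr.push({:start_time => Time.zone.parse(start_time), :end_time => Time.zone.parse(end_time)})
end

hash = { :times_in_my_day => { :time_data => arr } }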

Of course, this assumes the structure is ALWAYS the same, and that the example you posted was not contrived for simplicity's sake (as examples often are).
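If the structure is not fixed, a recursive walk avoids hard-coding each level. A rough sketch with REXML, where element_to_hash is a hypothetical helper. The sample declares the soap prefix (an addition to the posted XML), since an undeclared prefix can raise UndefinedNamespaceException. Keys are left as the original tag names to keep it short:

```ruby
require 'rexml/document'

# Hypothetical helper: leaf elements become their text, other elements become
# hashes keyed by tag name. Repeated sibling tags are collected into an array.
def element_to_hash(element)
  children = element.elements.to_a
  return element.text if children.empty?

  children.each_with_object({}) do |child, out|
    key = child.name
    value = element_to_hash(child)
    if out.key?(key)
      out[key] = [out[key]] unless out[key].is_a?(Array)
      out[key] << value
    else
      out[key] = value
    end
  end
end

# Sample trimmed to two entries; the xmlns:soap declaration is an assumption.
xml = <<~XML
  <soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <TimesInMyDAY>
      <TIME_DATA>
        <StartTime>2010-11-10T09:00:00</StartTime>
        <EndTime>2010-11-10T09:20:00</EndTime>
      </TIME_DATA>
      <TIME_DATA>
        <StartTime>2010-11-10T09:20:00</StartTime>
        <EndTime>2010-11-10T09:40:00</EndTime>
      </TIME_DATA>
    </TimesInMyDAY>
  </soap:Body>
XML

hash = element_to_hash(REXML::Document.new(xml).root)
p hash["TimesInMyDAY"]["TIME_DATA"].length # 2
```

One caveat of this design: a tag that appears only once yields a plain value rather than a one-element array, so callers that expect a list should wrap the result with Array().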

William
  • 3,511
  • 27
  • 35
  • Don't mind using a gem, but I tried using Savon gem which includes a to_hash method which uses Crack... however, I was having problems with the date parsing. It seems Savon/Crack will assume that xml datetime strings without offsets are in UTC, and not the local user's timezone. So all the times were getting unintentionally shifted. So `2010-11-10T09:00:00` turned into `Wed Nov 10 01:00:00 -0800 2010` when I really wanted `Wed Nov 10 09:00:00 -0800 2010` :-( – Chanpory Nov 11 '10 at 02:22
  • I'm getting a weird error when trying `doc = REXML::XPath.first(REXML::Document.new(xml), "//soap:Body/TimesInMyDAY").text`. The error is `REXML::UndefinedNamespaceException: Undefined prefix soap found` – Chanpory Nov 11 '10 at 04:47