
I have an XML file that starts with

<?xml version='1.0' encoding='ISO-8859-8'?>

When I attempt to do

Hash.from_xml(my_xml)

I get a #<REXML::ParseException: No close tag for /root/response/message> (REXML::ParseException)

The message tag does indeed contain characters in that encoding. I need to parse this XML, so I'm guessing I need to convert it all to UTF-8 or some other encoding the parser will accept.

Is there a way to do this? (Solutions using other libraries, such as Nokogiri, are also welcome.)

Nick Ginanto

1 Answer


Nokogiri seems to do the right thing:

# test.xml
<?xml version='1.0' encoding='ISO-8859-8'?>
<what>
  <body>דה</body>
</what>

require 'nokogiri'

xml = Nokogiri::XML(File.read('test.xml'))
puts xml.at_xpath('//body').content
# prints: דה

You can also tell Nokogiri what encoding to use (e.g., Nokogiri::XML(File.read('test.xml'), nil, 'ISO-8859-8')), but that doesn't seem to be necessary here.

If that doesn't help, you might want to check that your XML is well-formed.
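One quick way to check well-formedness, independent of any encoding issue, is to hand the document to a strict parser and look at the error it reports. Below is a minimal sketch using Ruby's bundled REXML, the same parser that raised the exception in the question; `xml_error` is a hypothetical helper name, not part of any library.

```ruby
require 'rexml/document'

# Hypothetical helper: returns nil when the string parses as well-formed XML,
# or REXML's error message (tag, line, position) when it does not.
def xml_error(xml)
  REXML::Document.new(xml)
  nil
rescue REXML::ParseException => e
  e.message
end

puts xml_error("<root><message>ok</message></root>").inspect # prints: nil
puts xml_error("<root><message>unclosed</root>")              # mismatched tags
```

If this reports an unclosed or mismatched tag on a plain ASCII copy of your file, the problem is the markup itself rather than the ISO-8859-8 characters.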

You can then convert the XML to UTF-8 if you like:

xml2 = xml.serialize(:encoding => 'UTF-8') {|c| c.format.as_xml }
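If you'd rather skip Nokogiri for the conversion, the same idea can be sketched with Ruby's core `String#force_encoding` and `String#encode`. This assumes the file is read as raw bytes (e.g. with `File.binread`) and that its declaration says `encoding='ISO-8859-8'`; `iso8859_8_to_utf8` is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of any library): take the raw bytes of an
# ISO-8859-8 document, transcode them to UTF-8, and rewrite the XML
# declaration so it no longer claims the old encoding.
def iso8859_8_to_utf8(raw)
  # Tell Ruby what the bytes really are, then transcode them to UTF-8.
  utf8 = raw.dup.force_encoding(Encoding::ISO_8859_8).encode(Encoding::UTF_8)
  # Replace encoding='ISO-8859-8' (single or double quotes) in the declaration.
  utf8.sub(/encoding=(['"])ISO-8859-8\1/i, 'encoding=\1UTF-8\1')
end

# Usage, assuming test.xml is saved in ISO-8859-8:
#   utf8_xml = iso8859_8_to_utf8(File.binread('test.xml'))
#   Hash.from_xml(utf8_xml)
```

Updating the declaration matters: if the string is UTF-8 but still declares ISO-8859-8, a parser that honours the declaration may decode the bytes a second time.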

If you just want to convert your Nokogiri document to a hash, take a look at the solutions in the question "Convert a Nokogiri document to a Ruby Hash", or simply call ActiveSupport's Hash.from_xml(xml2).

Jacob Brown