2

I'm trying to import an XML file via a web page in a Ruby on Rails application, the code ruby view code is as follows (I've removed HTML layout tags to make reading the code easier)

<% form_for( :fmfile, :url => '/fmfiles', :html => { :method => :post, :name => 'Form_Import_DDR', :enctype => 'multipart/form-data' } ) do |f| %>
<%= f.file_field :document, :accept => 'text/xml', :name => 'fmfile_document' %>
<%= submit_tag 'Import DDR' %>
<% end %>

Results in the following HTML form

<form action="/fmfiles" enctype="multipart/form-data" method="post" name="Form_Import_DDR"><div style="margin:0;padding:0"><input name="authenticity_token" type="hidden" value="3da97372885564a4587774e7e31aaf77119aec62" />
<input accept="text/xml" id="fmfile_document" name="fmfile_document" size="30" type="file" />
<input name="commit" type="submit" value="Import DDR" />
</form>

The Form_Import_DDR method in the 'fmfiles_controller' is the code that does the hard work of reading the XML document in using REXML. The code is as follows

@fmfile = Fmfile.new
@fmfile.user_id = current_user.id
@fmfile.file_group_id = 1
@fmfile.name = params[:fmfile_document].original_filename

respond_to do |format|
  if @fmfile.save
    require 'rexml/document'
    doc = REXML::Document.new(params[:fmfile_document].read)

    doc.root.elements['File'].elements['BaseTableCatalog'].each_element('BaseTable') do |n|
      @base_table = BaseTable.new
      @base_table.base_table_create(@fmfile.user_id, @fmfile.id, n)
    end

And it carries on reading all the different XML elements in.

I'm using Rails 2.1.0 and Mongrel 1.1.5 in Development environment on Mac OS X 10.5.4, site DB and browser on same machine.

My question is this. This whole process works fine when reading an XML document with character encoding UTF-8 but fails when the XML file is UTF-16, does anyone know why this is happening and how it can be stopped?

I have included the error output from the debugger console below, it takes about 5 minutes to get this output and the browser times out before the following output with the 'Failed to open page'

Processing FmfilesController#create (for 127.0.0.1 at 2008-09-15 16:50:56) [POST]
Session ID: BAh7CDoMdXNlcl9pZGkGOgxjc3JmX2lkIiVmM2I3YWU2YWI4ODU2NjI0NDM2
NTFmMDE1OGY1OWQxNSIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxh
c2g6OkZsYXNoSGFzaHsABjoKQHVzZWR7AA==--dd9f588a68ed628ab398dd1a967eedcd09e505e0
Parameters: {"commit"=>"Import DDR", "authenticity_token"=>"3da97372885564a4587774e7e31aaf77119aec62", "action"=>"create", "fmfile_document"=>#<File:/var/folders/LU/LU50A0vNHA07S4rxDAOk4E+++TI/-Tmp-/CGI.3001.1>, "controller"=>"fmfiles"}
[4;36;1mUser Load (0.000350)[0m   [0;1mSELECT * FROM "users" WHERE (id = 1) LIMIT 1[0m
[4;35;1mFmfile Create (0.000483)[0m   [0mINSERT INTO "fmfiles" ("name", "file_group_id", "updated_at", "report_created_at", "report_link", "report_version", "option_on_open_account_name", "user_id", "option_default_custom_menu_set", "option_on_close_script", "path", "report_type", "option_on_open_layout", "option_on_open_script", "created_at") VALUES('TheTest_fp7 2.xml', 1, '2008-09-15 15:50:56', NULL, NULL, NULL, NULL, 1, NULL, NULL, NULL, NULL, NULL, NULL, '2008-09-15 15:50:56')[0m

REXML::ParseException (#<Iconv::InvalidCharacter: "਼䙍偒数 (followed by a few thousand similar looking chinese characters)
䙍偒数潲琾", ["\n"]>
/Library/Ruby/Site/1.8/rexml/encodings/ICONV.rb:7:in `conv'
/Library/Ruby/Site/1.8/rexml/encodings/ICONV.rb:7:in `decode'
/Library/Ruby/Site/1.8/rexml/source.rb:50:in `encoding='
/Library/Ruby/Site/1.8/rexml/parsers/baseparser.rb:210:in `pull'
/Library/Ruby/Site/1.8/rexml/parsers/treeparser.rb:21:in `parse'
/Library/Ruby/Site/1.8/rexml/document.rb:190:in `build'
/Library/Ruby/Site/1.8/rexml/document.rb:45:in `initialize'
Cœur
  • 37,241
  • 25
  • 195
  • 267
Matt Haughton
  • 2,919
  • 3
  • 24
  • 30

4 Answers4

1

Rather than a rails/mongrel problem, it sounds more likely that there's an issue either with your XML file or with the way REXML handles it. You can check this by writing a short script to read your XML file directly (rather than within a request) and seeing if it still fails.

Assuming it does, there are a couple of things I'd look at. First, I'd check you are running the latest version of REXML. A couple of years ago there was a bug (http://www.germane-software.com/projects/rexml/ticket/63) in its UTF-16 handling.

The second thing I'd check is if you're issue is similar to this: http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/ba7b0585c7a6330d. If so you can try the workaround in that thread.

If none of the above helps, then please reply with more information, such as the exception you are getting when you try and read the file.

tomafro
  • 5,788
  • 3
  • 26
  • 22
0

Have you tried doing this using JRuby? I've heard Unicode strings are better supported in JRuby.

One other thing you can try is to use another XML parsing library, such as libxml ou Hpricot.

REXML is one of the slowest Ruby XML libraries you can use and might not scale.

Dema
  • 6,867
  • 11
  • 40
  • 48
0

Actually, I think your problem may be related to the problem I just detailed in this post. If I were you, I'd open it up in TextPad in Binary mode and see if there are any Byte Order Marks before your XML starts.

Community
  • 1
  • 1
George Stocker
  • 57,289
  • 29
  • 176
  • 237
0

Since getting this to work requires me to only change the encoding attribute of the first XML element to have the value UTF-8 instead of UTF-16, the XML file is actually UTF-8 and labelled wrongly by the application that generates it.

The XML file is a FileMaker DDR export produced by FileMaker Pro Advanced 8.5 on OS X 10.5.4

Matt Haughton
  • 2,919
  • 3
  • 24
  • 30