
I have the following file named test.rb, encoded in UTF-16LE:

# encoding: UTF-16LE

test = "test!"
p test

Running it with the following command produces no output:

ruby ./test.rb

What am I missing here?


In case anyone is wondering, the reason I'm trying to set my source to UTF-16LE is that my input and output files are both encoded in UTF-16LE. My impression is that if I set the encoding properly when I read a file, set it properly when I write the output, and have the # encoding: comment set properly in my source, everything should just work. If anyone sees anything wrong with this (or knows an easier way), feel free to let me know.

kubi
  • Writing your source code in an encoding has no bearing on what encoding you read or write a file. – the Tin Man Nov 21 '10 at 00:21
  • @Greg So what you're saying is that if my source code is UTF-8 and I write a string to a UTF-16LE file, it will be automatically converted to the proper encoding? – kubi Nov 21 '10 at 00:58
  • No. It won't be automatically converted. You have to tell Ruby what encoding your file I/O is in. See Mladen Jablanović's answer as he's pointing you in the right direction. – the Tin Man Nov 21 '10 at 01:08
  • To clarify: If I tell ruby to encode my output file as UTF-16, will all my strings be converted before they're written? String encoding is whatever the source file encoding is (unless it's specified), right? – kubi Nov 21 '10 at 01:30
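
The last comment's question can be checked directly. The following is only a minimal sketch, assuming a UTF-8 source file on Ruby 1.9 or later; the variable names `literal` and `utf16` are made up for illustration. It shows that a string literal carries the source encoding (`__ENCODING__`) and that `String#encode` converts it explicitly.

```ruby
# encoding: UTF-8
# Sketch only: `literal` and `utf16` are illustrative names.

literal = "test!"

# A string literal is tagged with the encoding of the source file.
p __ENCODING__
p literal.encoding

# String#encode returns a transcoded copy; the original is unchanged.
utf16 = literal.encode('UTF-16LE')
p utf16.encoding
p utf16.bytesize        # two bytes per character for this text
p utf16.bytes.first(4)  # low byte first, then a zero high byte
```

So the source encoding only decides how literals are tagged; converting to another encoding is a separate step, either explicit as here or performed by an IO object that has been given an encoding.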

1 Answer


Writing your program in UTF-16 in order to process UTF-16 files sounds like naming your variables in Russian in order to make a Russian website. :)

Ruby 1.9 supports string encodings, and James Gray has an excellent series of articles on the topic - I consider them a reference guide to encodings in Ruby.

In short, you can specify the encoding of your input files when you open them:

s = ''
File.open('utf16le.txt', 'rb:UTF-16LE') do |f| # here you set the encoding
  s = f.read
end
p s.encoding
#=> #<Encoding:UTF-16LE>
p s.length
#=> 19
p s
#=> "test\nmladen\n\u0436\u045F\u0446\u0432\u0431\n\n"
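
The question also asks about output. As a rough sketch rather than a definitive recipe, the same mode-string syntax can be used when writing, and an optional second encoding in the mode (`external:internal`) asks Ruby to transcode on read. The file name `utf16le_demo.txt` and the variables `text`, `raw` and `round_trip` are hypothetical, chosen only for this example, and the source is assumed to be UTF-8.

```ruby
# encoding: UTF-8
require 'tmpdir'

text = "test\n\u0436\n"   # an ordinary UTF-8 string in a UTF-8 source file
raw = nil
round_trip = nil

Dir.mktmpdir do |dir|
  # Hypothetical file name, used only for this sketch.
  path = File.join(dir, 'utf16le_demo.txt')

  # External encoding UTF-16LE: the UTF-8 string is transcoded on write.
  File.open(path, 'wb:UTF-16LE') { |f| f.write(text) }

  # Raw bytes as stored on disk, read without any transcoding.
  raw = File.binread(path)

  # external:internal, so the data is transcoded to UTF-8 on read.
  round_trip = File.open(path, 'rb:UTF-16LE:UTF-8') { |f| f.read }
end

p text.encoding
p raw.bytesize
p round_trip.encoding
p round_trip == text
```

With this approach the program itself stays in UTF-8, and only the file boundaries deal with UTF-16LE.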

This is also covered in the documentation for the Ruby 1.9 IO class:

http://ruby-doc.org/ruby-1.9/classes/IO.html

Mladen Jablanović
  • Any idea why my code doesn't work? My only guess is that UTF-16LE text isn't supported by the terminal and doesn't display, but I would think it would output the \uXXXX escapes at the very least. – kubi Nov 20 '10 at 23:37
  • In particular, "The Default External and Internal Encodings" section in James Gray's article is germane to the OP's question. – the Tin Man Nov 21 '10 at 01:18