I have a UTF-8 string in my Ruby code. Due to limitations I want to convert the UTF-8 characters in that string to either their escaped equivalents (such as \u23
) or simply convert the whole string to UCS-2. I need to explicitly do this to export the data to a file
I tried to do the following in IRB:
my_string = '7.0mΩ'
my_string.encoding
my_string.encode!(Encode::UCS_2BE)
my_string.encoding
The output of that is:
=> "7.0mΩ"
=> #<Encoding::UTF-8>
=> "7.0m\u2126"
=> #<Encoding::UTF-16BE>
This seemed to work fine (I got "ohm" as 2126) until I was reading data out of an array (in Rails):
data.each_with_index do |entry, idx|
puts "#{idx} !! #{entry['title']} !! #{entry['value']} !! #{entry['value'].encode!(Encoding::UCS_2BE)}"
end
That results in the error:
incompatible character encodings: UTF-8 and UTF-16BE
I then tried to write a basic file conversion routine:
File.open(target, 'w', encoding: Encoding::UCS_2BE) do |file|
File.open(source, 'r', encoding: Encoding::UTF_8).each_line do |line|
output.puts(line)
end
end
This resulted in all kinds of weird characters in the file.
Not sure what is going wrong.
Is there a better way to approach this problem of converting UTF-8 data to UCS-2 in Ruby? I really wouldn't mind this actually being changed in the string to \u2126
as a literal part of the string rather than the actual value.
Help!
Temporary Workaround
I monkey-patched this to do what I want. It's not very elegant, but it does the job (and yes, I know it's not pretty... it's just a hack to get what I need):
def hacky_encode
encoded = self
unless encoded.ascii_only?
encoded = scan(/./).map do |char|
char.ascii_only? ? char : char.unpack('U*').map { |i| '\\u' + i.to_s(16).rjust(4, '0') }
end.join
end
encoed
end
Which can be used:
"7.0mΩ".hacky_encode