
I have a UTF-8 string in my Ruby code. Due to limitations on the receiving end, I want to convert the non-ASCII characters in that string to either their escaped equivalents (such as \u2126) or simply convert the whole string to UCS-2. I need to do this explicitly in order to export the data to a file.
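
To make it concrete, here's roughly what I'm starting from and what I'd be happy to end up with (the Ω is U+2126; variable names are just for illustration):

input  = "7.0mΩ"       # what I have: UTF-8
wanted = "7.0m\\u2126"  # option 1: non-ASCII characters escaped as literal \uXXXX text
# option 2: the same string transcoded to UCS-2 instead of escaped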

I tried to do the following in IRB:

my_string = '7.0mΩ'
my_string.encoding
my_string.encode!(Encoding::UCS_2BE)
my_string.encoding

The output of that is:

=> "7.0mΩ"
=> #<Encoding::UTF-8>
=> "7.0m\u2126"
=> #<Encoding::UTF-16BE>

This seemed to work fine (the ohm sign came through as \u2126) until I started reading data out of an array (in Rails):

data.each_with_index do |entry, idx|
  puts "#{idx} !! #{entry['title']} !! #{entry['value']} !! #{entry['value'].encode!(Encoding::UCS_2BE)}"
end

That results in the error:

incompatible character encodings: UTF-8 and UTF-16BE
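
As far as I can tell, the interpolation is the problem: once `entry['value']` has been re-encoded to UTF-16BE (and `encode!` mutates the value stored in `data` as well), it can no longer be combined with the surrounding UTF-8 string. A minimal IRB reproduction, assuming the value contains the same Ω character:

value = '7.0mΩ'
value.encode!(Encoding::UCS_2BE)
"result: #{value}"
# Encoding::CompatibilityError: incompatible character encodings: UTF-8 and UTF-16BE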

I then tried to write a basic file conversion routine:

File.open(target, 'w', encoding: Encoding::UCS_2BE) do |file|
  File.open(source, 'r', encoding: Encoding::UTF_8).each_line do |line|
    file.puts(line)
  end
end

This resulted in all kinds of weird characters in the file.

Not sure what is going wrong.
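
For what it's worth, here is what the raw bytes of the converted sample string should look like (a quick IRB check, assuming the same Ω). UTF-16BE carries no BOM and puts a null high byte in front of every ASCII character, so it may be that the file is actually correct and vi just doesn't know how to display it:

'7.0mΩ'.encode(Encoding::UCS_2BE).bytes.map { |b| b.to_s(16).rjust(2, '0') }.join(' ')
# => "00 37 00 2e 00 30 00 6d 21 26"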

Is there a better way to approach this problem of converting UTF-8 data to UCS-2 in Ruby? I really wouldn't mind the conversion simply replacing the Ω with the literal text \u2126 in the string rather than the actual character.

Help!

Temporary Workaround

I monkey-patched String to do what I want. It's not pretty, but it does the job (it's just a hack to get what I need):

class String
  # Escapes any non-ASCII character as \uXXXX literal text, leaving ASCII untouched.
  def hacky_encode
    encoded = self
    unless encoded.ascii_only?
      encoded = scan(/./m).map do |char|
        char.ascii_only? ? char : char.unpack('U*').map { |i| '\\u' + i.to_s(16).rjust(4, '0') }.join
      end.join
    end
    encoded
  end
end

Which can be used:

"7.0mΩ".hacky_encode
  • It's not ruby, but what about using `iconv`? (https://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets?rq=1) – Jared Beck Oct 25 '18 at 16:27
  • Something like `string.unpack('U*').map { |i| "\\u" + i.to_s(16).rjust(4, '0') }.join` converts ALL characters, which is not what I'm looking for. – el n00b Oct 25 '18 at 16:28
  • @Jared Unfortunately I can't use `iconv`: I get all kinds of weird characters, as in the example above, and I can't expect it to be installed on the machines this runs on. – el n00b Oct 25 '18 at 16:29
  • When you say "export the data to a file" what type of data, and what type of file? Can you just set the encoding in the file itself assuming it contains string data only? – lacostenycoder Oct 25 '18 at 18:41
  • Also, when you say "This resulted in all kinds of weird characters in the file." What are you using to read the file? Whatever it is will need to deal with the encoding you use when creating the file. So there's that. What is the intended purpose of the encoded file? – lacostenycoder Oct 25 '18 at 18:47
  • @lacostenycoder it's data where I don't know what will be coming in as it comes from different sources (unfortunately). I was using just vi to try and read the files on my local machine. – el n00b Oct 26 '18 at 02:21
  • What you have not explained is "who is the intended reader of the file?" That should determine what format you want to encode TO. You shouldn't care what it looks like in vim so long as the intended user of the file will use the encoding you chose to output to. – lacostenycoder Oct 26 '18 at 11:41
  • @lacostenycoder it's intended to be machine readable but the file cannot be read on the opposing end of the spectrum (which I am assuming is a Microsoft SQL Server or C# implementation, I cannot be sure). – el n00b Oct 29 '18 at 15:22
