Ruby: UTF-8 incorrect input

Question

I have a .rb file that, when run, reads a UTF-8 string from input, but for some reason the input is modified automatically. Here is an example of what my code looks like:

# encoding :UTF-8
.
.
.
print "Enter a UTF-8 input: "
text = gets.chomp
p text

So, if I input "\n\u001C\u0018\t\u001C", it prints out "\\n\\u001C\\u0018\\t\\u001C", which is not what I entered! Out of curiosity, I compared the lengths, and both are 22. But I know the input is being changed, because when I pass the text to a function in the same file, the function treats it as the second (escaped) form. When I run my actual code in irb it works as intended, but when I run it from the file it doesn't do what I want.
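A likely explanation for the matching length of 22: `gets` returns the characters exactly as they were typed, so `\n` arrives as a backslash plus `n` and `\u001C` as six ordinary characters. Only Ruby source code, such as a double-quoted literal typed into irb, turns those escapes into single characters. The sketch below compares the two, using a single-quoted literal to stand in for the typed input (an assumption, since no prompt is involved):

```ruby
# Double-quoted literal: Ruby interprets the escapes, giving 5 characters
# (newline, U+001C, U+0018, tab, U+001C).
interpreted = "\n\u001C\u0018\t\u001C"

# Single-quoted literal: backslashes are kept as-is, which matches what
# gets returns when the same text is typed at the prompt.
typed = '\n\u001C\u0018\t\u001C'

puts interpreted.length  # 5
puts typed.length        # 22

# p calls inspect, which escapes every literal backslash as \\.
p typed                  # "\\n\\u001C\\u0018\\t\\u001C"
```

So the string is not being modified; it simply never contained the control characters in the first place.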

EDIT: Sean answered my question about the printing, but that doesn't explain why, when I pass the value in text to a function in the same Ruby file, the function doesn't see it as it should. In other words, the function works perfectly in irb when I type the UTF string in directly. For example, if I call "\t\u001C\u001C".xor "key" with the function below, the result should be "bye". Again, this works in irb, but not when I run it from a file! From the file I get a "'*': negative argument (ArgumentError)", while irb gives no error at all. Below is the function:

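class String
  def xor(key)
    text = dup
    # Unicode codepoints of the receiver and of the key
    b1 = text.unpack("U*")
    b2 = key.unpack("U*")
    # Target length is taken from the key only
    # (the max-based alternative is commented out)
    longest = key.length #[b1.length,b2.length].max
    # Left-pad each codepoint array with zeros up to the target length
    b1 = [0]*(longest-b1.length) + b1
    b2 = [0]*(longest-b2.length) + b2
    # XOR codepoint by codepoint, then pack back into a UTF-8 string
    result = b1.zip(b2).map{ |a,b| a^b }
    result.pack("U*")
  end
end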
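A minimal reproduction of the ArgumentError, assuming `text` holds the 14 typed characters `\t\u001C\u001C` rather than the 3 characters irb builds from the literal. `longest` comes from the key (3), so `longest - b1.length` is 3 - 14 = -11, and `[0] * -11` raises `negative argument`. The hypothetical `xor_padded` below uses the commented-out `[b1.length, b2.length].max` instead; that removes the exception, but the backslashes and hex digits still get XORed, so the typed input would also need to be unescaped first.

```ruby
class String
  # Hypothetical variant of the question's xor: same logic, except the target
  # length is the longer of the two codepoint arrays instead of key.length.
  def xor_padded(key)
    b1 = unpack("U*")
    b2 = key.unpack("U*")
    longest = [b1.length, b2.length].max
    # Left-pad the shorter array with zeros so zip pairs every codepoint.
    b1 = [0] * (longest - b1.length) + b1
    b2 = [0] * (longest - b2.length) + b2
    b1.zip(b2).map { |a, b| a ^ b }.pack("U*")
  end
end

interpreted = "\t\u001C\u001C"   # 3 characters: what irb builds from the literal
typed       = '\t\u001C\u001C'   # 14 characters: what gets returns for the same text

puts interpreted.xor_padded("key")  # bye

# The question's version pads to key.length (3), so for the typed input it
# evaluates [0] * (3 - 14), and Array#* rejects negative counts.
begin
  [0] * ("key".length - typed.length)
rescue ArgumentError => e
  puts e.message  # negative argument
end

# No exception with the max-based length, but the result is not "bye",
# because the escape characters themselves are XORed with the padded key.
p typed.xor_padded("key")
```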

Your code results in an "unknown encoding name: TF-8 (ArgumentError)" here (Ruby 1.9.3). Try: `#encoding: UTF-8`. — steenslag, Apr 20 '12 at 13:53
I have edited the original post. Please take a look at EDIT, so you know exactly what problem I'm having. Thanks! — m10zart, Apr 26 '12 at 05:01
Take a look at this question, which seems to have what you need for the second part: http://stackoverflow.com/questions/7015778/is-this-the-best-way-to-unescape-unicode-escape-sequences-in-ruby and this one too: http://stackoverflow.com/questions/9230663/ruby-unescape-unicode-string — Sean, Apr 26 '12 at 14:53
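For the second part, one approach (a sketch, not necessarily what the linked answers do) is to turn the typed escape sequences into real characters before calling `xor`. The helper `unescape_input` is hypothetical and handles only `\n`, `\t`, `\\` and `\uXXXX`; any other text passes through unchanged. The XOR at the end is the same codepoint-by-codepoint operation as the question's `xor`, written inline so the block runs on its own.

```ruby
# Hypothetical helper: converts the escape sequences a user types at a gets
# prompt into the characters they denote. Supports \n, \t, \\ and \uXXXX only.
def unescape_input(str)
  str.gsub(/\\(?:u(\h{4})|([nt\\]))/) do
    if $1
      [$1.hex].pack("U")   # \uXXXX -> the character with that codepoint
    else
      { "n" => "\n", "t" => "\t", "\\" => "\\" }[$2]
    end
  end
end

typed = '\t\u001C\u001C'        # what gets returns (14 characters)
text  = unescape_input(typed)   # same 3 characters irb builds from the literal
puts text.length                # 3

# Codepoint-wise XOR with the key, as in the question's String#xor.
result = text.unpack("U*").zip("key".unpack("U*")).map { |a, b| a ^ b }.pack("U*")
puts result                     # bye
```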

Sean · Answer 1 · 2012-04-20T16:11:57.153


This happens because you are using:

p text

vs

puts text

When you use p, Ruby outputs the result of:

puts text.inspect

which shows the extra backslashes that inspect adds to escape the backslashes already in your string. If you use puts instead, you will see exactly what you typed!
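A small sketch of the difference, using a single-quoted literal to stand in for typed input (an assumption, since no prompt is involved). `p obj` writes `obj.inspect` followed by a newline and returns `obj`, so only the printed form gains extra backslashes; the string itself is unchanged.

```ruby
text = '\t\u001C'   # 8 characters, as gets would return them if typed

puts text           # \t\u001C
puts text.inspect   # "\\t\\u001C"
p text              # "\\t\\u001C"  (same as the previous line)

# p returns its argument, so the string is the very same object afterwards.
same = p(text)
puts same.equal?(text)  # true
```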

Cheers!

edited Apr 20 '12 at 16:11

answered Apr 20 '12 at 15:26

Sean

Thank you for your answer. It helped answer my first question, but it didn't answer my other question unfortunately. Please see my EDIT for my question! Once again, thanks! – m10zart Apr 26 '12 at 05:00