I can't seem to find the right combination of String#encode shenanigans.
I expect your script uses the cp1251 encoding and you have Ruby >= 1.9. Then you can use force_encoding:
#encoding: cp1251
# also works with encoding: binary
source = 'I’d'
puts source.force_encoding('utf-8') #-> I’d
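A minimal sketch of what force_encoding does, independent of how the script file itself is saved: it only relabels the existing bytes and never converts them. The example assumes the garbled text comes from the UTF-8 bytes of the right single quote (U+2019, bytes E2 80 99) being displayed as a single-byte encoding, and builds those bytes with escapes so the source file's own encoding does not matter.

```ruby
# The three bytes E2 80 99 are the UTF-8 encoding of U+2019 (the right single quote).
# .b returns a copy of the string labelled ASCII-8BIT (binary).
raw = "I\xE2\x80\x99d".b

# force_encoding mutates its receiver and changes only the label, not the bytes,
# so work on a copy to keep raw untouched.
fixed = raw.dup.force_encoding(Encoding::UTF_8)

p raw.encoding == Encoding::BINARY   # the bytes start out labelled as binary
p fixed.valid_encoding?              # the same bytes are valid UTF-8
p fixed.bytes == raw.bytes           # no byte was changed
p fixed.length                       # counted as UTF-8 characters: I, the quote, d
```

Because nothing is converted, force_encoding only helps when the bytes are already correct and just carry the wrong label.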
If my assumptions are wrong: which encoding do you use, and which Ruby version?
A little background: Problems with encoding are difficult to analyse. There may be conflicts between:
- Encoding of the source code (defined by the editor).
- Expected encoding of the source code (defined with the #encoding magic comment on the first line). This is what Ruby uses.
- Encoding of the string (see e.g. section String encodings in http://nuclearsquid.com/writings/ruby-1-9-encodings/ )
- Encoding of the output shell
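Ruby can report the second and third of these directly; for the fourth it can only report what it assumes (the locale encoding and any encoding set on $stdout), and the first is invisible because Ruby only ever sees the bytes the editor wrote. A small sketch for printing what a running script sees; the helper name encoding_report is made up for illustration.

```ruby
# Hypothetical helper: collects the encodings Ruby itself knows about.
def encoding_report(str)
  {
    source:           __ENCODING__,               # from the magic comment (UTF-8 by default since Ruby 2.0)
    string:           str.encoding,               # the label on this particular string
    default_external: Encoding.default_external,  # assumed encoding of data read from IO
    default_internal: Encoding.default_internal,  # nil unless set, e.g. via -E ext:int
    locale:           Encoding.find('locale'),    # Ruby's guess at the terminal's encoding
    stdout:           $stdout.external_encoding   # may be nil when no output encoding was set
  }
end

# An ASCII-only literal is labelled with the source encoding.
encoding_report('plain ascii').each { |name, enc| puts "#{name}: #{enc.inspect}" }
```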

@ArupRakshit Does your output shell support UTF-8? And is your script really encoded in cp1251? – knut Jan 12 '15 at 09:30
I just copy-pasted your code, ran it in Vim, and got the output I described above. – Arup Rakshit Jan 12 '15 at 09:31
@ArupRakshit Can you check which encoding you use in Vim (see http://stackoverflow.com/questions/778069/how-can-i-change-a-files-encoding-with-vim )? – knut Jan 12 '15 at 09:37
I think I'd got confused on this one, so I'll post this here in the hope that it helps anyone else who is similarly confused.
I was trying to do the conversion in an irb session, which gives you:
irb(main):002:0> 'I’d'.force_encoding('UTF-8')
=> "I’d"
And if you try using encode instead of force_encoding, then you get:
irb(main):001:0> 'I’d'.encode('UTF-8')
=> "I’d"
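Both results make sense once you notice that a string typed into a UTF-8 irb already holds the characters â, € and ™ as proper UTF-8, so relabelling it as UTF-8 or transcoding UTF-8 to UTF-8 changes nothing. A sketch of one way to undo the damage inside Ruby, assuming the garbling was exactly one misreading of UTF-8 bytes as Windows-1252 (this is a swapped-in technique, not what either answer does): transcode back to Windows-1252 to recover the original bytes, then relabel those bytes as UTF-8.

```ruby
# Mojibake stored as valid UTF-8: "I" + "â" + "€" + "™" + "d".
garbled = "I\u00E2\u20AC\u2122d"

# Neither call changes anything: the string is already UTF-8.
p garbled.dup.force_encoding('UTF-8') == garbled   # true
p garbled.encode('UTF-8') == garbled               # true

# Step 1: map each character back to its single Windows-1252 byte (E2, 80, 99).
# Step 2: reinterpret those bytes as UTF-8.
fixed = garbled.encode('Windows-1252').force_encoding('UTF-8')
p fixed == "I\u2019d"                              # true
```

If the garbled text contains characters that Windows-1252 cannot represent, encode may raise Encoding::UndefinedConversionError, which usually means the damage was not a single clean Windows-1252 misreading.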
This is with irb set to use UTF-8 for both input and output. In my case, converting the string the way I want involves telling Ruby that the source string is in Windows-1252 encoding. You can do this with the -E option, which takes external:internal encodings (Ruby sets Encoding.default_external and Encoding.default_internal from it), and then you get this:
$ irb -EWindows-1252:UTF-8
irb(main):001:0> 'I’d'
=> "I\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2d"
That looks wrong, but printing it with puts gives this:
$ ruby -E Windows-1252:UTF-8 -e "puts 'I’d'"
I’d
Hurrah. I'm not sure why irb showed it as "I\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2d" (something to do with the terminal's code page?), so if anyone can comment with further insight, that would be great.
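The external:internal pair that -E sets as the default can also be applied to a single IO, which makes the direction of the conversion easier to see. A sketch using IO.pipe as a stand-in for a file or terminal: bytes are read as Windows-1252 and handed back as UTF-8, which is how the UTF-8 bytes of "I’d" turn into "I’d". This only illustrates what external:internal means for reads; it does not explain the irb display above.

```ruby
# Read end: treat incoming bytes as Windows-1252 (external) and
# transcode them to UTF-8 (internal), the same pair -E Windows-1252:UTF-8 sets.
reader, writer = IO.pipe('Windows-1252:UTF-8')

writer.binmode                         # write raw bytes, no conversion
writer.write("I\xE2\x80\x99d".b)       # the UTF-8 bytes of "I’d"
writer.close

text = reader.read                     # read with no length, so transcoding applies
reader.close

p text.encoding                        # now labelled UTF-8
p text == "I\u00E2\u20AC\u2122d"       # true: each byte became one Windows-1252 character
```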
