
I am new to Ruby and am writing a simple Ruby program (no Rails), basically a single script file. I can't seem to read user-entered Cyrillic text. For example,

puts gets.chomp

prints ??? instead of жжж, although it works fine with English letters.

Both puts gets.chomp.encoding and ruby -e 'p Encoding.default_external' print UTF-8.

With this test:

a = gets.chomp 
puts a == 'жжж'
puts a == '???'

entering жжж produces

false
true

So they are stored as question marks.
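A quick way to confirm where the characters get lost is to look at the raw bytes instead of the printed string. Below is a minimal sketch; the helper name inspect_input is hypothetical. If every character shows up as byte 3F (ASCII `?`), the text was already replaced before Ruby received it. Each ж would appear as D0 B6 if it had arrived as real UTF-8.

```ruby
# Hypothetical diagnostic helper: shows what actually reached Ruby from the console.
def inspect_input(str)
  {
    encoding: str.encoding.name,                              # encoding tag Ruby attached to the string
    valid: str.valid_encoding?,                               # false if the bytes don't fit that tag
    bytes: str.bytes.map { |b| format('%02X', b) }.join(' ')  # raw bytes in hex
  }
end
```

Calling `p inspect_input(gets.chomp)` and typing жжж shows which case applies.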

I'm using the Windows 10 Command Prompt (or RubyMine, which uses it anyway). The problem persists in pry and irb.

I've tried putting #coding: UTF-8 at the top of the source file, but it didn't help. I've seen advice to edit some configuration files in a Ruby on Rails project, but that doesn't apply here.

Is there any simple solution?

wanderbild
  • First check: `puts gets.chomp.encoding`. What does it say? – Casper May 16 '20 at 23:56
  • @Casper `UTF-8` – wanderbild May 17 '20 at 00:23
  • What operating system and terminal are you using (my spidey sense says your terminal is the problem, not Ruby)? – Casper May 17 '20 at 00:31
  • Also check the output of this command: `ruby -e 'p Encoding.default_external'`. – Casper May 17 '20 at 00:41
  • What happens if you use `puts` in your script to output Cyrillic? If you still get question marks, then it might simply be a font issue. See the answer here, and you may also want to read the rest of the page for troubleshooting: https://stackoverflow.com/a/747437/823617 – Casper May 17 '20 at 14:08
  • @Casper it is able to output Cyrillic. And I've tried this: `a = gets.chomp` `puts a == 'жжж'` `puts a == '???'` and `жжж` input produced `false`, `true` So they are stored as question marks. – wanderbild May 17 '20 at 14:28
  • @Casper is probably right that it's a terminal issue and has little to do with Ruby. – max May 17 '20 at 17:24
  • Thanks @Casper! You were right. – wanderbild May 28 '20 at 14:59

1 Answer


I've found a partial solution, maybe someone will find it helpful.

  1. Change the system locale in Windows to Ukrainian/Belarusian/Russian etc. (Control Panel -> Date and Time -> Change date and time -> Change calendar settings -> Administrative -> Change system locale).
  2. Cyrillic input now shows up as hexadecimal \xNN escapes instead of question marks (for example \xA2\xE7\xB4). These bytes turned out to be in a Windows code page, and by trial and error I found the one that works for me: CP866, although it lacks some characters. So the string passed in wasn't actually UTF-8, and its real encoding has to be set explicitly.
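  3. I wrote this method to get a real UTF-8 string from my gets input. It converts the CP866 string to UTF-8 using the iconv library:
require 'iconv' # bundled with Ruby before 2.0; on newer versions install the iconv gem

# Label the console bytes as CP866, then convert them to UTF-8
def decode(string)
  string.force_encoding('CP866')
  Iconv.iconv('utf-8', 'ibm866', string).join('')
end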
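iconv was removed from Ruby's standard library in Ruby 2.0, so on current versions the Iconv-based method needs the separate iconv gem. A hedged alternative uses only core String methods. It relabels the bytes as CP866 with force_encoding, then transcodes them with encode. This swaps String#encode in for Iconv. Like the answer, it assumes the console delivers CP866 bytes. The name decode_cp866 is hypothetical.

```ruby
# Hypothetical alternative to the Iconv-based decode, using only core Ruby.
# Assumes the console handed us raw CP866 (IBM866) bytes.
def decode_cp866(string)
  string
    .dup                       # copy, so a frozen or shared caller string is left untouched
    .force_encoding('CP866')   # relabel the bytes without changing them
    .encode('UTF-8')           # transcode CP866 -> UTF-8
end
```

Usage would be `input = decode_cp866(gets.chomp)`. Ruby can also transcode while reading if you call `$stdin.set_encoding('CP866', 'UTF-8')` before `gets`. That route has not been tested in the Windows console here, so treat it as an assumption.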

Another workaround I use is to let the user type UTF-8 input in Notepad (the default editor), then read it from the file after the user presses Enter:

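file_to_open = 'input.txt' # hypothetical file name
system %{cmd /c "start #{file_to_open}"} # open the file in the default editor (Notepad)
$stdin.gets # wait until the user has saved the file and pressed Enter
input = File.read(file_to_open, encoding: 'UTF-8') # read the saved text back as UTF-8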
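The saved file may start with a UTF-8 byte-order mark, or it may not be UTF-8 at all. This depends on the Notepad version and the encoding picked when saving. Below is a minimal sketch of reading it back defensively; read_notepad_input is a hypothetical helper. The `r:BOM|UTF-8` open mode makes Ruby skip a leading BOM, and valid_encoding? catches most files saved in another encoding.

```ruby
# Hypothetical helper for the Notepad workaround: read the user's text back as UTF-8.
def read_notepad_input(path)
  # 'BOM|UTF-8' strips a leading UTF-8 byte-order mark if the editor wrote one
  text = File.read(path, mode: 'r:BOM|UTF-8')
  # A file saved as ANSI (e.g. CP1251) is usually not valid UTF-8; fail loudly instead of returning garbage
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  text.chomp # drop the trailing newline, if any
end
```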
wanderbild