
I want to port a rails app from Ruby 1.8.7 to 1.9.2. Some of the files contain umlauts like ä/ö/ü both within strings and comments. The files were saved as UTF-8 but without a BOM (byte order mark) at the beginning.

As you might know, Ruby 1.9 refuses to parse these files and fails with an "invalid multibyte char (US-ASCII)" error.

I have googled and read a lot, but the only solutions seem to be to

  • insert a BOM or
  • insert # coding: utf-8

at the beginning of each file.
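For a project with many files, adding the magic comment by hand gets tedious. Below is a minimal sketch of a helper that prepends it to a file's text. The method name `with_magic_comment` and the constant `MAGIC_COMMENT` are hypothetical, and the sketch assumes the comment belongs on the first line, or on the second line when the file starts with a shebang.

```ruby
# Hypothetical helper, not part of Ruby or Rails.
MAGIC_COMMENT = "# coding: utf-8\n"

# Returns the source text with a magic comment added, unless one is
# already present on the line where it would be inserted.
def with_magic_comment(source)
  lines = source.lines.to_a
  has_shebang = !lines.empty? && lines.first.start_with?("#!")
  insert_at = has_shebang ? 1 : 0

  # Leave the text alone if a coding comment already sits at that position.
  existing = lines[insert_at]
  return source if existing && existing =~ /\A#.*coding[:=]\s*[\w.-]+/

  # Make sure a lone shebang line is terminated before inserting after it.
  lines[0] = lines[0] + "\n" if has_shebang && !lines[0].end_with?("\n")

  lines.insert(insert_at, MAGIC_COMMENT)
  lines.join
end
```

Applied to a file, this could look like `File.open(path, "w") { |f| f.write(new_text) }` with `new_text = with_magic_comment(File.read(path))`. Try it on a copy of the project first.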

My editor of choice (gEdit) doesn't seem to insert a BOM. I also read that having a BOM is bad practice because it may break some editors, and it breaks shell scripts that use the shebang notation.

EDIT: The BOM breaks the Ruby 1.8.7 parser, giving a syntax error, unexpected kEND, expecting $end (SyntaxError) for the file!

I tried forcing the external encoding with ruby -Eutf-8:utf-8 but this seems to be ignored when calling rake (I tried: /home/malte/.rvm/gems/ruby-1.9.2-p180/bin/rake test).
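To see which settings actually reach a process started through rake, a small diagnostic can help. The sketch below prints the source encoding of the file it lives in (`__ENCODING__`) next to the default external and internal encodings, which are the values `-E` sets. The method name `encoding_report` is hypothetical.

```ruby
# Hypothetical diagnostic; put it in a Rake task or run it directly.
def encoding_report
  {
    # Encoding the parser assumed for this very file.
    "source"   => __ENCODING__.to_s,
    # Default encoding for data read from files and other IO.
    "external" => Encoding.default_external.to_s,
    # default_internal is nil unless it was set explicitly.
    "internal" => (Encoding.default_internal || "(not set)").to_s
  }
end

encoding_report.each { |name, value| puts "#{name}: #{value}" }
```

If "external" shows UTF-8 while "source" still shows US-ASCII, the option arrived but does not affect how source files are parsed.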

So my question is:

As RVM is building ruby 1.9 from source anyway, is there a build option or a patch to change the default encoding from US-ASCII to UTF-8?

I took a quick look at the source code but couldn't find the line where the default is set (I'm no C expert, though).

Malte

1 Answer


I found a workaround: set the RUBYOPT environment variable, for example by executing

export RUBYOPT=-Ku

in your shell.

This sets -Ku as a default option whenever ruby is invoked, so you can call any other tool that runs Ruby without worrying about parameters. rails server and rake then work and treat all files as UTF-8. No BOM or magic comments necessary!
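Because the workaround depends on the environment, it is easy to lose on another machine or in a cron job. A minimal sketch of a guard that a Rakefile could run at load time is shown below. The method name `utf8_kcode_in_rubyopt?` is hypothetical, and the check assumes the option appears as its own whitespace-separated token.

```ruby
# Hypothetical check: true when RUBYOPT carries a -Ku (or -KU) token.
# Accepts any hash-like object so it can be tried without touching ENV.
def utf8_kcode_in_rubyopt?(env = ENV)
  env.fetch("RUBYOPT", "").split.any? { |opt| opt =~ /\A-K[uU]\z/ }
end

warn "RUBYOPT does not contain -Ku" unless utf8_kcode_in_rubyopt?
```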

Malte
  • Thanks a lot. This was the only way I could get my tests to run with UTF-8 chars in my factories. – Jesse Clark Jan 03 '12 at 16:06
  • Apparently, the -Ku option will be deprecated. The default_internal and default_external encodings work for input and output. However, they do NOT set the file encoding. AFAIK, file encoding can be set only with a BOM or magic comment. – jpgeek Sep 04 '12 at 09:57
  • @jpgeek No need to worry about deprecation; starting with ruby 2.0 the default source file encoding is UTF-8, so you won't even need `-Ku`. – Kelvin May 23 '14 at 19:57
  • @Kelvin I'm getting some of those errors on 2.2.0(-preview1), and -Ku fixed it, but I feel bad using something that's (apparently?) deprecated... – Ven Oct 28 '14 at 09:13
  • @Ven I'm not sure what to tell you, because I don't have 2.2 installed. If you don't have the problem on 2.1 (installed on the same system & same environment), it might be a 2.2 bug. I'd recommend using a production-ready version unless you're willing to do deep troubleshooting. – Kelvin Oct 28 '14 at 19:23
  • It's fine for now, I guess. I will indeed try to get some more insight later... – Ven Oct 28 '14 at 21:25