33

I have a Rails project with a lot of Cyrillic strings in it.

It worked fine on Ruby 1.8, but Ruby 1.9 assumes source files are US-ASCII-encoded unless you provide an `# encoding: utf-8` comment at the top of the source file. Only with that comment is a file read as UTF-8 instead of US-ASCII.

Is there a simpler way to tell Ruby "This application is UTF-8-encoded. Please treat any and all included source files as UTF-8 unless declared otherwise"?


UPDATE:

I wrote "How to insert the encoding: UTF-8 directive automatically in Ruby 1.9 files", which shows how to add the encoding directive automatically to the files that need it.

the Tin Man
Leonid Shevtsov
  • 2
    James Grey wrote [a series of articles](http://blog.grayproductions.net/articles/the_unicode_character_set_and_encodings) dealing with Unicode and Ruby. Handling source files was part of that series. It is good reading. – the Tin Man Jan 20 '12 at 15:36
  • 5 years later: Upgrade to ruby 2.0+ where the default is UTF-8 (https://www.ruby-lang.org/en/news/2013/02/24/ruby-2-0-0-p0-is-released/) – Jared Beck Feb 10 '15 at 02:31

7 Answers

13

I think you can either

  1. use -E utf-8 command line argument to ruby, or
  2. set your RUBYOPT environment variable to "-E utf-8"
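To see what either option actually changes, a small script that prints the encodings Ruby is using can help. This is only a sketch: `encoding_report` is a made-up helper name, and the file name in the usage note is an assumption.

```ruby
# Hypothetical helper (not part of Ruby): collects the encodings in effect.
def encoding_report
  {
    :external => Encoding.default_external, # default for data read from files and IO
    :internal => Encoding.default_internal, # nil unless an internal encoding was requested
    :source   => __ENCODING__               # encoding this file was parsed with
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```

Saved as, say, report.rb, it can be run as `ruby report.rb`, `ruby -E utf-8 report.rb` and `ruby -E utf-8:utf-8 report.rb` to compare the three values, including whether the source encoding changes at all.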
Mladen Jablanović
  • 1
    The suggested use only sets the external encoding. To set both external and internal encoding use `-E utf-8:utf-8`. – Clint Pachl Feb 28 '13 at 14:01
12

In my opinion, explicit is not always better than implicit.

When almost all the source you use is UTF-8 compatible, you can easily avoid the magic encoding comment by using Ruby's -Ku command-line option.

Don't confuse the "u" parameter of the -K option with the -U option.

-Ku : set script (source) and default external encoding to utf-8
-U  : set default internal encoding to utf-8

Then, set the magic encoding comment only in scripts that need it. Remember, convention over configuration!

You can set the environment variable RUBYOPT=-Ku

See Ruby's command-line options at http://www.manpagez.com/man/1/ruby/.

the Tin Man
Totor
4

Explicit is better than implicit. Writing out the name of the encoding is good for your text editor, your interpreter, and anyone else who wants to look at the file. Different platforms have different defaults -- UTF-8, Windows-1252, Windows-1251, etc. -- and you will either hamper portability or platform integration if you automatically pick one over the other. Requiring more explicit encodings is a Good Thing.

It might be a good idea to integrate your Rails app with GetText. Then all of your UTF-8 strings will be isolated to a small number of translation files, and your Ruby modules will be clean ASCII.
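A rough sketch of the "strings live outside the source" idea follows. It uses a plain YAML lookup rather than GetText's API, and every name in it (`sample_translation_yaml`, `load_translations`, `translate`, the `ru`/`greeting` keys) is made up for illustration. The `\u` escapes keep the sketch itself pure ASCII; a real translation file would simply contain the Cyrillic text in UTF-8.

```ruby
require "yaml"

# Stand-in for the text of a UTF-8 translation file (hypothetical content).
def sample_translation_yaml
  <<-'YAML'
ru:
  greeting: "\u041F\u0440\u0438\u0432\u0435\u0442"
  farewell: "\u041F\u043E\u043A\u0430"
YAML
end

# Parse the translation text into nested hashes.
def load_translations(yaml_text)
  YAML.load(yaml_text)
end

# Look up one string; raises KeyError if the locale or key is missing.
def translate(translations, locale, key)
  translations.fetch(locale).fetch(key)
end

translations = load_translations(sample_translation_yaml)
puts translate(translations, "ru", "greeting")
```

The Ruby file contains no non-ASCII characters, so it needs no magic comment; only the translation data carries the Cyrillic text.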

Josh Lee
  • +1 Splitting out the non-ASCII strings into a separate file is a great way to handle the issue. Or, put them in a DB table or a YAML file, making it easier to tweak the language without having to touch source code. – the Tin Man Jan 20 '12 at 15:25
  • 2
    Requiring explicit encoding is good. However, Ruby is implicitly choosing US-ASCII for all files. It should be possible to explicitly set a different default. For applications using entirely UTF-8 strings for example, it does not make sense to have to include a magic comment in every file, yes? – jpgeek Sep 04 '12 at 07:13
4

There's a gem that sets the magic comment at the top of every file that needs it in a Rails project: https://github.com/m-ryan/magic_encoding

Just install it and run magic_encoding in the root of your project. Problem solved.
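For anyone curious what such a tool has to do, here is a minimal sketch of the idea. It is not the gem's code: the helper names are invented, it works on strings rather than files, and it assumes the input is valid UTF-8.

```ruby
# Hypothetical helpers sketching the idea; not taken from magic_encoding.

# True if the line looks like a magic encoding comment.
def magic_comment?(line)
  !!(line =~ /\A#.*coding[:=]\s*[\w.-]+/)
end

# A file needs the comment if it has non-ASCII characters and no magic
# comment on line 1 (or on line 2 when line 1 is a shebang).
def needs_magic_comment?(source)
  return false if source.ascii_only?
  lines = source.lines.to_a
  candidate = lines[0].to_s.start_with?("#!") ? lines[1] : lines[0]
  !magic_comment?(candidate.to_s)
end

# Returns the source with "# encoding: utf-8" inserted where needed,
# keeping a shebang line first.
def with_magic_comment(source)
  return source unless needs_magic_comment?(source)
  lines = source.lines.to_a
  position = lines[0].start_with?("#!") ? 1 : 0
  lines.insert(position, "# encoding: utf-8\n")
  lines.join
end

puts with_magic_comment("puts '\u041F\u0440\u0438\u0432\u0435\u0442'\n")
```

Pure-ASCII files and files that already carry the comment are returned unchanged, so running it twice does no harm.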

3

Not a direct answer, but depending on your coding environment you can let the editor take care of things. Emacs' ruby-mode for example has the variable ruby-insert-encoding-magic-comment:

ruby-insert-encoding-magic-comment is a variable defined in `ruby-mode.el'. Its value is t.

Documentation: *Insert a magic emacs 'coding' comment upon save if this is non-nil.

You can customize this variable.

I'm sure there's something similar for other editors. Sure, it still means adding the magic comment to every file, but at least the editor does it for you automatically instead of you having to remember.

Michael Kohl
1

The only foolproof (and DRY!) 1.9 way of ensuring that all your files (source and assets) are loaded with your preferred encoding at run-time is to use the -E command line argument.

All the other approaches have drawbacks depending on your system (e.g. it may be impossible to set ENV vars, or third-party code may be loaded first, which makes Encoding.default_external unsuitable, ...).
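For reference, the in-process alternative mentioned above would look roughly like this. It is a sketch only: `use_utf8_defaults!` is a made-up name, and where to call it is an assumption (as early as possible in the program's entry point).

```ruby
# Hypothetical helper: sets the process-wide default encodings from Ruby code.
# Anything loaded or opened BEFORE this runs used the previous defaults,
# which is the drawback described above.
def use_utf8_defaults!
  Encoding.default_external = Encoding::UTF_8 # default for data read from files and IO
  Encoding.default_internal = Encoding::UTF_8 # transcode incoming data to UTF-8
end

use_utf8_defaults!
puts Encoding.default_external
puts Encoding.default_internal
```

As far as I know, these settings govern IO defaults rather than how Ruby parses source files, so they do not replace the magic comment.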

My production servers use the following wrapper script:

#!/bin/bash
exec /usr/local/rvm/rubies/default/bin/ruby -E utf-8:utf-8 "$@"

(make sure to adapt the path)

Arnaud Meuret
0

I don't run into this much, but when I need to ensure UTF-8, I use the $KCODE global. Try putting this in your environment.rb: $KCODE = 'UTF8'. Note that $KCODE is a Ruby 1.8 mechanism; Ruby 1.9 ignores it and prints a warning.

Also, are you certain that your editor is saving files in UTF-8?
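One way to check this is to read each file as raw bytes and ask Ruby whether they form valid UTF-8. The sketch below does that; the helper names and the `app/**/*.rb` glob are assumptions for illustration.

```ruby
# Hypothetical helper: true if the file's bytes are valid UTF-8.
def valid_utf8_file?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: returns the paths whose contents are NOT valid UTF-8.
def non_utf8_files(paths)
  paths.reject { |path| valid_utf8_file?(path) }
end

# Assumed Rails-style layout; adjust the glob to your project.
puts non_utf8_files(Dir.glob("app/**/*.rb"))
```

A file saved as Windows-1251 with Cyrillic text will almost always show up in the list. Pure-ASCII files pass, since ASCII is a subset of UTF-8.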

Brian