38

How does the magic comment in Ruby work? I am talking about:

# Encoding: utf-8

Is this a preprocessing directive? Are there other uses of this type of construction?

Patrick Oscity
Leszek Andrukanis

4 Answers

62

It is an instruction to the Ruby interpreter placed at the top of the source file, known as a magic comment. Before processing your source code, the interpreter reads this line and sets the source encoding accordingly. This is quite common for interpreted languages, I believe; Python, at least, uses the same approach.

You can specify encoding in a number of different ways (some of them are recognized by editors):

# encoding: UTF-8
# coding: UTF-8
# -*- coding: UTF-8 -*-
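As a rough way to check that these spellings are equivalent, the sketch below evaluates a tiny program for each one and asks Ruby which source encoding it picked up via `__ENCODING__`. It uses ISO-8859-1 rather than UTF-8 so the effect is visible on a modern Ruby, and it relies on `eval` honoring a magic comment on the first line of the evaluated string. The names `FORMS`, `source_encoding_for`, and `RESULTS` are made up for this illustration.

```ruby
# Three spellings of the same magic comment. ISO-8859-1 is used instead of
# UTF-8 so that the result differs from the default source encoding.
FORMS = [
  "# encoding: ISO-8859-1",
  "# coding: ISO-8859-1",
  "# -*- coding: ISO-8859-1 -*-",
].freeze

# Hypothetical helper: evaluates a two-line program whose first line is the
# magic comment and whose second line returns the source encoding in effect.
def source_encoding_for(comment_line)
  eval("#{comment_line}\n__ENCODING__")
end

RESULTS = FORMS.to_h { |form| [form, source_encoding_for(form)] }

RESULTS.each do |form, encoding|
  puts "#{form.ljust(30)} => #{encoding}"
end
```

In a real project the comment would simply be the first line of the `.rb` file; the `eval` wrapper is only there to keep the demonstration in one script.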

You can read some interesting stuff about source encoding in this article.

The only other thing I'm aware of with a similar construction is the shebang, but it relates to Unix shells in general and is not Ruby-specific.

magic_comments defined in ruby/ruby

BinaryButterfly
KL-7
  • 1
    In some ways, this construction is similar to a ["magic number"](http://en.wikipedia.org/wiki/Magic_number_(programming\)). The term "magic comment" seems related. – Benjamin Oakes Oct 03 '13 at 20:15
  • 2
    It is "magic" in the sense that comments are usually completely ignored by the interpreter. The presence or absence of this comment, however, has a meaning, so it is "magic" because it is NOT ignored by the interpreter. – NobodysNightmare Apr 12 '16 at 12:25
  • "All Ruby scripts now default to UTF-8 encoding" from https://www.engineyard.com/blog/whats-new-and-awesome-in-ruby-2 – Alessandro De Simone Feb 07 '18 at 16:20
  • Though I'd note that this means files are assumed to be UTF-8 encoded. If a text editor saves the file as ISO-8859-1 (as many do by default) and it contains a symbol like £, Ruby will give an error, so the text editor must save the file as UTF-8 whenever it contains characters outside US-ASCII. – barlop Feb 22 '18 at 23:16
15

This magic comment tells Ruby the source encoding of the currently parsed file. Because Ruby 1.9.x assumes US-ASCII by default, you have to tell the interpreter what encoding your source code is in if you use non-ASCII characters (like umlauts or accented characters).
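To see why the declared encoding has to match the bytes actually stored in the file, here is a minimal sketch that takes one byte and interprets it under two different encodings. It works on a string rather than a source file, so it only illustrates the underlying mismatch, not the parser itself. The constant names are invented for the example.

```ruby
# The single byte 0xA3 is how ISO-8859-1 stores the pound sign (U+00A3).
RAW_BYTES = "\xA3".b.freeze

# Same bytes, two different claims about what they mean.
AS_UTF8   = RAW_BYTES.dup.force_encoding(Encoding::UTF_8)
AS_LATIN1 = RAW_BYTES.dup.force_encoding(Encoding::ISO_8859_1)

puts AS_UTF8.valid_encoding?    # false: a lone 0xA3 byte is not valid UTF-8
puts AS_LATIN1.valid_encoding?  # true

# Transcoding only works once the real encoding is known.
POUND_UTF8 = AS_LATIN1.encode(Encoding::UTF_8)
puts POUND_UTF8 == "\u00A3"     # true
```

A source file saved as ISO-8859-1 but read as UTF-8 puts the parser in the first situation, which is where the encoding comment (or re-saving the file as UTF-8) comes in.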

The comment has to be the first line of the file (or below the shebang if used) to be recognized.

There are other encoding settings. See this question for more information.

Since version 2.0, Ruby assumes UTF-8 encoding of the source file by default. As such, this magic encoding comment has become a rarer sight in the wild if you write your source code in UTF-8 anyway.

Community
Holger Just
8

As you noted, magic comments are a special preprocessing construct. They must be defined at the top of the file (or directly below a Unix shebang, if there is one). As of Ruby 2.3 there are three kinds of magic comments:

  • Encoding comment: See the other answers. Must always be the first magic comment, and the encoding must be ASCII-compatible. Sets the source encoding, so you will run into problems if the real encoding of the file does not match the specified encoding.
  • frozen_string_literal: true: Freezes all string literals in the current file
  • warn_indent: true: Activates indentation warnings for the current file
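The effect of `frozen_string_literal` can be observed with a small sketch like the one below. It evaluates a one-literal program under each setting, relying on `eval` honoring a magic comment that appears before any other token in the evaluated string. The helper `literal_frozen?` and the two constants are hypothetical names used only here.

```ruby
# Hypothetical helper: evaluates a two-line program consisting of the given
# magic comment and a string literal, and reports whether the literal is frozen.
def literal_frozen?(magic_comment)
  eval("#{magic_comment}\n'abc'.frozen?")
end

FROZEN_WITH_TRUE  = literal_frozen?("# frozen_string_literal: true")
FROZEN_WITH_FALSE = literal_frozen?("# frozen_string_literal: false")

puts "frozen_string_literal: true  -> literal frozen? #{FROZEN_WITH_TRUE}"
puts "frozen_string_literal: false -> literal frozen? #{FROZEN_WITH_FALSE}"
```

In an ordinary source file the comment goes at the top, and any attempt to mutate a literal in that file (for example with `<<`) then raises an error instead of silently modifying the string.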

More info: Magic Instructions

J-_-L
0

While this isn't exactly an answer for your question, if you want to read more about encodings, how they work, what kinds of problems crop up with them: the great Yehuda Katz wrote about encodings as they were being worked out in Ruby 1.9 and beyond:

Ruby 1.9 Encodings: A Primer and the Solution for Rails

Encodings, Unabridged

steve