
I have just started using RSpec, and I copied the very simple test from the RSpec GitHub repo just to make sure things are working as expected:

require 'spec_helper'

describe 'Home Page' do
  it "Welcomes the user" do
    visit '/products'
    page.should have_content("Welcome")
  end
end

The problems begin when I change the string to something like "Olá" or "Caçamba", or any other string with a special character. When I do that, I get the following error:

invalid multibyte char (US-ASCII) (SyntaxError)
invalid multibyte char (US-ASCII)
syntax error, unexpected $end, expecting ')'
page.should have_content("Olá")

Any ideas on how to fix it? Maybe some configuration option? Thanks a lot

avlnx

2 Answers


Most likely you're missing a magic comment at the top of your file:

# encoding: UTF-8

Without this directive, Ruby 1.9 interprets your file using the default US-ASCII source encoding and fails, since that character set does not include characters like á or ç. (From Ruby 2.0 on, the default source encoding is UTF-8.)

Here's a blog post on default source encoding in Ruby by James Edward Gray II.
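
As a rough sketch of the fix, the file below starts with the magic comment and then checks which encoding Ruby actually used. The variable names are only illustrative, and the snippet assumes the file itself is saved as UTF-8:

```ruby
# encoding: UTF-8
# The magic comment above must be the first line of the file
# (or the second, if the file starts with a shebang line).

greeting = "Olá"

# __ENCODING__ reports the encoding Ruby used to parse this source file.
source_encoding  = __ENCODING__
# String literals take on the source encoding of the file they appear in.
literal_encoding = greeting.encoding

puts source_encoding
puts literal_encoding
puts greeting.ascii_only?
```

In a spec file, the same comment would go above `require 'spec_helper'`.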

KL-7

International characters almost always have values outside the range of US-ASCII, which covers only the English alphabet, digits, and a small set of symbols found on a US keyboard. Every US-ASCII character fits in a single byte (it only needs 7 bits), whereas in UTF-8 characters with accents, or symbols that aren't letters at all (e.g. emoticons), take more than one byte. The mapping from numerical values to characters is called an encoding. Beyond US-ASCII there is ISO-8859-1, which adds the accented letters used in Western European languages such as Spanish, French and Swedish (e.g. é, å, ü). Beyond that there is Unicode, which includes things like ˝, ‰, Ó, ˆ, ◊, and almost any symbol you can think of in any language.
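
To make the byte counts concrete, here is a small sketch (assuming the file is saved as UTF-8; the variable names are just for illustration) that prints the bytes behind an ASCII letter and an accented one:

```ruby
# encoding: UTF-8

plain    = "a"
accented = "á"

plain_bytes  = plain.bytes                          # one byte in US-ASCII and UTF-8
utf8_bytes   = accented.bytes                       # two bytes in UTF-8
latin1_bytes = accented.encode("ISO-8859-1").bytes  # one byte in ISO-8859-1

puts plain_bytes.inspect    # [97]
puts utf8_bytes.inspect     # [195, 161]
puts latin1_bytes.inspect   # [225]

puts plain.ascii_only?      # true
puts accented.ascii_only?   # false
```

The same character has different byte representations depending on the encoding, which is why Ruby needs to know which one the file uses.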

By default, Ruby 1.9 treats a source file and all of the string literals in it as US-ASCII. You can either change the encoding of the entire file (and everything in it) with the magic comment (see @KL-7's answer) or change it on a string-by-string basis:

"Olé".force_encoding("ISO-8859-1")

Ruby also supports a pseudo-encoding called ASCII-8BIT (also available as BINARY), which is essentially binary data with no character encoding.
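
A minimal sketch of what that looks like in practice, assuming Ruby 2.0 or later (for `String#b`) and a UTF-8 source file:

```ruby
# encoding: UTF-8

text = "Olá"

# String#b returns a copy of the string tagged as ASCII-8BIT (binary).
raw = text.b

# Encoding::BINARY is an alias for Encoding::ASCII_8BIT.
same_encoding = Encoding::BINARY.equal?(Encoding::ASCII_8BIT)

puts raw.encoding == Encoding::ASCII_8BIT
puts same_encoding
# In a binary string every byte counts as one "character".
puts raw.length == text.bytesize
puts raw.bytes == text.bytes
```

The bytes are untouched; only the way Ruby counts and interprets them changes.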

Linuxios
I think `force_encoding` might help when working with the string later, but it won't help the interpreter parse the original string literal. Am I right? Here is a related [article](http://blog.grayproductions.net/articles/ruby_19s_string) on string encoding in Ruby. – KL-7 Jul 04 '12 at 14:46
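
To illustrate the distinction raised in this comment, here is a hedged sketch (assuming a UTF-8 source file; the variable names are made up) comparing `force_encoding`, which only relabels the existing bytes, with `encode`, which transcodes them. Both run after the literal has already been parsed, so neither can prevent the original SyntaxError:

```ruby
# encoding: UTF-8

utf8 = "Olé"

# force_encoding keeps the bytes and only changes the encoding label.
relabeled = utf8.dup.force_encoding("ISO-8859-1")

# encode transcodes: the bytes change so that the same characters are kept.
transcoded = utf8.encode("ISO-8859-1")

puts utf8.bytes.inspect        # [79, 108, 195, 169]
puts relabeled.bytes.inspect   # [79, 108, 195, 169] (same bytes, now read as 4 characters)
puts transcoded.bytes.inspect  # [79, 108, 233]
```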