49

I've been reading up on all the UTF-8 related questions and blog posts, and I've got the following example in a test.rb file:

# encoding: UTF-8
File.open("test.txt", "w") do |f|
  f.write "test © foo"
end

File.open("test.txt", "r") do |f|
  puts f.read
end

This works perfectly. It produces the © symbol correctly in the file, reads the © back, and prints it on the screen.

But when I use this same code in my actual project, I get this written to the file instead of the © symbol: \u00A9

FWIW: I'm getting this result when running an RSpec (v1.2.9) test against my code. The spec produces a file with a © symbol in it, and then reads the file back in to check the contents.

I'm running this in Ruby 1.9.2 at the moment, but I also need to support all the way back to Ruby 1.8.6. This is a Windows environment with RubyInstaller.org versions of Ruby.

Phrogz
Derick Bailey

3 Answers

53

If I execute your code, I get an error on the special character. Can you try this code?

# encoding: UTF-8
File.open("test.txt", "w:UTF-8") do |f| 
  f.write "test \u00A9 foo" 
end 

#Encoding.filesystem = "UTF-8"
p Encoding.find("filesystem") 
File.open("test.txt", "r:UTF-8") do |f| 
  puts f.read 
end 

On my Windows box I then get:

#<Encoding:Windows-1252>
test Â© foo

I have no idea why the Â is there.
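
A stray character in front of © is what one would expect if the two UTF-8 bytes of © were displayed by a console that uses Windows-1252. Here is a minimal sketch of that assumption. The variable names are illustrative, and the code only simulates the console's misreading rather than reproducing it:

```ruby
# encoding: UTF-8
# Sketch: simulate a Windows-1252 console displaying UTF-8 bytes.
copyright = "\u00A9"                 # the © character as a UTF-8 string
utf8_bytes = copyright.bytes.to_a    # © occupies two bytes in UTF-8

# Relabel the same two bytes as Windows-1252, then transcode to UTF-8
# so the misread characters can be compared and printed.
misread = copyright.dup.force_encoding("Windows-1252").encode("UTF-8")

p utf8_bytes                         # [194, 169], i.e. 0xC2 0xA9
p misread == "\u00C2\u00A9"          # true: the bytes read as "Â" followed by "©"
```

If that is what happens here, the file itself is fine and only the console display is off.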

peter
  • 3
    What console are you running ruby in? If it's windows command prompt, it doesn't understand UTF-8, so your UTF-8 output is being displayed by a Windows CP-1252 application. – Matt Connolly Dec 12 '13 at 23:47
  • 1
    Note: you can read UTF-8 more tersely with `text = File.open(filename,'r:UTF-8',&:read)` – Phrogz Feb 14 '14 at 17:51
  • In Windows you can change codepage to understand utf8 with `chcp 65001` but still some Asian symbols will not be [reproduced](https://stackoverflow.com/questions/67746540/how-to-read-a-file-in-utf8-encoding-and-output-in-windows-10) properly. – Polar Bear May 29 '21 at 00:10
48

Read the file with less code:

# encoding: UTF-8
file_content = File.open("test.txt", "r:UTF-8", &:read)
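
On Rubies that provide them, the class-level helpers accept an encoding option too. File.write arrived in 1.9.3, so this does not cover 1.8.6. A small sketch, using a temporary directory so it does not touch an existing test.txt (the path and variable names are illustrative only):

```ruby
# encoding: UTF-8
require "tmpdir"

# Sketch: write and read a UTF-8 file with the one-line helpers.
file_content = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "test.txt")
  File.write(path, "test \u00A9 foo", :encoding => "UTF-8")
  file_content = File.read(path, :encoding => "UTF-8")
end

# Reports UTF-8 unless Encoding.default_internal has been set to something else.
puts file_content.encoding
```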
Phrogz
tokhi
4

On which OS does your application run? It could be that the default encoding for the file is ASCII. Does it help if you add w:utf-8 and r:utf-8 to the open parameters?
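
To check what Ruby assumes when no encoding is given, and to confirm what actually lands on disk, something along these lines may help. It is only a diagnostic sketch for Ruby 1.9+. The temporary path and variable names are made up for illustration:

```ruby
# encoding: UTF-8
require "tmpdir"

# Encodings Ruby falls back to when File.open is given no encoding.
p Encoding.default_external
p Encoding.default_internal   # nil unless configured

raw_bytes = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "test.txt")
  File.open(path, "w:UTF-8") { |f| f.write "test \u00A9 foo" }
  # Read the file back as raw bytes, bypassing any transcoding.
  raw_bytes = File.open(path, "rb") { |f| f.read }.bytes.to_a
end

# A correctly written © shows up as the byte pair 0xC2 0xA9.
p raw_bytes.each_cons(2).include?([0xC2, 0xA9])
```

If the last line prints false for a file produced by the real project, the text was probably altered before it reached File#write, not by the file encoding.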

asgs
ALoR
  • I tried the encoding hints you suggested, and they didn't seem to make a difference when running the RSpec tests. I updated my question to include Ruby version and platform info, too. Maybe I need to upgrade to RSpec 2.x. – Derick Bailey Mar 02 '11 at 13:54