I am trying to use the rmmseg-cpp gem's sample code documented here: http://rmmseg-cpp.rubyforge.org/#Stand-Alone-rmmseg

Just to test it out, I put it in show.html.erb like this:

# coding: UTF-8
<p id="notice"><%= notice %></p>

<p>
  <b>Title:</b>
  <%= @lesson.title %>
</p>

<p>
  <b>Content:</b>
  <%= @lesson.content %> # simplified chinese text
</p>

<p><% require 'rmmseg' %>
<% algor = RMMSeg::Algorithm.new(@lesson.content) %>
<% loop do %>
  <% tok = algor.next_token %>
  <% break if tok.nil? %>
  <%= "#{tok.text} [#{tok.start}..#{tok.end}]" %>
<% end %> </p>

<%= link_to 'Edit', edit_lesson_path(@lesson) %> |
<%= link_to 'Back', lessons_path %>

I get the following error:

 Encoding::CompatibilityError in Lessons#show

Showing /Users/webmagnets/rails_projects/blt/app/views/lessons/show.html.erb where line #19 raised:

incompatible character encodings: UTF-8 and ASCII-8BIT

Extracted source (around line #19):

16: <% loop do %>
17:   <% tok = algor.next_token %>
18:   <% break if tok.nil? %>
19:   <%= "#{tok.text} [#{tok.start}..#{tok.end}]" %>
20: <% end %> </p>
21: 
22: <%= link_to 'Edit', edit_lesson_path(@lesson) %> |

Rails.root: /Users/webmagnets/rails_projects/blt

app/views/lessons/show.html.erb:19:in `block in _app_views_lessons_show_html_erb___3831310028264182552_70339844987120'
app/views/lessons/show.html.erb:16:in `loop'
app/views/lessons/show.html.erb:16:in `_app_views_lessons_show_html_erb___3831310028264182552_70339844987120'
app/controllers/lessons_controller.rb:20:in `show'

Request

Parameters:

{"id"=>"1"}

Response

Headers:

None

If you need any more info, please let me know.

webmagnets
  • I have already tried what was mentioned on this page: http://stackoverflow.com/questions/11682351/incompatible-character-encodings-ascii-8bit-and-utf-8-while-using-javascript-in – webmagnets Dec 14 '12 at 13:27
  • I assumed that the Ruby magic comment was case-sensitive but it turns out to be different. [James Edward Gray's post on the magic comment](http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings) is all I can refer you to. – Thomas Klemm Dec 14 '12 at 13:38
  • Maybe, if you copied and pasted some code on your editor, the clipboard content came with some messed up characters. Try deleting and inserting spaces, quotes, commas manually. – MurifoX Dec 14 '12 at 13:41
  • Could it be that you copied and pasted some code into your editor in the making? (see [this post](http://stackoverflow.com/questions/5286117/incompatible-character-encodings-ascii-8bit-and-utf-8)) – Thomas Klemm Dec 14 '12 at 13:42
  • I did copy and paste code, but after seeing these comments I retyped the parts that were pasted and I also opened the file in textwrangler and I think I have made sure that the text is all UTF-8 encoded. – webmagnets Dec 14 '12 at 14:03

2 Answers

This link helped me: https://github.com/sinatra/sinatra/issues/559#issuecomment-7748296

I used <% text = tok.text.force_encoding 'UTF-8' %> and it worked.

Thanks @zed_0xff for putting me on the right path.

webmagnets

Try this workaround:

<% text = tok.text.encode('utf-8',:invalid => :replace, :undef => :replace) %>
<%= "#{text} [#{tok.start}..#{tok.end}]" %>
zed_0xff
  • Thanks. That gets rid of the error, BUT it isn't working properly yet. Here is my code now: ```<% text = "你好" %> <% algor = RMMSeg::Algorithm.new(text) %> <% loop do %> <% tok = algor.next_token %> <% break if tok.nil? %> <% text2 = tok.text.encode('utf-8',:invalid => :replace, :undef => :replace) %> <%= "#{text2} [#{tok.start}..#{tok.end}]" %> <% end %>``` It displays diamonds with question marks instead of Chinese characters. Also, if I remove ```,:invalid => :replace, :undef => :replace``` then the error message is `"\xE4" from ASCII-8BIT to UTF-8` – webmagnets Dec 14 '12 at 14:55
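
The difference between the two answers can be reproduced outside Rails. The sketch below is plain Ruby and assumes that `tok.text` comes back as valid UTF-8 bytes labeled ASCII-8BIT, which is what the error messages in this thread suggest but is not confirmed from the gem's source. The variable names are illustrative only. `force_encoding` keeps the bytes and changes the label, while `encode` tries to convert each byte and has no mapping for bytes above 0x7F, which would explain both the replacement characters and the `"\xE4" from ASCII-8BIT to UTF-8` error.

```ruby
# Bytes of "你好" in UTF-8, tagged as binary (ASCII-8BIT).
# Assumption: this stands in for what tok.text appears to return.
raw = "\xE4\xBD\xA0\xE5\xA5\xBD".b

# force_encoding leaves the bytes untouched and only relabels the string.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)

# encode transcodes byte by byte; high bytes have no ASCII-8BIT -> UTF-8
# mapping, so with these options each one is replaced.
transcoded = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

# Without the replacement options, the same conversion raises instead.
error = begin
  raw.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

puts relabeled.valid_encoding?   # is the relabeled string well-formed UTF-8?
puts transcoded.each_char.count  # number of characters after replacement
puts error.class                 # exception raised by the plain encode call
```

If the assumption holds, relabeling is the appropriate fix here because the bytes were already correct and only the encoding tag was wrong.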