I am working on a Rails 3.1 app with Ruby 1.9.3 and Mongoid as my ORM, and I am facing an annoying issue. I would like to truncate the content of a post like this:
<%= raw truncate(strip_tags(post.content), :length => 200) %>
I am using raw and strip_tags because my post.content is actually handled with a rich text editor.
I have a serious issue with non-ASCII characters. Imagine my post content is the following:
éééé éééé éééé éééé éééé éééé éééé éééé
My naive approach above produces this:
éééé éééé éééé éééé éééé &eac...
It looks like truncate sees every word of the string as &eacute;&eacute;&eacute;&eacute;.
Is there a way to either:
- Have truncate handle actual UTF-8 strings, where 'é' counts as a single character? That would be my favorite approach.
- Hack the above instruction so that the result is better, for example by forcing Rails to truncate between two words?
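To make the two options concrete, here is a minimal plain-Ruby sketch of the behavior I am after. It does not use Rails at all. The names decode_entities, truncate_on_words and the tiny ENTITIES table are hypothetical ones I made up for illustration, and the table only covers the four characters mentioned in this question.

```ruby
# encoding: utf-8

# Hypothetical, deliberately tiny entity table (an assumption for this
# sketch); a real application would need a complete entity decoder.
ENTITIES = {
  '&eacute;' => 'é',
  '&ccedil;' => 'ç',
  '&agrave;' => 'à',
  '&ugrave;' => 'ù'
}

# Replace the named entities listed above with their UTF-8 characters.
# Unknown entities are left untouched.
def decode_entities(text)
  text.gsub(/&[a-z]+;/) { |entity| ENTITIES.fetch(entity, entity) }
end

# Truncate to at most `length` characters (omission included), cutting at
# the last space before the limit when there is one.
def truncate_on_words(text, length, omission = '...')
  return text if text.length <= length
  head = text[0, length - omission.length]
  cut = head.rindex(' ')
  head = head[0, cut] if cut
  head + omission
end

# Eight words of four entity-encoded 'é' each, as in the example above.
encoded = (['&eacute;' * 4] * 8).join(' ')
puts truncate_on_words(decode_entities(encoded), 30)
# prints "éééé éééé éééé éééé éééé..."
```

Here the length is counted on the decoded string, so each 'é' costs one character, and the cut never lands in the middle of a word or an entity.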
I am asking this question because I have not found any solution so far. This is the only place in my app where I have problems with such characters, and it is a major issue since the whole content of the website is in French and so contains a lot of é, ç, à, ù.
Also, I think this behavior is quite unfortunate for the truncate helper, because in my case it does not truncate at 200 characters at all, but at approximately 25 characters!
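The figure of roughly 25 follows from the length of the entity. A quick sketch of the arithmetic, assuming every visible character is stored as an 8-character entity such as &eacute; and ignoring spaces and the trailing omission (visible_chars is a hypothetical helper name):

```ruby
# encoding: utf-8

# Number of visible characters that fit in `limit` raw characters when each
# visible character is stored as `entity` (an assumption of this sketch).
def visible_chars(limit, entity)
  limit / entity.length
end

puts '&eacute;'.length              # 8
puts visible_chars(200, '&eacute;') # 25
```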