
I'm using Rails 3.2.

I'm localizing my site in Romanian. In regular expressions, the range `[a-z]` should contain, in order, the letters a, ă, â, b, c, etc.

Is there a way to tell my application that [a-z] should be the list above, based on my locale?

Also, there is an issue with capitalizing: `"â".upcase` doesn't return `"Â"`.

Or are these features perhaps not implemented in Rails yet?

sawa
George
  • Have you looked into [transliteration](http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-transliterate)? – Paul Fioravanti Jun 05 '13 at 10:48
  • Isn't `upcase` Ruby's feature? Why is Rails relevant? – sawa Jun 05 '13 at 10:48
  • @PaulFioravanti That is irrelevant to this question. – sawa Jun 05 '13 at 10:49
  • @sawa, the reason I brought it up was the potential for using an ASCII-based regex on a UTF-8 string after transliterating it to ASCII, but I've never tried to see if that's a good way to solve the problem. Anyway, [this SO thread](http://stackoverflow.com/q/1910573/567863) may be of some assistance to user1304740 regarding what's possible with i18n upcasing in Ruby. – Paul Fioravanti Jun 05 '13 at 11:01
  • @PaulFioravanti Thanks, but it's not applicable to my case (I don't want to get rid of non-ASCII characters). – George Jun 05 '13 at 12:13

2 Answers


This is not a Rails issue: `[a-z]` is not required to include non-ASCII characters. In Ruby, `[a-z]` is a regex range matching the consecutive code points from `a` to `z`, which are exactly the 26 lowercase ASCII letters.
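
A minimal sketch, assuming Ruby 1.9 or later with UTF-8 source: since `[a-z]` only spans ASCII, one workaround is to list the Romanian diacritics explicitly in the character class. The constant names are hypothetical. Both the comma-below forms (ș, ț) and the legacy cedilla forms (ş, ţ) are included on the assumption that real-world Romanian text uses both. Order inside a character class does not matter, so there is no need to "reorder" `a-z`.

```ruby
# encoding: UTF-8

# [a-z] covers only the 26 ASCII letters, so Romanian diacritics fall outside it.
ascii_only = /[a-z]/
p("ă" =~ ascii_only) # => nil

# Hypothetical classes for the Romanian alphabet: ASCII letters plus
# ă, â, î, ș, ț (comma below) and the legacy cedilla forms ş, ţ.
ROMANIAN_LOWER = /[a-zăâîșțşţ]/
ROMANIAN_UPPER = /[A-ZĂÂÎȘȚŞŢ]/
ROMANIAN_WORD  = /\A[a-zA-ZăâîșțşţĂÂÎȘȚŞŢ]+\z/

p("ă" =~ ROMANIAN_LOWER)     # => 0
p("Ș" =~ ROMANIAN_UPPER)     # => 0
p("mâine" =~ ROMANIAN_WORD)  # => 0
p("mâine1" =~ ROMANIAN_WORD) # => nil (the digit is not in the class)
```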

In Ruby, `String#upcase` is not locale-aware (and in Ruby 1.9 it changes only ASCII letters). Instead, you can try the UnicodeUtils gem, like so:

```
% gem install unicode_utils
```

```ruby
# encoding: UTF-8
require 'unicode_utils'

p UnicodeUtils.upcase('ă', :ro)
# prints "Ă"
```

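For readers on a newer Ruby, a sketch assuming Ruby 2.4 or later: from that version on, `String#upcase` and `String#downcase` apply full Unicode case mapping by default, so Romanian letters are handled without a gem. On the Ruby versions that shipped alongside Rails 3.2, `upcase` changed only ASCII letters, which is why `"â".upcase` came back unchanged.

```ruby
# Assumes Ruby >= 2.4: String#upcase/#downcase use Unicode case mapping
# by default (older versions only changed ASCII letters).
word = "învățământ"
puts word.upcase        # prints ÎNVĂȚĂMÂNT
puts "Â".downcase       # prints â
puts "â".upcase == "Â"  # prints true

# The :ascii option restores the old ASCII-only behaviour.
puts "â".upcase(:ascii) # prints â
```
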
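Specifying the locale when converting case makes more sense, because some languages case the same letter differently. For example, Turkish uppercases `i` to the dotted capital `İ`:

```ruby
UnicodeUtils.upcase('i', :en) # => "I"
UnicodeUtils.upcase('i', :tr) # => "İ", not "I"
```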
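
On Ruby 2.4 and later (an assumption about the reader's version), the core string methods accept a `:turkic` option for this same special case, so the Turkish example can be reproduced without UnicodeUtils. A minimal sketch:

```ruby
# Default Unicode mapping: plain i uppercases to plain I.
puts "i".upcase            # prints I
# Turkic mapping: i uppercases to dotted capital İ (U+0130).
puts "i".upcase(:turkic)   # prints İ
# ...and I lowercases to dotless ı (U+0131).
puts "I".downcase(:turkic) # prints ı
```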
nurettin

I think the `[a-z]` range is based on character code order, so it only covers the ASCII letters and Romanian characters are not included. If you want to match any Latin-script character, use Onigmo's `\p{Latin}` character property:

```ruby
"ă" =~ /\p{Latin}/
# => 0
```
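
`\p{Latin}` matches a single character, so validating a whole word needs anchors and a repetition. A minimal sketch; the helper name `latin_word?` is hypothetical. Note that `\p{Latin}` is broader than the Romanian alphabet (it also accepts letters such as ß or ñ), so an explicit character class is still needed if only Romanian letters should pass.

```ruby
# Hypothetical helper: true if the string consists only of Latin-script characters.
def latin_word?(str)
  !!(str =~ /\A\p{Latin}+\z/)
end

puts latin_word?("ăâîșț")  # prints true
puts latin_word?("mâine")  # prints true
puts latin_word?("mâine!") # prints false ('!' is not in the Latin script)
puts latin_word?("слово")  # prints false (Cyrillic script)
```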
sawa