
I'm a beginner working on a simple Ruby program to generate vocabulary lists from text files. Spanish allows words to carry stress marks on capitalized first letters (e.g. "Ábaco"), but I want all words in my list to be downcased. Right now, if I try "Á".downcase the console returns "Á".

Is there a way to use upcase and downcase in Ruby with Spanish accented characters (áéíóúñ)?

This is what my program presently looks like:

f = File.open(".../cat.txt")
words = f.read.split.map(&:downcase)
f.close
# Strip punctuation; the hyphen is escaped so it is a literal, not a range
words = words.map { |item| item.gsub(/[,.?!\-"']/, '') }
words = words.uniq.sort

File.open(".../catwords.txt", "w+") do |f|
  words.each { |element| f.puts(element) }
end
gonzalo2000
  • Ruby 2.4+ now supports more Unicode case mappings (upcase / downcase) and `"Á".downcase` now returns `"á"`. – dlauzon Mar 14 '22 at 14:40
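
A minimal sketch of what that comment describes, assuming Ruby 2.4 or later and UTF-8 strings. The `downcase_words` helper name is hypothetical and only used for illustration:

```ruby
# Assumes Ruby 2.4+, where String#downcase applies Unicode case mapping
# rather than ASCII-only mapping.
# downcase_words is a hypothetical helper, not part of the original program.
def downcase_words(words)
  words.map(&:downcase)
end

puts downcase_words(["Ábaco", "Ñandú", "ÉXITO"]).inspect
```

On older Ruby versions the same call leaves the accented capitals unchanged, which matches the behavior reported in the question.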

2 Answers


Have a look at this sample code:

our_string.tr('Á', 'á')
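
The same idea can be extended to the whole set of Spanish accented letters. This is only a sketch, assuming Ruby 1.9+ with UTF-8 strings; the constant and method names are hypothetical, and `ü` is included here even though the question does not list it:

```ruby
# Hypothetical names for illustration; each character in SPANISH_UPPER
# maps to the character at the same position in SPANISH_LOWER.
SPANISH_UPPER = 'ÁÉÍÓÚÜÑ'
SPANISH_LOWER = 'áéíóúüñ'

def spanish_downcase(str)
  # downcase covers ASCII letters on older Rubies;
  # tr then maps the accented capitals one-for-one.
  str.downcase.tr(SPANISH_UPPER, SPANISH_LOWER)
end

puts spanish_downcase("ÁBACO")
```

In the question's program this could stand in for `map(&:downcase)`, for example `f.read.split.map { |w| spanish_downcase(w) }`.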

As per the documentation:

(from ruby site)

str.tr(from_str, to_str) => new_str


Returns a copy of str with the characters in from_str replaced by the corresponding characters in to_str. If to_str is shorter than from_str, it is padded with its last character in order to maintain the correspondence.

"hello".tr('el', 'ip') #=> "hippo"


vgoff

You would need a library that understands language-specific rules for things like ordering and case transformation. https://github.com/jchris/icu4r is probably the main one, but you'll find similar libraries if you search for ICU (International Components for Unicode, the standards project for this kind of thing).

coderanger