
I am trying to validate 'words' with Ruby 1.8.7.

My regex to catch a word is currently:

/[a-zA-Z]\'*\-*/

This only matches English (ASCII) letters. Is there a way to match non-English UTF-8 characters too?
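A minimal sketch of a Unicode-aware word validator. It assumes Ruby 2.0 or later, not the 1.8.7 in the question, and uses \p{L} (any Unicode letter) in place of [a-zA-Z]. The WORD constant, the valid_word? method, and the rule that a single apostrophe or hyphen may appear only between letters are all hypothetical choices, loosely modelled on the \'*\-* part of the original pattern.

```ruby
# Hypothetical validator; assumes Ruby 2.0+ (UTF-8 source by default), not 1.8.7.
# \p{L} matches any Unicode letter, so accented and non-Latin letters pass.
# Assumed rule: an apostrophe or hyphen may only sit between runs of letters.
WORD = /\A\p{L}+(?:['-]\p{L}+)*\z/

# Returns true when the whole string is one "word" under the rule above.
def valid_word?(str)
  !!(str =~ WORD)
end

%w[résumé don't self-esteem Straße 日本語 abc123 -dash].each do |w|
  puts "#{w}: #{valid_word?(w)}"
end
```

The \A and \z anchors make this validate the entire string rather than find a word somewhere inside it, which is what "validate" usually needs.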

the Tin Man
ethicalhack3r

1 Answer


Even the 1.8.x regex engine is UTF-8 aware; you just need the right expression, which is slightly more than plain /\w/: add the u (UTF-8) flag.

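s = "résumé and some other words"
puts s[/[a-z]+/u]  # [a-z] is ASCII-only, so the match stops at the "é"
puts s[/\w+/u]     # in 1.8, \w with the u flag also matches multibyte UTF-8 letters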

and you get:

r
résumé
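The u-flag trick above is specific to Ruby 1.8. As a hedged sketch assuming Ruby 2.0 or later, the same string behaves differently: there \w is documented as ASCII-only ([a-zA-Z0-9_]) even with the u flag, while the Unicode property \p{L} and the POSIX class [[:alpha:]] do match multibyte letters. The variable names are illustrative only.

```ruby
# Assumes Ruby 2.0+, where source files default to UTF-8.
s = "résumé and some other words"

# \w is ASCII-only in 1.9+, even with the u flag, so this stops at "é".
ascii_word = s[/\w+/u]

# \p{L} (any Unicode letter) and [[:alpha:]] both match multibyte letters.
unicode_word = s[/\p{L}+/]
posix_word   = s[/[[:alpha:]]+/]

puts ascii_word    # → r
puts unicode_word  # → résumé
puts posix_word    # → résumé
```

So code written against the 1.8 answer needs \w swapped for \p{L} (or [[:alpha:]]) when moving to a newer Ruby.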
DigitalRoss