1

Using a regular expression, how can I retrieve only the words from a string, ignoring everything else (commas, numbers, and other symbols)? This is what I tried:

val words = text.split("\b([-A-Za-z])+\b")

For example:

This is a nice day, my name is...

I want to get:

This, is, a, nice, day, my, name, is

while ignoring `,` and `...`.

ScalaBoy

3 Answers

2

Split the string on runs of characters that are neither letters nor hyphens:

val words = text.split("[^-A-Za-z]+")
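As a quick check of how this behaves, here is a minimal sketch (the `SplitWords` object and its `words` method are illustrative names, not part of the answer). One caveat it covers: when the text starts with a separator, `split` returns an empty first element, so the sketch filters empty strings out.

```scala
object SplitWords {
  // Split on runs of characters that are neither ASCII letters nor hyphens,
  // then drop the empty string that appears when the text starts with a separator.
  def words(text: String): List[String] =
    text.split("[^-A-Za-z]+").filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    println(words("This is a nice day, my name is..."))
    // => List(This, is, a, nice, day, my, name, is)
    println(words("...leading dots, then words"))
    // => List(leading, dots, then, words)
  }
}
```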
Toto
  • Could you please explain what the symbol `+` means? – ScalaBoy Sep 29 '18 at 09:25
  • @ScalaBoy: It means one or more occurrences of the preceding token (here, the character class). See https://stackoverflow.com/q/22937618/372239 and https://www.regular-expressions.info/ for more information. – Toto Sep 29 '18 at 09:28
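
To make the effect of `+` concrete, the following sketch splits the same text with and without it. The `PlusQuantifier` object and its value names are made up for illustration.

```scala
object PlusQuantifier {
  val text = "nice day, my name"

  // Without `+`, the comma and the space after it are two separate
  // delimiters, which leaves an empty string between them.
  val withoutPlus: List[String] = text.split("[^-A-Za-z]").toList

  // With `+`, the whole run ", " is consumed as a single delimiter.
  val withPlus: List[String] = text.split("[^-A-Za-z]+").toList

  def main(args: Array[String]): Unit = {
    println(withoutPlus) // => List(nice, day, , my, name)
    println(withPlus)    // => List(nice, day, my, name)
  }
}
```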
2

To extract all words including hyphenated words, you may use

"""\b[a-zA-Z]+(?:-[a-zA-Z]+)*\b""".r.findAllIn(s)

To support all Unicode letters, use \p{L} instead of the [a-zA-Z] character class:

val s = "This is a nice day, my name is..."
val res = """\b\p{L}+(?:-\p{L}+)*\b""".r.findAllIn(s)
println(res.toList)
// => List(This, is, a, nice, day, my, name, is)

See the Scala demo.
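
The difference from splitting on `[^-A-Za-z]+` shows up when a hyphen stands on its own. The sketch below compares the two on a sample sentence; the `HyphenatedWords` object, its value names, and the sample text are illustrative assumptions, and only ASCII letters are used.

```scala
object HyphenatedWords {
  val s = "A well-known, state-of-the-art editor - version 2"

  // A hyphen is matched only when it sits between two runs of letters.
  val byFindAll: List[String] =
    """\b[a-zA-Z]+(?:-[a-zA-Z]+)*\b""".r.findAllIn(s).toList

  // Splitting treats every hyphen as part of a word, so the lone "-" survives.
  val bySplit: List[String] = s.split("[^-A-Za-z]+").toList

  def main(args: Array[String]): Unit = {
    println(byFindAll) // => List(A, well-known, state-of-the-art, editor, version)
    println(bySplit)   // => List(A, well-known, state-of-the-art, editor, -, version)
  }
}
```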

Wiktor Stribiżew
0
val p = """[a-zA-Z]+""".r

In REPL:

scala> val text = "This is a nice day, my name is..."
text: String = This is a nice day, my name is...

scala> p.findAllIn(text).toArray
res24: Array[String] = Array(This, is, a, nice, day, my, name, is)

scala> val text = "This is a nice_day, my_name is..."
text: String = This is a nice_day, my_name is...

scala> p.findAllIn(text).toArray
res26: Array[String] = Array(This, is, a, nice, day, my, name, is)
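
Note that a letters-only pattern like this one also breaks words at hyphens and digits, not just at underscores. A small sketch of that behaviour, where the `LettersOnly` object and the sample text are illustrative:

```scala
object LettersOnly {
  // Same idea as the pattern above: one or more ASCII letters.
  val p = """[a-zA-Z]+""".r

  // Hyphens and digits end a match, so compound tokens are split apart.
  val res: List[String] = p.findAllIn("a well-known day2day routine").toList

  def main(args: Array[String]): Unit =
    println(res) // => List(a, well, known, day, day, routine)
}
```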
RAGHHURAAMM