17

Is this an acceptable approach for removing multiple characters from a string, or is there a better (more efficient) way? The "ilr".contains(_) bit feels a little like cheating, since it is evaluated for each and every character, but then again, maybe this is the right way.

val sentence = "Twinkle twinkle little star, oh I wander what you are"

val words = sentence.filter(!"ilr".contains(_))   

// Result: "Twnke twnke tte sta, oh I wande what you ae"
Jack

2 Answers

34

I'd just use Java's good old replaceAll (it takes a regexp):

"Twinkle twinkle little star, oh I wander what you are" replaceAll ("[ilr]", "")
// res0: String = Twnke twnke tte sta, oh I wande what you ae

In contrast to working with chars (as in filtering a Seq[Char]), using regular expressions should be Unicode-safe even if you're working with code points outside the basic multilingual plane. "There Ain't No Such Thing As Plain Text."
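To make the code-point remark concrete, here is a minimal sketch, not a definitive test. The object name `UnicodeRemoval` and its helper names are made up for illustration, and the two musical clef symbols are just convenient examples of characters outside the BMP. It relies on `java.util.regex` matching by code point, whereas a `Set[Char]` only sees individual UTF-16 code units.

```scala
object UnicodeRemoval {
  // U+1D11E MUSICAL SYMBOL G CLEF: outside the BMP, so it occupies two UTF-16 chars.
  val gClef: String = new String(Character.toChars(0x1D11E))
  // U+1D122 MUSICAL SYMBOL F CLEF: shares its high surrogate with the G clef.
  val fClef: String = new String(Character.toChars(0x1D122))

  // Regex-based removal: the character class contains the whole G clef code point.
  def removeWithRegex(s: String): String =
    s.replaceAll("[" + gClef + "l]", "")

  // Char-based removal: the set holds the two surrogate halves of the G clef
  // separately, so it also strips the shared high surrogate from the F clef.
  def removeWithChars(s: String): String =
    s.filterNot(gClef.toSet + 'l')

  def main(args: Array[String]): Unit = {
    val input = "al" + gClef + fClef
    println(removeWithRegex(input) == "a" + fClef) // F clef survives intact
    println(removeWithChars(input) == "a" + fClef) // F clef is left as a lone surrogate
  }
}
```

For plain ASCII input such as the sentence in the question, both approaches give the same result; the difference only appears once supplementary characters are involved.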

  • You might have a point there on "outside the BMP". But if you care, you better get busy testing — there's *almost* no such thing as BMP-safe Java software (http://stackoverflow.com/a/2533118/53974). Luckily, the SDK is apparently an exception, if you use the right APIs - and regexps are among the blessed ones. http://www.oracle.com/us/technologies/java/supplementary-142654.html – Blaisorblade Mar 09 '14 at 11:50
32

There would be no significant difference, since there are only 3 characters to remove and the string being filtered is not very large, but you may consider using a Set for this purpose. E.g.

val toRemove = "ilr".toSet
val words = sentence.filterNot(toRemove)
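If you want to check the "no significant difference" guess on your own machine, a rough timing sketch along these lines may help. The object name `RemovalTiming`, the helper names, and the repetition count are assumptions made for illustration. This is a crude wall-clock loop, not a proper benchmark; JIT warm-up and garbage collection can easily distort the numbers, so treat any result as indicative only.

```scala
object RemovalTiming {
  val sentence = "Twinkle twinkle little star, oh I wander what you are"

  // The approach from the question: a String lookup per character.
  def viaContains(s: String): String = s.filter(!"ilr".contains(_))

  // The Set-based approach: the set is built once and used as a predicate.
  private val toRemove: Set[Char] = "ilr".toSet
  def viaSet(s: String): String = s.filterNot(toRemove)

  // The regex approach: java.lang.String.replaceAll with a character class.
  def viaRegex(s: String): String = s.replaceAll("[ilr]", "")

  // Runs f `reps` times as warm-up, then `reps` times while measuring.
  // Returns the total measured time in nanoseconds.
  def time(label: String, reps: Int)(f: String => String): Long = {
    var i = 0
    while (i < reps) { f(sentence); i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < reps) { f(sentence); i += 1 }
    val elapsed = System.nanoTime() - start
    println(s"$label: ${elapsed / reps} ns per call (rough)")
    elapsed
  }

  def main(args: Array[String]): Unit = {
    val reps = 200000 // arbitrary; increase for steadier numbers
    time("contains", reps)(viaContains)
    time("set", reps)(viaSet)
    time("regex", reps)(viaRegex)
  }
}
```

All three functions return the same string for the sample sentence, so whichever one reads best is a reasonable default unless measurements on realistic input say otherwise.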
om-nom-nom
  • 1
    Thanks, this version is faster though: val toRemove = List( 'i', 'l', 'r'); val words = sentence.filterNot(toRemove.contains(_)) – alex May 25 '20 at 22:29