7

I am currently working with Scanners and Parsers and need a Parser that accepts only ASCII letters, so I can't use char.isLetter (it also accepts non-ASCII letters).

I came up with two solutions myself. I don't like either of them.

Regex

def letter = elem("ascii letter", _.toString.matches("""[a-zA-Z]"""))

Using a regex to check something this simple seems like overkill.

Range check

def letter = elem("ascii letter", c => ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))

In my opinion, this would be the way to go in Java. But it's not really readable.

Is there a cleaner, more Scala-like solution to this problem? I do not really worry about performance, as it doesn't matter in this case.

Charles
r0estir0bbe
  • 4
    I think the regular expression is fine. If worried about performance, simply create/keep the regular expression object .. otherwise, provide a performance test-case. Simple regular expressions are *fast* (even with the toString) to apply; they can degenerate with backtracking, which is not applicable here. –  Mar 15 '13 at 18:39
  • I just find that regular expression not elegant at all. When working with Scala, it feels like you can do so many things really nice. But it doesn't seem to be the case with this one. – r0estir0bbe Mar 15 '13 at 18:55
  • I find the regular expression elegant because it is a domain-specific language well-suited to this particular task: describing a character-based grammar that particular string input must adhere to. There are many things regular expressions are *not* suited for but, barring an existing method or *known* performance issues, I would use a regular expression and not think twice about it. –  Mar 15 '13 at 19:00

4 Answers

18

You say you can't use Char.isLetter because you only want ASCII letters. Why not just restrict it to the 7-bit ASCII character range?

def isAsciiLetter(c: Char) = c.isLetter && c <= 'z'

To check for any ASCII character, including non-letters:

def isAscii(c: Char) = c.toInt <= 127
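
The `c <= 'z'` bound is enough because the six ASCII characters that sit between 'Z' and 'a' are punctuation, so isLetter already rejects them. A minimal sketch to check this; the object name AsciiLetterCheck and the sample characters are only illustrative, not part of the original answer:

```scala
object AsciiLetterCheck {
  // Same predicate as above: a letter whose code is at most 'z' (122).
  def isAsciiLetter(c: Char): Boolean = c.isLetter && c <= 'z'

  def main(args: Array[String]): Unit = {
    // The characters strictly between 'Z' (90) and 'a' (97).
    val between = ('Z' + 1).toChar to ('a' - 1).toChar
    println(between.exists(isAsciiLetter)) // false: none of them is a letter
    println(isAsciiLetter('q'))            // true
    println(isAsciiLetter('\u00e9'))       // false: e-acute is a letter, but above 'z'
    println(isAsciiLetter('7'))            // false: not a letter at all
  }
}
```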
samthebest
DaoWen
  • 3
    Compare this elegance to the java thread lol: http://stackoverflow.com/questions/3585053/in-java-is-it-possible-to-check-if-a-string-is-only-ascii – samthebest Aug 19 '16 at 11:56
2

Regardless of what you choose in the end, I suggest abstracting out the definition of "is an ASCII letter" for readability and performance. E.g.:

object Program extends App {
  implicit class CharProperties(val ch: Char) extends AnyVal {
    def isASCIILetter: Boolean =
      (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
  }
  println('x'.isASCIILetter)
  println('0'.isASCIILetter)
}

Or if you want to describe ASCII letters as a set:

object Program extends App {
  // Holder for the set, built once; named differently from the implicit
  // class below so the two definitions cannot clash.
  object ASCIIChars {
    val Letters: Set[Char] = ('a' to 'z').toSet ++ ('A' to 'Z').toSet
  }
  implicit class CharProperties(val ch: Char) extends AnyVal {
    def isASCIILetter: Boolean =
      ASCIIChars.Letters.contains(ch)
  }
  println('x'.isASCIILetter) // true
  println('0'.isASCIILetter) // false
}

Once you're using an explicit function with an understandable name, your intent should be clear either way and you can choose the implementation with the better performance (though any performance differences between the two versions above should be rather minimal).
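
If you want some reassurance that the two implementations really are interchangeable, a brute-force comparison over every Char value is cheap. This is only a sketch; the names CompareAsciiLetterChecks, byRange and bySet are made up for the example:

```scala
object CompareAsciiLetterChecks {
  // Range comparison, as in the first version.
  def byRange(ch: Char): Boolean =
    (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')

  // Set membership, as in the second version; the set is built once.
  private val letters: Set[Char] = ('a' to 'z').toSet ++ ('A' to 'Z').toSet

  def bySet(ch: Char): Boolean = letters.contains(ch)

  def main(args: Array[String]): Unit = {
    // Every possible Char value, from 0 to 65535.
    val all = (0 to Char.MaxValue.toInt).map(_.toChar)
    // Characters on which the two checks give different answers.
    val disagreements = all.filter(ch => byRange(ch) != bySet(ch))
    println(disagreements.isEmpty) // true
    println(all.count(byRange))    // 52
  }
}
```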

Reimer Behrends
0

The second one (the range check) could be written as:

def letter = elem("ascii letter", c => ('a' to 'z') ++ ('A' to 'Z') contains c)

It is more readable, but less performant.

Or, if ++ scares you, in almost plain English:

c => ('a' to 'z') union ('A' to 'Z') contains c
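
Most of the extra cost comes from concatenating the two ranges inside the lambda, which happens on every call. One way to keep the readable form is to build the sequence once and reuse it. A rough sketch, where AsciiLetterLookup, asciiLetters and isAsciiLetter are names invented for this example:

```scala
object AsciiLetterLookup {
  // Built once when the object is initialised, not on every call.
  val asciiLetters: Seq[Char] = ('a' to 'z') ++ ('A' to 'Z')

  // Still a linear scan over 52 elements, but no per-call allocation.
  val isAsciiLetter: Char => Boolean = c => asciiLetters contains c

  def main(args: Array[String]): Unit = {
    println("a1-Z".filter(isAsciiLetter)) // aZ
  }
}
```

The function should then be usable directly as the predicate, along the lines of elem("ascii letter", AsciiLetterLookup.isAsciiLetter), assuming Elem is Char as in the question.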
om-nom-nom
-1

Another, well, elegant solution would be to use min/max:

c => 'A'.max(c.toUpper) == 'Z'.min(c.toUpper)

or

c => 'A'.max(c) == 'Z'.min(c) || 'a'.max(c) == 'z'.min(c)
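
One caveat with the first variant: toUpper follows Unicode case mappings, and a few non-ASCII characters upper-case to ASCII letters, for example U+0131 (dotless i) to 'I' and U+017F (long s) to 'S'. The first variant therefore accepts them, while the second does not. A small sketch to illustrate; MinMaxCheck, viaToUpper and viaBothRanges are names made up for the example:

```scala
object MinMaxCheck {
  // First variant: folds case with toUpper before the min/max comparison.
  val viaToUpper: Char => Boolean =
    c => 'A'.max(c.toUpper) == 'Z'.min(c.toUpper)

  // Second variant: compares against both ranges without changing case.
  val viaBothRanges: Char => Boolean =
    c => 'A'.max(c) == 'Z'.min(c) || 'a'.max(c) == 'z'.min(c)

  def main(args: Array[String]): Unit = {
    val dotlessI = '\u0131' // LATIN SMALL LETTER DOTLESS I, upper-cases to 'I'
    println(viaToUpper('g'))         // true
    println(viaBothRanges('g'))      // true
    println(viaToUpper(dotlessI))    // true: non-ASCII, but accepted
    println(viaBothRanges(dotlessI)) // false
  }
}
```

The `c.toLower` variant suggested in the comments is open to the same kind of surprise, since case folding can map non-ASCII characters onto ASCII ones.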
michael_s
  • 1
    No offence, but in my opinion it hides the original intent and thus clutters the code (even though it might be concise and clever). Actually, if we combined our solutions we might get something like `'a' to 'z' contains c.toLower`, which I *personally* like a lot more. – om-nom-nom Mar 15 '13 at 20:58
  • 2
    yeah - that looks really smart - but, it's kind of inefficient, isn't it? ;) – michael_s Mar 15 '13 at 21:21
  • yep, it will be **highly inefficient** in tight loops – om-nom-nom Mar 15 '13 at 21:23
  • OK - So you could add that to your solution as well :) - I was thinking about some subtraction, but that's getting too weird now. – michael_s Mar 15 '13 at 21:36