
I need to test whether a character is a letter or a space before continuing with further processing, so I wrote:

    for (char c : take.toCharArray()) {
        // Skip anything that is neither a letter nor a space character.
        if (!(Character.isLetter(c) || Character.isSpaceChar(c)))
            continue;

        data.append(c);
    }

Once I examined the data, I saw that it contains characters that look like Unicode representations of characters from outside the Latin alphabet. How can I modify the code above to tighten my conditions so that it only accepts letters in the ranges [a-z] and [A-Z]?

Is a regex the way to go, or is there a better (faster) way?

James Raitsev
  • Wait, why do you consider "é" to not be a letter? Usually people are looking for ways to make their code handle international input *better*, not *worse*... – Borealid Feb 06 '12 at 02:11
  • @Borealid, in my case the control character is an oddity, which I am still investigating. `é` certainly is a valid character, but for the purposes of my program it should not be there. – James Raitsev Feb 06 '12 at 02:13
  • The regex to do this is to check against the Latin script property with `\p{sc=Latin}`. – tchrist Feb 06 '12 at 02:51
  • Related: [*Identify if a Unicode code point represents a character from a certain script such as the Latin script?*](https://stackoverflow.com/q/62109781/642706) – Basil Bourque May 31 '20 at 04:53
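As a sketch of the script-property suggestion in the comments, the snippet below compares Java's `\p{sc=Latin}` (supported by `java.util.regex.Pattern` since Java 7) with the plain `[a-zA-Z]` class. The Latin script includes accented letters such as `é` and `ß`, so the script property on its own does not restrict input to the 52 ASCII letters. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

class LatinScriptDemo {
    // Matches any character whose Unicode script is Latin,
    // which includes accented letters such as e-acute (U+00E9).
    private static final Pattern LATIN_SCRIPT = Pattern.compile("\\p{sc=Latin}");

    // Matches only the 52 ASCII letters A-Z and a-z.
    private static final Pattern ASCII_LETTER = Pattern.compile("[a-zA-Z]");

    // Both helpers expect a single-character string.
    static boolean isLatinScript(String s) {
        return LATIN_SCRIPT.matcher(s).matches();
    }

    static boolean isAsciiLetter(String s) {
        return ASCII_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "e", e-acute (U+00E9), sharp s (U+00DF), Cyrillic ZHE (U+0416)
        for (String s : new String[] {"e", "\u00e9", "\u00df", "\u0416"}) {
            System.out.println(s + " latin=" + isLatinScript(s)
                    + " ascii=" + isAsciiLetter(s));
        }
    }
}
```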

3 Answers


If you specifically want to handle only those 52 characters, then just handle them:

    public static boolean isLatinLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
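A minimal sketch of plugging this check into the question's loop. `keepAsciiLettersAndSpaces` is a hypothetical helper name, and treating only the plain space U+0020 as a space is an assumption; the question's `Character.isSpaceChar` would also accept other Unicode spaces such as the no-break space U+00A0.

```java
class AsciiLetterFilter {
    // Same check as the answer: true only for the 52 ASCII letters.
    static boolean isLatinLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Keeps ASCII letters and the plain space (U+0020), drops everything else.
    // Assumption: other Unicode space characters are not wanted.
    static String keepAsciiLettersAndSpaces(String input) {
        StringBuilder data = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (isLatinLetter(c) || c == ' ') {
                data.append(c);
            }
        }
        return data.toString();
    }

    public static void main(String[] args) {
        // prints "Caf au lait": the e-acute and the '!' are dropped
        System.out.println(keepAsciiLettersAndSpaces("Caf\u00e9 au lait!"));
    }
}
```

Because the test is two range comparisons on a `char`, it avoids both regex compilation and the Unicode table lookups that `Character.isLetter` performs.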
Louis Wasserman
Ernest Friedman-Hill

If you just want to strip out every character that is not an ASCII letter, a quick approach is to use `String.replaceAll()` with a regex:

    s.replaceAll("[^a-zA-Z]", "")

I can't say how this performs compared with a character-by-character scan that appends to a `StringBuilder`, though.
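Note that `[^a-zA-Z]` also removes spaces, while the question's loop keeps them. A small sketch comparing the answer's pattern with a variant whose character class also contains a space; the class and method names are hypothetical, and treating only U+0020 as a space is an assumption.

```java
class ReplaceAllDemo {
    // The answer's pattern: removes everything that is not an ASCII letter,
    // spaces included.
    static String lettersOnly(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    // Variant that also keeps the plain space (U+0020).
    static String lettersAndSpaces(String s) {
        return s.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        String s = "na\u00efve caf\u00e9"; // i-diaeresis and e-acute
        System.out.println(lettersOnly(s));      // navecaf
        System.out.println(lettersAndSpaces(s)); // nave caf
    }
}
```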

Alistair A. Israel

I'd use the regular expression you specified for this. It's easy to read and should be quite fast, especially if you compile the `Pattern` once and keep it in a `static` field rather than recompiling it on every call.
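One way to follow the "compile it once" advice, sketched under assumptions: the pattern keeps ASCII letters and the plain space (U+0020), and `PrecompiledFilter`/`clean` are hypothetical names, not from the original answer.

```java
import java.util.regex.Pattern;

class PrecompiledFilter {
    // Compiled once when the class is initialized and reused for every call.
    // Matches runs of characters that are neither ASCII letters nor a space.
    private static final Pattern NOT_ASCII_LETTER_OR_SPACE =
            Pattern.compile("[^a-zA-Z ]+");

    static String clean(String input) {
        return NOT_ASCII_LETTER_OR_SPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "caf nave"
        System.out.println(clean("caf\u00e9, na\u00efve!"));
    }
}
```

The `String.replaceAll` Javadoc describes it as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so each call compiles the regex again; holding the `Pattern` in a `static final` field avoids that repeated work. Whether this beats the plain `char` comparison loop would need measuring.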

Samuel Edwin Ward
  • Could you provide an example to do it the right way? I'd like to see what's faster. – James Raitsev Feb 06 '12 at 02:27
  • It's getting rather late in the day in this locality, so I'm afraid you'll have to wait for code, particularly if you want it to compile :) – Samuel Edwin Ward Feb 06 '12 at 02:50
  • But, as an aside, you might be overly concerned with speed at this time. Surely this isn't the slowest operation you're performing? It might be more efficient to optimize the time that a future developer (who might be you!) spends trying to understand this bit of code. – Samuel Edwin Ward Feb 06 '12 at 02:52