30

What is the fastest way to check that a String contains only alphanumeric characters.

I've got some code that is going to chew up a lot of CPU and I wonder if there is going to be a quicker way than using pre-compiled regular expressions.

E_net4
  • 27,810
  • 13
  • 101
  • 139
Jay
  • 19,649
  • 38
  • 121
  • 184

3 Answers3

47

Use String.matches(), like:

String myString = "qwerty123456";
System.out.println(myString.matches("[A-Za-z0-9]+"));

That may not be the absolute "fastest" possible approach. But in general there's not much point in trying to compete with the people who write the language's "standard library" in terms of performance.

aroth
  • 54,026
  • 20
  • 135
  • 176
  • Yea. C# has a standard function for this. It appears Java does not. – Jay Oct 11 '12 at 03:53
  • 2
    But there is a point. When you are executing code in a loop millions of times, and you can make it run 5 times faster, it can make a significant impact to your message processing time, and that impacts how many servers you might need for message processing. – Jay Aug 19 '18 at 22:15
  • 1
    I believe that's overstating it. At best you're making one small aspect of the code 5 times faster. That may be worthwhile, _iff_ your performance is actually being limited by that aspect. It may well not be. For instance, if part of your message processing pipeline involves writing to a database or filesystem or doing any network comms, _that's_ likely your limiting factor. There aren't many real-world cases where speeding up a regex check by 5x will actually speed up the application by 5x (or anywhere close to 5x). And the tradeoff is less readability, flexibility, and extensibility. – aroth Aug 20 '18 at 01:44
  • ...it is important to optimize where and when it makes sense, yes. However most of the time for most real-world applications "I made it faster by replacing a built-in, configurable, regex based function with a fixed/static loop" is not a sensible optimization. – aroth Aug 20 '18 at 01:46
  • 1
    Yes but you don’t know the context the code is being used, so you can’t judge. In this particular instance, I used this code to speed up a script that was processing gigabytes of log files, shaving 10 minutes or so off the task. – Jay Aug 21 '18 at 02:52
  • The question didn't ask for advice, it asked for the fastest solution. – Chai T. Rex Jul 24 '22 at 13:56
30

I've written the tests that compare using regular expressions (as per other answers) against not using regular expressions. Tests done on a quad core OSX10.8 machine running Java 1.6

Interestingly using regular expressions turns out to be about 5-10 times slower than manually iterating over a string. Furthermore the isAlphanumeric2() function is marginally faster than isAlphanumeric(). One supports the case where extended Unicode numbers are allowed, and the other is for when only standard ASCII numbers are allowed.

public class QuickTest extends TestCase {

    private final int reps = 1000000;

    public void testRegexp() {
        for(int i = 0; i < reps; i++)
            ("ab4r3rgf"+i).matches("[a-zA-Z0-9]");
    }

public void testIsAlphanumeric() {
    for(int i = 0; i < reps; i++)
        isAlphanumeric("ab4r3rgf"+i);
}

public void testIsAlphanumeric2() {
    for(int i = 0; i < reps; i++)
        isAlphanumeric2("ab4r3rgf"+i);
}

    public boolean isAlphanumeric(String str) {
        for (int i=0; i<str.length(); i++) {
            char c = str.charAt(i);
            if (!Character.isLetterOrDigit(c))
                return false;
        }

        return true;
    }

    public boolean isAlphanumeric2(String str) {
        for (int i=0; i<str.length(); i++) {
            char c = str.charAt(i);
            if (c < 0x30 || (c >= 0x3a && c <= 0x40) || (c > 0x5a && c <= 0x60) || c > 0x7a)
                return false;
        }
        return true;
    }

}
k4dima
  • 6,070
  • 5
  • 41
  • 39
Jay
  • 19,649
  • 38
  • 121
  • 184
  • Maybe it's important to mention that `Character.isLetter` and `Character.isDigit` are also accepting characters which are not in a-z, A-Z, 0-9. For example `isDigit` is accepting all characters from Unicode category `Nd`: https://www.fileformat.info/info/unicode/category/Nd/list.htm – mbo Jun 27 '19 at 11:35
  • Yes, that is the main difference between the two fastest options. Deciding between the two would depend on user requirements. But either way, they are both significantly faster than regular expressions. (Theoretically, this may be different in newer versions of java, but I doubt it.) – Jay Jul 01 '19 at 09:47
3

A regex will probably be quite efficient, because you would specify ranges: [0-9a-zA-Z]. Assuming the implementation code for regexes is efficient, this would simply require an upper and lower bound comparison for each range. Here's basically what a compiled regex should do:

boolean isAlphanumeric(String str) {
    for (int i=0; i<str.length(); i++) {
        char c = str.charAt(i);
        if (c < 0x30 || (c >= 0x3a && c <= 0x40) || (c > 0x5a && c <= 0x60) || c > 0x7a)
            return false;
    }

    return true;
}

I don't see how your code could be more efficient than this, because every character will need to be checked, and the comparisons couldn't really be any simpler.

Ravi
  • 34,851
  • 21
  • 122
  • 183
snibbets
  • 1,095
  • 10
  • 11
  • 1
    Its possibly going to be between this and `String.matches()`. I was kind of hoping someone had done it before. Looks like im going to have to write the test myself (: – Jay Oct 11 '12 at 03:54
  • It all depends on the use. There is no standard way of creating the perfect algorithm for all string match scenarios. Regex may serve to a great extent, but just because it looks complex and nerdy, doesn't make it more efficient. For 100+ iterations per second, I would definitely consider a customized algorithm. – Thomas Williams Oct 22 '18 at 12:40