4

I'm working on a project where I need to check if a string is in the correct format ABC1234 meaning 3 letters followed by 4 numbers. I was told not to use regular expressions to solve this.

I came up with the following code but it's clunky so I'm looking for something cleaner and more efficient.

String sample = ABC1234

char[] chars = sample.toCharArray();

if(Character.isLetter(chars[0]) && Character.isLetter(chars[1]) && 
   Character.isLetter(chars[2]) && Character.isDigit(chars[3]) && 
   Character.isDigit(chars[4]) && Character.isDigit(chars[5]) && 
   Character.isDigit(chars[6])){

    list.add(sample);
}

// OUTPUT: ABC1234 gets added to "list". When it prints, it appears as ABC1234.

All outputs are as expected but I know this can be done either more efficiently or just better in general.

I'm just checking the first 3 chars to verify they're each a letter and the last 4 chars should be numbers.

Any advice? Thanks in advance.

azro
  • 53,056
  • 7
  • 34
  • 70
  • Might want to check the length first, but otherwise it's fairly fast. Not very extendible / reusable, but that wasn't a requirement. Otherwise, if you want to go regex style without using regex, look into implementing a finite state automata. – Evan Knowles Dec 20 '18 at 21:46
  • 1
    `I know this can be done either more efficiently`, how do you know this ? – azro Dec 20 '18 at 21:48
  • I assume your question is based on a school exercise. Checking an easy format like in your example without using a regex is more complex than using a regex. A solution with less code means less possible bugs can happen. The corresponding regex is `\w{3}\d{4}`, your code reduces to one line `sample.matches("\\w{3}\\d{4}")`. – fireandfuel Dec 20 '18 at 23:08

7 Answers7

3

You do not need

char[] chars = sample.toCharArray();

Instead you can just do

if(Character.isLetter(sample.charAt(0))

You can also be more fancy and do something like :

void myFonc(string sample) {
 for (int i =0; i < 3; ++i)
        if (!Character.isLetter(sample.charAt(i)))
            return;

 for (int i =3; i < 7; ++i)
        if (!Character.isDigit(sample.charAt(i)))
            return;
list.add(sample);

}
YouneS
  • 390
  • 4
  • 10
2

Here is another way.

String sample = "ABC1234";
if (sample.substring(0, 3).chars().allMatch(Character::isLetter)
      && sample.substring(3).chars().allMatch(Character::isDigit)) {
  list.add(sample);
}
Pradip Karki
  • 662
  • 1
  • 8
  • 21
2

Since the question contained "All outputs are as expected but I know this can be done either more efficiently or just better in general." (and since I like performance) I wrote some benchmarks comparing each answer to give a conclusion about efficiency (looking at throughput).

The entire benchmark code can be found at the bottom of the question, if you notice anything that's wrong I'd be happy to correct it (even if it's not be perfect there are good indications of performance of each answer).

The tests were ran on a DigitalOcean droplet, 2GB ram, 2 vCores (Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz) with OpenJDK8 installed, the JMH version is 1.21.

Each answer was tested with 3 Strings, "ABC1234" to reflect the example in the question, "ABC123D" which should fail, and "ABC123" which is too short (not sure if it's relevant to OP). The tests are configured for 5 forks, 5 warmup iterations of 1 second, 20 measurement iterations of 1 second.

Results

Benchmark                            (sample)   Mode  Cnt          Score         Error  Units
MyBenchmark.aomine                    ABC1234  thrpt  100    5102477.405 ±   92474.543  ops/s
MyBenchmark.aomine                    ABC123D  thrpt  100    5325954.315 ±  118367.303  ops/s
MyBenchmark.aomine                      AB123  thrpt  100  228544750.370 ± 2972826.551  ops/s
MyBenchmark.azro                      ABC1234  thrpt  100   38550638.399 ±  582816.997  ops/s
MyBenchmark.azro                      ABC123D  thrpt  100   38159991.786 ±  791457.371  ops/s
MyBenchmark.azro                        AB123  thrpt  100   76372552.584 ± 1131365.381  ops/s
MyBenchmark.baselineCarlosDeLaTorre   ABC1234  thrpt  100   37584463.448 ±  444739.798  ops/s
MyBenchmark.baselineCarlosDeLaTorre   ABC123D  thrpt  100   38461464.626 ±  461497.068  ops/s
MyBenchmark.baselineCarlosDeLaTorre     AB123  thrpt  100   52743609.713 ±  590609.005  ops/s
MyBenchmark.elliotFrisch              ABC1234  thrpt  100   16531274.955 ±  313705.782  ops/s
MyBenchmark.elliotFrisch              ABC123D  thrpt  100   16861377.659 ±  361382.816  ops/s
MyBenchmark.elliotFrisch                AB123  thrpt  100  227980231.801 ± 3071776.693  ops/s
MyBenchmark.elliotFrischOptimized     ABC1234  thrpt  100   37031168.714 ±  749067.222  ops/s
MyBenchmark.elliotFrischOptimized     ABC123D  thrpt  100   33383546.778 ±  799217.656  ops/s
MyBenchmark.elliotFrischOptimized       AB123  thrpt  100  214954411.915 ± 5283511.503  ops/s
MyBenchmark.elliotFrischRegex         ABC1234  thrpt  100    6862779.467 ±  122048.790  ops/s
MyBenchmark.elliotFrischRegex         ABC123D  thrpt  100    6830229.583 ±  119561.120  ops/s
MyBenchmark.elliotFrischRegex           AB123  thrpt  100   10797021.026 ±  558964.833  ops/s
MyBenchmark.mark                      ABC1234  thrpt  100   38451993.441 ±  478379.375  ops/s
MyBenchmark.mark                      ABC123D  thrpt  100   37667656.659 ±  680548.809  ops/s
MyBenchmark.mark                        AB123  thrpt  100  228656962.146 ± 2858730.169  ops/s
MyBenchmark.mrB                       ABC1234  thrpt  100   15490382.831 ±  233777.324  ops/s
MyBenchmark.mrB                       ABC123D  thrpt  100     575122.575 ±   10201.967  ops/s
MyBenchmark.mrB                         AB123  thrpt  100  231175971.072 ± 2074819.634  ops/s
MyBenchmark.pradipforever             ABC1234  thrpt  100    5105663.672 ±  171843.786  ops/s
MyBenchmark.pradipforever             ABC123D  thrpt  100    5305419.983 ±   80514.769  ops/s
MyBenchmark.pradipforever               AB123  thrpt  100   12211850.301 ±  217850.395  ops/s

Charts

There are 2 different charts, since the throughput in the chart of ABC123 is so much larger (because some methods return false after comparing String length) it makes it unreadable if added to the rest, where throughput is smaller.

The numbers in the chart mean throughput (executions) per second. All data

Some notes & improvements

mrB

Since this wasn't a complete answer (only for checking the int part) I've used the character validation method of @elliotFrisch. When the String is ABC1234 ofcourse it's fast, however when trying ABC123D and catching the NumberFormatException you can see the performance is bad.

elliotFrisch

After taking a look why the performance wasn't as fast as some others, while very readable I came to the conclusion it's because of calling s.toCharArray() once for validating characters and once for validating numbers.

I've made an improvement on this so that it's only being called once, which can be seen in the results under elliotFrischOptimized.

azro

Good solution, however performance for ABC123 was lower than others due to calling char[] c = s.toCharArray() and then validating c.length instead of s.length() directly. An improvement where this check is implemented can be seen in the results as mark.

Tl;dr & conclusion

The original code was already fast, implementing a length check makes this faster, as seen in azro's answer. To make this length check even faster (prevent 1 call of s.toCharArray()) use the mark code.

If you want a more readable/versatile solution which can be reused I'd go for the elliotFrischOptimized method, which is (almost) just as fast.

If you don't care that much about performance (it'll still check almost 7 million Strings/second as seen in the results), using the regex as provided by @elliotFrisch will do the job, it's very readable and it's maintainable.

Code

@Fork(5)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 20, time = 1)
@State(Scope.Thread)
public class MyBenchmark {

    @Param({ "ABC1234", "ABC123D", "AB123" })
    String sample;

    Pattern p;

    int goodLength;

    @Setup
    public void setup() {
        this.p = Pattern.compile("\\D{3}\\d{4}");
        this.goodLength = 7;
    }

    @Benchmark
    public boolean baselineCarlosDeLaTorre() {
        char[] chars = this.sample.toCharArray();

        if (Character.isLetter(chars[0]) && Character.isLetter(chars[1]) &&
                Character.isLetter(chars[2]) && Character.isDigit(chars[3]) &&
                Character.isDigit(chars[4]) && Character.isDigit(chars[5]) &&
                Character.isDigit(chars[6])) {
            return true;
        }
        return false;
    }

    @Benchmark
    public boolean mark() {
        if (this.sample.length() != this.goodLength) {
            return false;
        }

        char[] chars = this.sample.toCharArray();

        return Character.isLetter(chars[0]) && Character.isLetter(chars[1]) &&
                Character.isLetter(chars[2]) && Character.isDigit(chars[3]) &&
                Character.isDigit(chars[4]) && Character.isDigit(chars[5]) &&
                Character.isDigit(chars[6]);
    }

    @Benchmark
    public boolean azro() {
        char[] chars = this.sample.toCharArray();

        if (chars.length == this.goodLength && Character.isLetter(chars[0]) &&
                Character.isLetter(chars[1]) && Character.isLetter(chars[2]) &&
                Character.isDigit(chars[3]) && Character.isDigit(chars[4]) &&
                Character.isDigit(chars[5]) && Character.isDigit(chars[6])) {
            return true;
        }
        return false;
    }

    public boolean elliotFrischAllLLettersOptimized(char[] chars, int from, int to) {
        for (int i = from; i < to; i++) {
            if (!Character.isLetter(chars[i])) {
                return false;
            }
        }
        return true;
    }

    public boolean elliotFrischAllDigitsOptimized(char[] chars, int from, int to) {
        for (int i = from; i < to; i++) {
            if (!Character.isDigit(chars[i])) {
                return false;
            }
        }
        return true;
    }

    @Benchmark
    public boolean elliotFrischOptimized() {
        if (this.sample.length() != this.goodLength) {
            return false;
        }

        char[] chars = this.sample.toCharArray();

        return elliotFrischAllLLettersOptimized(chars, 0, 3)
                && elliotFrischAllDigitsOptimized(chars, 3, 7);
    }

    public boolean elliotFrischAllLLetters(String s) {
        for (char ch : s.toCharArray()) {
            if (!Character.isLetter(ch)) {
                return false;
            }
        }
        return true;
    }

    public boolean elliotFrischAllDigits(String s) {
        for (char ch : s.toCharArray()) {
            if (!Character.isDigit(ch)) {
                return false;
            }
        }
        return true;
    }

    @Benchmark
    public boolean elliotFrisch() {
        return this.sample.length() == this.goodLength
                && elliotFrischAllLLetters(this.sample.substring(0, 3))
                && elliotFrischAllDigits(this.sample.substring(3));
    }

    @Benchmark
    public boolean elliotFrischRegex() {
        return this.p.matcher(this.sample).matches();
    }

    @Benchmark
    public boolean aomine() {
        return this.sample.length() == this.goodLength &&
                this.sample.substring(0, 3).codePoints()
                        .allMatch(Character::isLetter)
                && this.sample.substring(3, 7).codePoints()
                        .allMatch(Character::isDigit);
    }

    @Benchmark
    public boolean pradipforever() {
        if (this.sample.substring(0, 3).chars().allMatch(Character::isLetter)
                && this.sample.substring(3).chars().allMatch(Character::isDigit)) {
            return true;
        }
        return false;
    }

    public boolean mrBParseInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    @Benchmark
    public boolean mrB() {
        return this.sample.length() == this.goodLength
                && elliotFrischAllLLetters(this.sample.substring(0, 3))
                && mrBParseInt(this.sample.substring(3));
    }

}
Mark
  • 5,089
  • 2
  • 20
  • 31
1

I would write two utility methods; one to check if a given String is all letters and another to check if a given String is all digits. Then invoke the two methods by using String.substring(int, int) to compare the relevant substring(s). Like,

private static boolean allLetters(String s) {
    for (char ch : s.toCharArray()) {
        if (!Character.isLetter(ch)) {
            return false;
        }
    }
    return true;
}

private static boolean allDigits(String s) {
    for (char ch : s.toCharArray()) {
        if (!Character.isDigit(ch)) {
            return false;
        }
    }
    return true;
}

public static void main(String[] args) {
    // ...
    String s = "ABC1234";
    if (s.length() == 7 && allLetters(s.substring(0, 3)) && allDigits(s.substring(3))) {
        list.add(s);
    }
}

But, in real code, the regular expression is better still -

Pattern p = Pattern.compile("\\D{3}\\d{4}");
if (p.matcher(s).matches()) {
    // ...
}
Elliott Frisch
  • 198,278
  • 20
  • 158
  • 249
  • 1
    Two minor flaws of the non-regex code: it should be the tail starting at 3, not 4, and it should check that there are exactly 4 digits. – Ralf Kleberhoff Dec 20 '18 at 21:53
0

The only thing you could add is a length-check at first :

if (chars.length == 7 && Character.isLetter(chars[0]) &&
        Character.isLetter(chars[1]) && Character.isLetter(chars[2]) &&
        Character.isDigit(chars[3]) && Character.isDigit(chars[4]) &&
        Character.isDigit(chars[5]) && Character.isDigit(chars[6])) {
    //..
}

Use some loops won't be more efficient as && is already a short-circuit and will stop at the moment it'll find a false boolean

azro
  • 53,056
  • 7
  • 34
  • 70
0

Your current approach is just fine given you cannot use regex, but here is another approach:

 boolean isValid = sample.length() == 7 &&
                sample.substring(0, 3).codePoints()
                        .allMatch(Character::isLetter)
                && sample.substring(3, 7).codePoints()
                .allMatch(Character::isDigit);
Ousmane D.
  • 54,915
  • 8
  • 91
  • 126
-1

You can check the last 4 together using Integer.parseInt(sample.substring(3,7); I can't think of a quicker for the letters though. Integer.parseInt will throw a NumberFormatException if it isn't a number so do it in a try block

MrB
  • 818
  • 8
  • 28
  • You should NOT use exceptions for flow-control in expected situations. – Ralf Kleberhoff Dec 20 '18 at 21:56
  • @RalfKleberhoff I agree, but that's a problem with the j2se API more than with this suggestion – that other guy Dec 20 '18 at 22:00
  • Yeah, lacking some sort of `TryParse` equivalent, I don't really see an issue with this particular case. Have a look at https://stackoverflow.com/questions/1486077/good-way-to-encapsulate-integer-parseint – MrB Dec 20 '18 at 22:02
  • 1
    The problems with this approach is that it'll accept things like `-000`, `+123` and `൫᧙꘠٤` – that other guy Dec 20 '18 at 22:05