You can match on one or more contiguous alphabetic characters or one or more contiguous numeric characters. Once the sequence is interrupted, stop matching, store the sequence, and then start over. Non-word characters are ignored entirely.
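As a minimal sketch of that idea (the class name `AlphaNumericRuns`, the method `runs`, and the choice of `[A-Za-z]` for "alphabetic" are assumptions made for illustration), a `Matcher` loop can collect each run and skip everything in between:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlphaNumericRuns {
    // Assumed pattern: a run of ASCII digits OR a run of ASCII letters (both cases).
    private static final Pattern RUN = Pattern.compile("[0-9]+|[A-Za-z]+");

    // Collects each contiguous run; any other character (space, dash, ...) is skipped.
    static List<String> runs(String str) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = RUN.matcher(str);
        while (matcher.find()) {
            parts.add(matcher.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(runs("RC23"));        // [RC, 23]
        System.out.println(runs("ab-12 cd_34")); // [ab, 12, cd, 34]
    }
}
```

Because `find()` simply moves on to the next match, separators such as `-`, `_`, or spaces never appear in the result.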
Edit: I created a simple performance test below to compare the speed of String.split() and Pattern.matcher(). In the run shown, the split version was about 2.5x faster than the matcher+loop version.
Solution
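private static String[] splitAlphaNumeric(String str) {
    // Split at every zero-width boundary between a letter and a digit (either order).
    // (?i) makes [A-Z] match lowercase letters as well.
    return str.split("(?i)((?<=[A-Z])(?=\\d))|((?<=\\d)(?=[A-Z]))");
}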
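To make the behavior of the lookarounds concrete, here is a small usage sketch. The class name `SplitBoundaryDemo` is hypothetical; the regex is the one from the solution. Since the pattern only matches the empty position between a letter and a digit, no characters are consumed by the split, and other separators are left in place.

```java
import java.util.Arrays;

public class SplitBoundaryDemo {
    // Same regex as the solution: split where a letter is followed by a digit,
    // or a digit is followed by a letter. The match itself is zero-width.
    static String[] splitAlphaNumeric(String str) {
        return str.split("(?i)((?<=[A-Z])(?=\\d))|((?<=\\d)(?=[A-Z]))");
    }

    public static void main(String[] args) {
        // Letter/digit boundaries produce separate tokens.
        System.out.println(Arrays.toString(splitAlphaNumeric("CC23QQ21HD32"))); // [CC, 23, QQ, 21, HD, 32]
        // Lowercase letters are handled because of (?i).
        System.out.println(Arrays.toString(splitAlphaNumeric("ab12")));         // [ab, 12]
        // A dash sits between the letter and the digit, so there is no boundary to split on.
        System.out.println(Arrays.toString(splitAlphaNumeric("RC-23")));        // [RC-23]
    }
}
```

Note the last case: unlike a matcher loop, the split approach does not discard non-alphanumeric characters, so it fits best when the input contains only letters and digits.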
Example
import java.util.*;
import java.util.regex.*;

public class SplitAlphaNumeric {
    // Compiled once and reused. Note: this pattern has no case-insensitive flag, so
    // lowercase letters are not matched (see the dropped 'f' in the Test 1 output).
    private static final Pattern ALPH_NUM_PAT = Pattern.compile("[0-9]+|[A-Z]+");

    private static List<String> input = Arrays.asList(
        "RC23",
        "CC23QQ21HD32",
        "BPOASDf91A5HH123"
    );

    public static void main(String[] args) {
        System.out.printf("Execution time: %dns%n", testMatch());
        System.out.printf("Execution time: %dns%n", testSplit());
    }

    // Times the matcher+loop version. The measurement includes the printf calls.
    public static long testMatch() {
        System.out.println("Begin Test 1...");
        long start = System.nanoTime();
        for (String str : input) {
            System.out.printf("%-16s -> %s%n", str, parse(str));
        }
        long end = System.nanoTime();
        return end - start;
    }

    // Times the split version. The measurement includes the printf calls.
    public static long testSplit() {
        System.out.println("\nBegin Test 2...");
        long start = System.nanoTime();
        for (String str : input) {
            System.out.printf("%-16s -> %s%n", str, parse2(str));
        }
        long end = System.nanoTime();
        return end - start;
    }

    // Collects every run of digits or uppercase letters found by the matcher.
    private static List<String> parse(String str) {
        List<String> parts = new LinkedList<String>();
        Matcher matcher = ALPH_NUM_PAT.matcher(str);
        while (matcher.find()) {
            parts.add(matcher.group());
        }
        return parts;
    }

    // Splits at each letter/digit boundary; (?i) makes the match case-insensitive.
    private static List<String> parse2(String str) {
        return Arrays.asList(str.split("(?i)((?<=[A-Z])(?=\\d))|((?<=\\d)(?=[A-Z]))"));
    }
}
Output
Begin Test 1...
RC23 -> [RC, 23]
CC23QQ21HD32 -> [CC, 23, QQ, 21, HD, 32]
BPOASDf91A5HH123 -> [BPOASD, 91, A, 5, HH, 123]
Execution time: 4879125ns
Begin Test 2...
RC23 -> [RC, 23]
CC23QQ21HD32 -> [CC, 23, QQ, 21, HD, 32]
BPOASDf91A5HH123 -> [BPOASDf, 91, A, 5, HH, 123]
Execution time: 1953349ns
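The timings above come from a single pass that also includes console output and first-call overhead, so they are only a rough indication. A slightly more careful comparison might look like the sketch below. It is an illustration under assumptions, not the original test: the class name `SplitVsMatchTiming`, the helper names, and the round counts are all made up here, both patterns are precompiled, and the matcher pattern is widened to `[A-Za-z]` so both versions return the same tokens. No timings are quoted because they depend on the JVM and machine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitVsMatchTiming {
    // Both patterns are compiled once so that compilation cost is not measured.
    private static final Pattern RUNS = Pattern.compile("[0-9]+|[A-Za-z]+");
    private static final Pattern BOUNDARY =
            Pattern.compile("(?i)((?<=[A-Z])(?=\\d))|((?<=\\d)(?=[A-Z]))");

    private static final List<String> INPUT =
            Arrays.asList("RC23", "CC23QQ21HD32", "BPOASDf91A5HH123");

    static List<String> viaMatcher(String str) {
        List<String> parts = new ArrayList<>();
        Matcher m = RUNS.matcher(str);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    static List<String> viaSplit(String str) {
        return Arrays.asList(BOUNDARY.split(str));
    }

    // Runs each approach `rounds` times over INPUT; returns {matcherNanos, splitNanos}.
    static long[] time(int rounds) {
        long sink = 0; // accumulate results so the work cannot be skipped as unused
        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String s : INPUT) sink += viaMatcher(s).size();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String s : INPUT) sink += viaSplit(s).size();
        }
        long t2 = System.nanoTime();
        if (sink < 0) System.out.println(sink); // keeps `sink` observable
        return new long[] {t1 - t0, t2 - t1};
    }

    public static void main(String[] args) {
        time(20_000);                 // warm-up pass, result discarded
        long[] nanos = time(200_000); // measured pass, no printing inside the loops
        System.out.printf("matcher: %d ns, split: %d ns%n", nanos[0], nanos[1]);
    }
}
```

For measurements you intend to rely on, a dedicated harness such as JMH is usually a better fit than hand-rolled `System.nanoTime()` loops.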