Sure:
public static String[] ngrams(String str, int length) {
char[] chars = str.toCharArray();
final int resultCount = chars.length - length + 1;
String[] result = new String[resultCount];
for (int i = 0; i < resultCount; i++) {
result[i] = new String(chars, i, length);
}
return result;
}
The changes I made:
- instead of splitting via a regexp, I used
String#toCharArray()
which does a single array copy and is therefore much faster
- instead of rebuilding the resulting Strings from a
StringBuilder
, I used an appropriate String
constructor which, again, does only a single arraycopy
- (not needed for performance, but still) I changed the method signature to have
length
as a parameter for my testing causes. Feel free to change it back - just make sure you rename the method from ngrams()
to ngrams12()
or something.
Or drop it everything altogether and use a naïve approach with String#substring() that does a similar work under the hood:
public static String[] ngramsSubstring(String str, int length) {
final int resultCount = str.length() - length + 1;
String[] result = new String[resultCount];
for (int i = 0; i < resultCount; i++) {
result[i] = str.substring(i, i+length);
}
return result;
}
By the way, if you ever had to use a regexp in the future, try compiling it once and reusing it instead of compiling it every time the method gets used. For example, your code would look like:
private static final Pattern EVERY_CHAR = Pattern.compile("(?!^)");
and then, in the method, instead of String#split
, you'd use
String[] parts = EVERY_CHAR.split(str);