The code tries to upper-case the character that follows a space, but this should be done only if that character exists and is a letter. Similarly, the first character of the sentence is upper-cased without checking that it is a letter.
It may be better to use a boolean flag which is reset once a letter has been upper-cased. Also, StringBuilder should be used instead of concatenating a String in the loop.
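To illustrate the first point, here is a hypothetical sketch of the look-ahead pattern described above. The class name NaiveCapitalizer and the method naive are made up for illustration; this is not the code from the question. It works for "hello world" but throws as soon as the input ends with a space or is empty:

```java
final class NaiveCapitalizer {

    // Hypothetical look-ahead version: upper-cases whatever follows a space.
    public static String naive(String s) {
        // No check that the string is non-empty or that the first char is a letter.
        String result = "" + Character.toUpperCase(s.charAt(0));
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            result += c; // concatenation in a loop builds a new String every time
            if (c == ' ') {
                // No check that index i + 1 is still inside the string.
                result += Character.toUpperCase(s.charAt(i + 1));
                i++; // skip the character that was just appended
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(naive("hello world")); // Hello World
        try {
            naive("hello "); // trailing space: the look-ahead runs past the end
        } catch (IndexOutOfBoundsException e) {
            System.out.println("trailing space: " + e.getClass().getSimpleName());
        }
    }
}
```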
So the improved code may look as follows with more rules added:
- make the first letter of each word consisting of letters and/or digits upper case
- words are separated by any non-letter/non-digit character except ' (used in contractions like I'm, there's, etc.)
- check for null / empty input
public static String transform(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }
    boolean useUpper = true; // boolean flag
    StringBuilder sb = new StringBuilder(s.length());
    for (char c : s.toCharArray()) {
        if (Character.isLetter(c) || Character.isDigit(c)) {
            if (useUpper) {
                c = Character.toUpperCase(c);
                useUpper = false;
            }
        } else if (c != '\'') { // any non-alphanumeric character, not only space
            useUpper = true; // set flag for the next letter
        }
        sb.append(c);
    }
    return sb.toString();
}
Tests:
String[] tests = {
    "hello world",
    " hi there,what's up?",
    "-a-b-c-d",
    "ID's 123abc-567def"
};
for (String t : tests) {
    System.out.println(t + " -> " + transform(t));
}
Output:
hello world -> Hello World
 hi there,what's up? ->  Hi There,What's Up?
-a-b-c-d -> -A-B-C-D
ID's 123abc-567def -> ID's 123abc-567def
Update
A regular expression together with Matcher::replaceAll(Function<MatchResult, String> replacer), available since Java 9, may also help to capitalize the first letters of the words:
import java.util.regex.Pattern;

// pattern to detect the first letter
private static final Pattern FIRST_LETTER = Pattern.compile("\\b(?<!')(\\p{L})([\\p{L}\\p{N}]*?\\b)");

public static String transformRegex(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }
    return FIRST_LETTER.matcher(s)
            .replaceAll((mr) -> mr.group(1).toUpperCase() + mr.group(2));
}
Here:
\b - word boundary
(?<!') - negative lookbehind for ', as above; that is, match a letter NOT preceded by '
(\p{L}) - the first letter of the word (Unicode), captured as group 1
([\p{L}\p{N}]*?\b) - followed by a possibly empty sequence of letters/digits (matched reluctantly) up to the next word boundary, captured as group 2
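To check that the two versions agree, both methods can be put side by side in one small class. This is only a sketch: the class name CapitalizeComparison and the extra "foo_bar" input are additions for illustration, and it needs Java 9+ for the functional replaceAll.

```java
import java.util.regex.Pattern;

final class CapitalizeComparison {

    // Same pattern as above: a letter at a word boundary, not preceded by '
    private static final Pattern FIRST_LETTER =
            Pattern.compile("\\b(?<!')(\\p{L})([\\p{L}\\p{N}]*?\\b)");

    // Loop version with a boolean flag and a StringBuilder
    public static String transform(String s) {
        if (null == s || s.isEmpty()) {
            return s;
        }
        boolean useUpper = true;
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (Character.isLetter(c) || Character.isDigit(c)) {
                if (useUpper) {
                    c = Character.toUpperCase(c);
                    useUpper = false;
                }
            } else if (c != '\'') {
                useUpper = true;
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Regex version using Matcher.replaceAll(Function) from Java 9
    public static String transformRegex(String s) {
        if (null == s || s.isEmpty()) {
            return s;
        }
        return FIRST_LETTER.matcher(s)
                .replaceAll(mr -> mr.group(1).toUpperCase() + mr.group(2));
    }

    public static void main(String[] args) {
        String[] tests = {
            "hello world",
            " hi there,what's up?",
            "-a-b-c-d",
            "ID's 123abc-567def",
            "foo_bar" // extra input, not from the tests above
        };
        for (String t : tests) {
            String loop = transform(t);
            String regex = transformRegex(t);
            System.out.println("[" + t + "] -> [" + loop + "] / [" + regex + "]"
                    + (loop.equals(regex) ? "" : "  (results differ)"));
        }
    }
}
```

For the four test strings above both methods should return the same result. They are not equivalent in general: \b treats _ as a word character while the loop treats it as a separator, so inputs such as foo_bar are a likely place for the two to differ.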