I need to split camelCased text that contains only lowercase and uppercase letters. How can I do this with a regular expression?
Example text: ThisTextIsToBeSplitted
Output: This Text Is To Be Splitted
I would offer the following solution, which, unlike the other answers, preserves acronyms (e.g. ABC):
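public class SplitCamelCase {
    public static void main(String[] args) {
        String input = "ThisTextWithInitialABCIsToBeSplitted";
        // Zero-width split: at a lowercase/uppercase boundary, or before the last capital of an acronym
        String[] parts = input.split("((?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z]))");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}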
Output:
This
Text
With
Initial
ABC
Is
To
Be
Splitted
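The question asks for space-separated output ("This Text Is To Be Splitted") rather than one word per line. A minimal sketch, assuming the same acronym-aware pattern, compiles it once and joins the pieces with spaces; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class CamelCaseSplitter {
    // Hypothetical helper; same two lookaround conditions as the answer above.
    // Compiling once avoids recompiling the regex on every String.split call.
    private static final Pattern CAMEL_BOUNDARY =
            Pattern.compile("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    // Splits at the zero-width boundaries and joins the words with single spaces.
    public static String splitCamelCase(String input) {
        return String.join(" ", CAMEL_BOUNDARY.split(input));
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("ThisTextIsToBeSplitted"));
        System.out.println(splitCamelCase("ThisTextWithInitialABCIsToBeSplitted"));
    }
}
```

This should print "This Text Is To Be Splitted" followed by "This Text With Initial ABC Is To Be Splitted".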
The logic of the split is to use lookarounds which assert, but do not consume. A split happens on either of the following two conditions:
(?<=[a-z])(?=[A-Z])
(?<=[A-Z])(?=[A-Z][a-z])
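To see that these conditions only assert a position and consume nothing, one can run the combined pattern with Matcher.find() and record where each match starts. The sketch below (class and method names are hypothetical) returns those indices for the fragment InitialABCIs; every match is empty, so start() equals end().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryPositions {
    // Both lookaround conditions joined by alternation.
    static final Pattern BOUNDARY =
            Pattern.compile("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    // Collects the index of every zero-width match (start() == end() for each one).
    static List<Integer> boundaries(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = BOUNDARY.matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Index 7 is between "l" and "A"; index 10 is between "C" and "I".
        System.out.println(boundaries("InitialABCIs")); // [7, 10]
    }
}
```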
The first condition matches a position immediately preceded by a lowercase letter and immediately followed by a capital letter. But with this rule alone, the string InitialABCIs would split to this:
Initial
ABCIs
The acronym ABC stays glued to the following word Is, because no lowercase letter precedes the I.
To fix this, I added a second condition, which splits at a position preceded by a capital letter and followed by another capital and then a lowercase letter. This separates the acronym from the true start of the next camelCase word.
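A quick way to check the effect of the second condition is to split the same fragment with the first rule alone and with both rules. This is a minimal sketch; the class and constant names are hypothetical.

```java
import java.util.Arrays;

public class RuleComparison {
    // First condition only: lowercase letter followed by an uppercase letter.
    static final String RULE_ONE = "(?<=[a-z])(?=[A-Z])";
    // Both conditions: the second also splits before the last capital of an acronym run.
    static final String BOTH_RULES = RULE_ONE + "|(?<=[A-Z])(?=[A-Z][a-z])";

    public static void main(String[] args) {
        String input = "InitialABCIs";
        System.out.println(Arrays.toString(input.split(RULE_ONE)));   // [Initial, ABCIs]
        System.out.println(Arrays.toString(input.split(BOTH_RULES))); // [Initial, ABC, Is]
    }
}
```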
import java.util.Arrays;
String s = "ThisTextIsToBeSplitted";
// Zero-width lookahead: split before every uppercase letter
System.out.println(Arrays.asList(s.split("(?=[A-Z])")));
This works fine. My output is:
[This, Text, Is, To, Be, Splitted]
For Stephen's example, however, the output is [This, T, E, X, T, Is, To, Be, Splitted], because the pattern splits at every uppercase letter.
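The difference is easiest to see side by side. The sketch below assumes an input of ThisTEXTIsToBeSplitted, reconstructed from the output quoted above (the original example is not shown here); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class AcronymComparison {
    // Lookahead only: splits before every capital, so an all-caps run becomes single letters.
    static List<String> splitBeforeCapitals(String s) {
        return Arrays.asList(s.split("(?=[A-Z])"));
    }

    // Acronym-aware pattern from the first answer: keeps an all-caps run together.
    static List<String> splitAcronymAware(String s) {
        return Arrays.asList(s.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"));
    }

    public static void main(String[] args) {
        String s = "ThisTEXTIsToBeSplitted"; // assumed input
        System.out.println(splitBeforeCapitals(s)); // [This, T, E, X, T, Is, To, Be, Splitted]
        System.out.println(splitAcronymAware(s));   // [This, TEXT, Is, To, Be, Splitted]
    }
}
```

The lookahead-only output assumes Java 8 or later, where a zero-width match at the start of the string does not produce an empty leading element; older versions return an extra "" first.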