string1.split("(?=-)");
This works because split takes a regular expression, not a literal string. What you're seeing is a "zero-width positive lookahead".
I would love to explain more but my daughter wants to play tea party. :)
Edit: Back!
To explain this, I will first show you a different split operation:
"Ram-sita-laxman".split("");
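This splits your string on every zero-length string. There is a zero-length string between every pair of adjacent characters. Therefore, on Java 8 and later, the result is:
["R", "a", "m", "-", "s", "i", "t", "a", "-", "l", "a", "x", "m", "a", "n"]
On Java 7 and earlier, the result also starts with an empty string (""), because the zero-length match at the very beginning of the input produced an empty leading element. Since Java 8, a zero-width match at the beginning never produces one.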
Now, I modify my regular expression ("") to only match zero-length strings if they are followed by a dash.
"Ram-sita-laxman".split("(?=-)");
["Ram", "-sita", "-laxman"]
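If it helps to see why "zero-width" matters, here is a small sketch that compares splitting on the dash itself with splitting on the lookahead. The class and method names (LookaheadSplitDemo, splitOnDash, splitBeforeDash) are made up for illustration. Only String.split and Arrays.toString are real library calls.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Plain split: the dash is the match, so it is consumed and dropped.
    static String[] splitOnDash(String s) {
        return s.split("-");
    }

    // Lookahead split: the match is the empty position just before each dash,
    // so nothing is consumed and every dash stays in the output.
    static String[] splitBeforeDash(String s) {
        return s.split("(?=-)");
    }

    public static void main(String[] args) {
        String input = "Ram-sita-laxman";
        System.out.println(Arrays.toString(splitOnDash(input)));     // [Ram, sita, laxman]
        System.out.println(Arrays.toString(splitBeforeDash(input))); // [Ram, -sita, -laxman]
    }
}
```

The only difference between the two calls is whether the regex consumes the dash or merely looks at it.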
In that example, the ?= means "lookahead". More specifically, it means "positive lookahead". Why the "positive"? Because you can also have negative lookahead (?!), which will split on every zero-length string that is not followed by a dash:
"Ram-sita-laxman".split("(?!-)");
["R", "a", "m-", "s", "i", "t", "a-", "l", "a", "x", "m", "a", "n"]
(That is the Java 8+ result. Java 7 and earlier add a leading "".)
You can also have positive lookbehind (?<=), which will split on every zero-length string that is preceded by a dash:
"Ram-sita-laxman".split("(?<=-)");
["Ram-", "sita-", "laxman"]
Finally, you can also have negative lookbehind (?<!), which will split on every zero-length string that is not preceded by a dash:
"Ram-sita-laxman".split("(?<!-)");
["R", "a", "m", "-s", "i", "t", "a", "-l", "a", "x", "m", "a", "n"]
(Again the Java 8+ result. Java 7 and earlier add a leading "".)
These four expressions are collectively known as the lookaround expressions.
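To try all four side by side, something like the following should do. This is just a sketch: LookaroundSplitDemo and splitOn are names I invented. Because every one of these patterns matches a zero-length string, no characters are ever consumed, so gluing the pieces back together always gives you the original input.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    static final String INPUT = "Ram-sita-laxman";

    // Splits the sample input on the given (zero-width) lookaround pattern.
    static String[] splitOn(String regex) {
        return INPUT.split(regex);
    }

    public static void main(String[] args) {
        String[] patterns = {
            "(?=-)",  // positive lookahead:  followed by a dash
            "(?!-)",  // negative lookahead:  not followed by a dash
            "(?<=-)", // positive lookbehind: preceded by a dash
            "(?<!-)"  // negative lookbehind: not preceded by a dash
        };
        for (String pattern : patterns) {
            String[] parts = splitOn(pattern);
            // Zero-width matches consume nothing, so the pieces rejoin to INPUT.
            boolean lossless = String.join("", parts).equals(INPUT);
            System.out.println(pattern + " -> " + Arrays.toString(parts)
                    + " (lossless: " + lossless + ")");
        }
    }
}
```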
Bonus: Putting them together
I just wanted to show an example I encountered recently that combines two of the lookaround expressions. Suppose you wish to split a CapitalCase identifier up into its tokens:
"MyAwesomeClass" => ["My", "Awesome", "Class"]
You can accomplish this using this regular expression:
"MyAwesomeClass".split("(?<=[a-z])(?=[A-Z])");
This splits on every zero-length string that is preceded by a lower case letter ((?<=[a-z])) and followed by an upper case letter ((?=[A-Z])).
This technique also works with camelCase identifiers.
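Here is a rough sketch of that as a helper, in case you want to play with it. IdentifierSplitter and splitIdentifier are hypothetical names, not anything from a library. One thing to be aware of: the pattern only splits where a lower case letter meets an upper case one, so a run of capitals such as "HTTP" stays glued to the word that follows it.

```java
import java.util.Arrays;

class IdentifierSplitter {
    // Splits wherever a lower case ASCII letter is directly followed by an
    // upper case ASCII letter. Nothing is consumed, so no characters are lost.
    static String[] splitIdentifier(String identifier) {
        return identifier.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitIdentifier("MyAwesomeClass")));    // [My, Awesome, Class]
        System.out.println(Arrays.toString(splitIdentifier("myAwesomeClass")));    // [my, Awesome, Class]
        System.out.println(Arrays.toString(splitIdentifier("parseHTTPResponse"))); // [parse, HTTPResponse]
    }
}
```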