89

I'm having a problem splitting a String.

I want to split a String on a separator without losing that separator.

When we use the somestring.split(String separator) method in Java, it splits the String but removes the separator from the result. I don't want this to happen.

For example, given this code:

String string1="Ram-sita-laxman";
String seperator="-";
string1.split(seperator);

Output:

[Ram, sita, laxman]

but I want the result like the one below instead:

[Ram, -sita, -laxman]

Is there a way to get output like this?

M. A. Kishawy
sag

5 Answers

271
string1.split("(?=-)");

This works because split actually takes a regular expression. What you're actually seeing is a "zero-width positive lookahead".

I would love to explain more but my daughter wants to play tea party. :)

Edit: Back!

To explain this, I will first show you a different split operation:

"Ram-sita-laxman".split("");

This splits your string on every zero-length string. There is a zero-length string between every character. Therefore, the result is:

["R", "a", "m", "-", "s", "i", "t", "a", "-", "l", "a", "x", "m", "a", "n"]

(On Java 7 and earlier, the result also starts with an empty string "". Since Java 8, a zero-width match at the beginning of the input no longer produces that leading empty string. The same applies to the two negative lookaround examples below.)

Now, I modify my regular expression ("") to only match zero-length strings if they are followed by a dash.

"Ram-sita-laxman".split("(?=-)");
["Ram", "-sita", "-laxman"]

In that example, the ?= means "lookahead". More specifically, it means "positive lookahead". Why the "positive"? Because you can also have negative lookahead (?!) which will split on every zero-length string that is not followed by a dash:

"Ram-sita-laxman".split("(?!-)");
["R", "a", "m-", "s", "i", "t", "a-", "l", "a", "x", "m", "a", "n"]

You can also have positive lookbehind (?<=) which will split on every zero-length string that is preceded by a dash:

"Ram-sita-laxman".split("(?<=-)");
["Ram-", "sita-", "laxman"]

Finally, you can also have negative lookbehind (?<!) which will split on every zero-length string that is not preceded by a dash:

"Ram-sita-laxman".split("(?<!-)");
["R", "a", "m", "-s", "i", "t", "a", "-l", "a", "x", "m", "a", "n"]

These four expressions are collectively known as the lookaround expressions.
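The four lookarounds can be tried side by side. The following is a minimal sketch, not part of the original answer; the class name LookaroundSplitDemo and the helper split are made up for illustration. The outputs in the comments assume Java 8 or later, where a zero-width match at the start of the input does not yield a leading empty string (older versions print an extra empty first element for the two negative cases).

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    static final String INPUT = "Ram-sita-laxman";

    // Hypothetical helper: splits the sample input on the given regex.
    static String[] split(String regex) {
        return INPUT.split(regex);
    }

    public static void main(String[] args) {
        // Positive lookahead: split before every dash.
        System.out.println(Arrays.toString(split("(?=-)")));
        // [Ram, -sita, -laxman]

        // Positive lookbehind: split after every dash.
        System.out.println(Arrays.toString(split("(?<=-)")));
        // [Ram-, sita-, laxman]

        // Negative lookahead: split everywhere except before a dash.
        System.out.println(Arrays.toString(split("(?!-)")));
        // [R, a, m-, s, i, t, a-, l, a, x, m, a, n]

        // Negative lookbehind: split everywhere except after a dash.
        System.out.println(Arrays.toString(split("(?<!-)")));
        // [R, a, m, -s, i, t, a, -l, a, x, m, a, n]
    }
}
```

In every case the pieces concatenate back to the original string, because a lookaround matches a position and consumes no characters.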

Bonus: Putting them together

I just wanted to show an example I encountered recently that combines two of the lookaround expressions. Suppose you wish to split a CapitalCase identifier up into its tokens:

"MyAwesomeClass" => ["My", "Awesome", "Class"]

You can accomplish this using this regular expression:

"MyAwesomeClass".split("(?<=[a-z])(?=[A-Z])");

This splits on every zero-length string that is preceded by a lower case letter ((?<=[a-z])) and followed by an upper case letter ((?=[A-Z])).

This technique also works with camelCase identifiers.
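Here is a small runnable sketch of that identifier split. The class CamelCaseSplitDemo and the helper splitIdentifier are hypothetical names chosen for this example; only the regular expression comes from the answer above.

```java
import java.util.Arrays;

class CamelCaseSplitDemo {
    // Hypothetical helper: splits between a lower case letter and the
    // upper case letter that follows it, consuming no characters.
    static String[] splitIdentifier(String identifier) {
        return identifier.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitIdentifier("MyAwesomeClass")));
        // [My, Awesome, Class]

        System.out.println(Arrays.toString(splitIdentifier("myAwesomeVariable")));
        // [my, Awesome, Variable]

        // Runs of capitals are not separated, since no lower case letter precedes them.
        System.out.println(Arrays.toString(splitIdentifier("parseHTMLDocument")));
        // [parse, HTMLDocument]
    }
}
```

Note that the character classes only cover ASCII letters, so identifiers with digits or non-ASCII letters would need a different pattern.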

Community
Adam Paynter
  • Can you post an example of how you would do a lookaround that splits both before and after the hyphens? i.e. to produce `ram,-,sita,-,laxman` – dwjohnston Jan 16 '14 at 04:14
  • How do I match non-zero-length strings? Like this "([non zero]?<=[a-z])([non zero]?=[A-Z])"? – the_prole Jun 21 '14 at 18:07
  • @the_prole: Could you give some examples of what you mean? – Adam Paynter Jun 22 '14 at 11:44
  • Which browsers support this? `"Ram-sita-laxman".split("(?<=-)");` results in one string `["Ram-sita-laxman"]`. – AturSams May 21 '17 at 16:00
  • In Java this solution has a huge impact on performance; for example, using negative lookbehind is 10 times slower than simply matching the "-" character – raythurnevoid May 31 '17 at 22:47
  • @wolfdawn this is for Java. JavaScript distinguishes between strings and regex literals, which use forward slashes instead of quotes. `"Ram-sita-laxman".split(/(?<=-)/)` should work in JavaScript – theferrit32 Jun 04 '19 at 17:54
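On the question in the comments about splitting both before and after the hyphens: one possible approach, which is not from the original answer, is to combine a lookbehind and a lookahead with alternation so that every position touching a dash becomes a split point. The class and method names below are hypothetical.

```java
import java.util.Arrays;

class SplitAroundSeparatorDemo {
    // Hypothetical helper: splits at every position that is either
    // preceded by a dash or followed by a dash.
    static String[] splitAroundDash(String input) {
        return input.split("(?<=-)|(?=-)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAroundDash("Ram-sita-laxman")));
        // [Ram, -, sita, -, laxman]
    }
}
```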
6

It's a bit dodgy, but you could introduce a dummy separator using a replace function. I don't know the Java methods, but in C# it could be something like:

string1.Replace("-", "#-").Split('#');

Of course, you'd need to pick a dummy separator that's guaranteed not to be anywhere else in the string.
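For Java, the same idea might look like the sketch below. This is an illustration only: the class DummySeparatorSplitDemo and the method splitKeepingSeparator are made-up names, and the guard against a dummy that already occurs in the input is an added assumption about how one would want to use it. String.replace treats both arguments literally, while String.split takes a regular expression, so the dummy is quoted with Pattern.quote.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DummySeparatorSplitDemo {
    // Hypothetical helper: marks each separator with a dummy string,
    // then splits on the dummy so the separator stays in the pieces.
    static String[] splitKeepingSeparator(String input, String separator, String dummy) {
        if (input.contains(dummy)) {
            throw new IllegalArgumentException("dummy separator occurs in the input");
        }
        String marked = input.replace(separator, dummy + separator);
        // split() expects a regex, so quote the dummy to match it literally.
        return marked.split(Pattern.quote(dummy));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                splitKeepingSeparator("Ram-sita-laxman", "-", "#")));
        // [Ram, -sita, -laxman]
    }
}
```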

Andrew Cooper
3

Adam hit the nail on the head! I used his answer to figure out how to insert filename text from the file dialog browser into a rich text box. The problem I ran into was that I wanted a new line at each "\" in the file path, but the string.Split method was splitting at the \ and deleting it. By adapting Adam's code, I was able to start a new line after each \ in the file name.

Here is the code I used:

OpenFileDialog fd = new OpenFileDialog();
fd.Multiselect = true;
fd.ShowDialog();

foreach (string filename in fd.FileNames)
{
    string currentfiles = uxFiles.Text;
    string value = "\r\n" + filename;

    // This line allows the Regex command to split after each \ in the filename.
    string[] lines = Regex.Split(value, @"(?<=\\)");

    foreach (string line in lines)
    {
        uxFiles.Text = uxFiles.Text + line + "\r\n";
    }
}

Enjoy!

Walrusking

2

A way to do this is to split your string, then add your separator at the beginning of each extracted string except the first one.

Dalmas
1
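String seperator = "-";
// split() removes the separator from every piece
String[] splitstrings = string1.split(seperator);
// put the separator back in front of every piece except the first
for (int i = 1; i < splitstrings.length; i++)
{
   splitstrings[i] = seperator + splitstrings[i];
}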

This is the code that goes with LadaRaider's answer.

mad