39

I'd like to split a string using the Split function in the Regex class. The problem is that it removes the delimiters and I'd like to keep them, preferably as separate elements in the resulting array.

According to other discussions that I've found, there are only inconvenient ways to achieve that.

Any suggestions?

  • 6
    Input string? your regex? expected output? – I4V Mar 27 '13 at 19:41
  • this `.etc` doesn't give much info. about your algorithm but I can try at least. See my answer – I4V Mar 27 '13 at 20:12
  • 5
    @AndreasJohansson: to the contrary, there *was* sample code to be posted. You wrote `the problem is that it removes...` What is "it" in this situation? This is a classic question of "I can get this output, but I'd like to get this output"- a great kind of question, but one made much easier to answer if the original code (that gives close to, but not exactly, the desired output) is shown. – David Robinson Mar 28 '13 at 23:19
  • 2
    This question has triggered a [discussion on Meta](http://meta.stackexchange.com/questions/174057/group-bashing-is-it-allowed). – Michael Petrotta Mar 28 '13 at 23:23
  • @DavidRobinson At the risk of starting a flame war, which isn't my intention, *it* refers to the previous sentence's subject - the *Split* function. "It" (i.e. the function) does split but loses the delimiters. I've checked with several people and had it confirmed that it's correct English and a fully understandable question. I'm about to remove it, anyway, and simply repost it. Thanks for the input. –  Mar 29 '13 at 00:12
  • 3
    @AndreasJohansson - Don't repost. [edit]. If there's a problem with your post, reposting it may lead to an automatic question ban. Instead, I think the people here are just simply asking you to post an example of the code that doesn't work, so that it can help them tailor a solution for you that builds on what you already know instead of guessing what you know and then have you come back with a comment saying "No, that's not what I meant.". Remember, people here are volunteering their time to help you, so it's wise to help them by posting what they ask for. Hope this helps! :) – jamesmortensen Mar 29 '13 at 00:30
  • 1
    @jmort253 I really tried to reformulate the question but I couldn't find any way to do that without actually damaging the question that **I** was asking. I'm really sorry. I'm going to disregard this question as a whole because it's drawn way too much attention. Please don't take that as me ignoring **you**. I'm just cutting off the infected thread. –  Mar 29 '13 at 01:02

6 Answers

93

Just put the pattern into a capture-group, and the matches will also be included in the result.

string[] result = Regex.Split("123.456.789", @"(\.)");

Result:

{ "123", ".", "456", ".", "789" }

This also works for many other languages:

  • JavaScript: "123.456.789".split(/(\.)/g)
  • Python: re.split(r"(\.)", "123.456.789")
  • Perl: split(/(\.)/g, "123.456.789")

(Not Java though)
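
As a quick check of the capture-group behaviour outside C#, here is a minimal Python sketch. The extra sample string `"a,b;c.d"` and the variable names are my own, not from the question. It contrasts a plain pattern, which drops the delimiters, with a captured one, which keeps them, and shows a character class handling several delimiters at once.

```python
import re

text = "123.456.789"

# Without a capture group the delimiters are discarded.
without_group = re.split(r"\.", text)

# With a capture group each matched delimiter becomes its own element.
with_group = re.split(r"(\.)", text)

# A character class inside the group keeps whichever delimiter matched.
mixed = re.split(r"([,.;])", "a,b;c.d")

print(without_group)  # ['123', '456', '789']
print(with_group)     # ['123', '.', '456', '.', '789']
print(mixed)          # ['a', ',', 'b', ';', 'c', '.', 'd']
```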

Markus Jarderot
  • Oh, this was even better! Funny example - you match *any* by a period that **actually** is a period. +1 for a great syntax! However, for some reason it doesn't catch the last element so I get just what you said but **except** for the *789* part. –  Mar 27 '13 at 20:23
  • While reading about look-ahead, I read that it's not included in the result, e.g. `Regex.Match("say 25 miles more", @"\d+\s(?=miles)")` outputs `25`. Another statement says that to include the separator while splitting, you wrap the pattern in a positive look-ahead, e.g. `Regex.Split("oneTwoThree", @"(?=[A-Z])")` outputs `one Two Three`. I'm confused. –  Dec 24 '16 at 19:45
  • 1
    @sortednoun The look-ahead matches zero characters, only if the body would match from that position. The look-ahead body is not part of the match, so there is nothing extra to include. The text matched by the body would instead be included in the next array item, when splitting. `(?=([A-Z]))` would both create an extra item with the letter AND include it in the next item. – Markus Jarderot Dec 24 '16 at 20:40
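
The difference described in the comment above can be tried out with a small Python sketch. It assumes Python 3.7 or later, where `re.split` accepts patterns that match the empty string; the variable names are illustrative only.

```python
import re

# Needs Python 3.7+, where re.split can split on empty (zero-width) matches.
s = "oneTwoThree"

# A bare look-ahead splits before each capital letter and consumes nothing,
# so the letter stays at the start of the next item.
plain = re.split(r"(?=[A-Z])", s)

# A capture group inside the look-ahead also emits the letter as its own item.
captured = re.split(r"(?=([A-Z]))", s)

print(plain)     # ['one', 'Two', 'Three']
print(captured)  # ['one', 'T', 'Two', 'T', 'Three']
```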
7

Use Matches to find the separators in the string, then get the values and the separators.

Example:

string input = "asdf,asdf;asdf.asdf,asdf,asdf";

var values = new List<string>();
int pos = 0;
foreach (Match m in Regex.Matches(input, "[,.;]")) {
  values.Add(input.Substring(pos, m.Index - pos));
  values.Add(m.Value);
  pos = m.Index + m.Length;
}
values.Add(input.Substring(pos));
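
For comparison, the same walk over the matches might look like this in Python. It is a sketch only: `split_keep` is a hypothetical helper name, and `re.finditer` stands in for `Regex.Matches`.

```python
import re

def split_keep(pattern, text):
    """Split text on pattern, returning pieces and separators in order."""
    parts = []
    pos = 0
    for m in re.finditer(pattern, text):
        parts.append(text[pos:m.start()])  # text before the separator
        parts.append(m.group())            # the separator itself
        pos = m.end()                      # continue after the separator
    parts.append(text[pos:])               # trailing text after the last separator
    return parts

result = split_keep(r"[,.;]", "asdf,asdf;asdf.asdf,asdf,asdf")
print(result)
```

Because the loop tracks the position itself, this approach needs no capture group and leaves room for per-separator logic.
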
Guffa
4

Say the input is "abc1defg2hi3jkl" and the delimiters are runs of digits. Instead of splitting, match every run of digits or non-digits:

String input = "abc1defg2hi3jkl";
var parts = Regex.Matches(input, @"\d+|\D+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

Parts would be: abc 1 defg 2 hi 3 jkl
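
The same idea (match everything instead of splitting) can be sketched in Python with `re.findall`. This is an illustration under my own naming, not part of the original answer.

```python
import re

text = "abc1defg2hi3jkl"

# Match either a run of digits or a run of non-digits. Every character
# belongs to exactly one match, so nothing is dropped.
parts = re.findall(r"\d+|\D+", text)

print(parts)  # ['abc', '1', 'defg', '2', 'hi', '3', 'jkl']
```

Joining the parts gives back the original string, which is a handy sanity check for this technique.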

I4V
1

For Java:

// Requires: import java.util.Arrays;
// Split at every position directly after or directly before a period.
Arrays.stream("123.456.789".split("(?<=\\.)|(?=\\.)"))
                .forEach(System.out::println);

outputs:

123
.
456
.
789

Inspired by the post "How to split string but keep delimiters in java?"

box110a
0

Add them back:

    string[] Parts = "A,B,C,D,E".Split(',');
    string[] Parts2 = new string[Parts.Length * 2 - 1];
    for (int i = 0; i < Parts.Length; i++)
    {
        Parts2[i * 2] = Parts[i];
        if (i < Parts.Length - 1)
            Parts2[i * 2 + 1] = ",";
    }
Michael Ross
  • 1
    But that doesn't work in the case that the regex has more than one possible match. – AJMansfield Mar 27 '13 at 19:44
  • 1
    What do you do if you don't know which delimiter has been used? Can you redo the example to use the *Regex* class? –  Mar 27 '13 at 19:45
0

For C#: split a paragraph into sentences while keeping the delimiters. A sentence ends with . or ? or ! followed by one space (requiring the space avoids splitting inside something like an e-mail address).

string data = "first. second! third? ";
Regex delimiter = new Regex("(?<=[.?!] )"); // note the space between ] and )
string[] afterRegex = delimiter.Split(data);

Result

first. second! third?
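
A rough Python equivalent of the look-behind split, for anyone who wants to try it quickly. It assumes Python 3.7+ (splitting on zero-width matches). The e-mail sample sentence is made up for illustration, and empty strings are filtered out so the result does not depend on how a match at the very end of the input is handled.

```python
import re

# Look-behind: split at the position right after '.', '?' or '!' plus one space.
SENTENCE_END = r"(?<=[.?!] )"

data = "first. second! third? "
sentences = [s for s in re.split(SENTENCE_END, data) if s]

# The period inside the address is not followed by a space, so it is left alone.
mail = "Write to a.b@example.com today. Thanks! "
mail_sentences = [s for s in re.split(SENTENCE_END, mail) if s]

print(sentences)       # ['first. ', 'second! ', 'third? ']
print(mail_sentences)  # ['Write to a.b@example.com today. ', 'Thanks! ']
```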

H_MONK