
I would like to split a string with delimiters but keep the delimiters in the result.

How would I do this in C#?

halfer
olidev
  • For "a|b", do you want "a|"+"b" or "a"+"|b" or "a|"+"|b" or something else? In short: what segment does a delimiter belong to? – Hans Kesting Jan 13 '11 at 12:43
  • Keep the delimiters in the what result? You want the delimiter as part of each string that was split? Your question is pretty vague. – Randy Minder Jan 13 '11 at 12:44
  • Hey, I would like to split a string on a list of characters, and the resulting strings should also contain the delimiters. The suggestion from veggerby below is what I would like to achieve. I will test it first. – olidev Jan 13 '11 at 13:55

19 Answers


If the split characters were `,`, `.`, and `;`, I'd try:

using System.Text.RegularExpressions;
...    
string[] parts = Regex.Split(originalString, @"(?<=[.,;])");

(?<=PATTERN) is a positive look-behind for PATTERN. It matches at any position where the preceding text fits PATTERN, so there is a match (and a split) after each occurrence of any of the characters.
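A minimal runnable sketch of the look-behind split, next to the look-ahead variant MikeTeeVee describes in the comments, written as a C# 9+ top-level program. The input string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "a,b.c;d";

// Look-behind: split *after* each delimiter, so the delimiter stays at the end of its piece.
string[] endKept = Regex.Split(input, @"(?<=[.,;])");

// Look-ahead: split *before* each delimiter, so the delimiter starts the next piece.
string[] startKept = Regex.Split(input, @"(?=[.,;])");

Console.WriteLine(string.Join(" | ", endKept));   // a, | b. | c; | d
Console.WriteLine(string.Join(" | ", startKept)); // a | ,b | .c | ;d
```

One thing to watch: if the input ends with a delimiter, the look-behind version also returns a trailing empty string (for example "a,b," gives "a,", "b,", ""), because Regex.Split adds an empty element when a match falls at the end of the input. Likewise, a look-ahead split of a string that starts with a delimiter returns a leading empty element.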

that_roy
codybartfast
  • This worked great for me - Thank You! I just had to make one little tweak for my purposes, as I wanted to include the delimiter at the beginning of each line (not at the end). Use @"(?=[.,;])" instead. – MikeTeeVee Jun 12 '11 at 09:54
  • This answer needs to be accepted so it will be easier to access. I'm an experienced SO user and it took me a while to find. – Benjamin Gruenbaum Feb 04 '13 at 01:00
  • Hi @it-depends, I like your answer, but what if I want to split it using strings instead of characters? For example, all the separators you are using but followed by a white space. I've tried this but it doesn't work: @"(?<=[. , ; ])" – Roberto Zamora Apr 03 '14 at 17:30
  • @roberto-zamora I'll update the answer properly when I have time to make it more general, but you might want to try (?<=[.,;]\s) which should match only where the delimiter character is followed by a space or other white space. – codybartfast Apr 04 '14 at 07:44
  • If you want to keep the delimiters in their own parts (as opposed to the beginning or end of delimited parts), you can also use `@"([.,;])"`. According to https://msdn.microsoft.com/en-us/library/ze12yx1d(v=vs.110).aspx#Anchor_2, "If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array." – Stjepan Rajko Nov 04 '15 at 17:03
  • You actually need to do what @StjepanRajko suggests instead of what this answer suggests, otherwise you end up with the delimiters being duplicated at the end of the previous element as well as in their own element. – Miral Feb 18 '19 at 04:44

If you want the delimiter to be its "own split", you can use Regex.Split with a capturing group (the parentheses in the pattern), e.g.:

using System;
using System.Text.RegularExpressions;

string input = "plum-pear";
string pattern = "(-)";

string[] substrings = Regex.Split(input, pattern);    // Split on hyphens
foreach (string match in substrings)
{
   Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
//    'plum'
//    '-'
//    'pear'

So if you want to split a mathematical formula, you can use the following regex:

@"([*()\^\/]|(?<!E)[\+\-])" 

This ensures you can also use constants like 1E-02 without having them split into 1E, - and 02.

So:

Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")

Yields:

  • 10E-02
  • *
  • x
  • +
  • sin
  • (
  • x
  • )
  • "" (an empty string, because ")" and "^" are adjacent matches)
  • ^
  • 2
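Because Regex.Split inserts an empty string between two adjacent matches, the ")" followed by "^" in this formula produces an extra empty element. A minimal sketch (an addition to the answer, using LINQ's Where) that drops those empty entries:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string formula = "10E-02*x+sin(x)^2";

// Operators and parentheses are captured, so they appear in the output;
// (?<!E) stops the minus in an exponent such as 1E-02 from being treated as an operator.
string[] raw = Regex.Split(formula, @"([*()\^\/]|(?<!E)[\+\-])");

// Adjacent matches (here ")" and "^") leave an empty string between them; drop those.
string[] tokens = raw.Where(t => t.Length > 0).ToArray();

Console.WriteLine(string.Join(" ", tokens)); // 10E-02 * x + sin ( x ) ^ 2
```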
veggerby
  • Hi. Thanks. This is what I wanted. I will test it first. Thanks. But is there any better way rather than using Regex? – olidev Jan 13 '11 at 13:56
  • Hi, my delimiters are, for example: char[] chars = new char[]{'A','B','C'}. Would it be possible to use Regex.Split with my char array instead of a string pattern? Thanks in advance – olidev Jan 13 '11 at 14:07
  • Hi, what if my pattern contains the 4 operators +, -, * and /? What does it look like? Thanks – olidev Jan 13 '11 at 15:46
  • the pattern is simply a regex, so you can do something like: pattern = @"([+\-*/])" – veggerby Jan 13 '11 at 17:46
  • actually if you are parsing/splitting a mathematical expression you can do something like @"([\*\(\)\^\/]|(?<!E)[\+\-])" which will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02 – veggerby Jan 13 '11 at 17:50
  • Potential Gotcha: the parentheses he uses in his example are required. Regex.Split("plum-pear", "-") yields only 'plum' and 'pear'. Programming is fun. – Andrew Kvochick Jul 10 '15 at 17:40
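For the comment asking how to use a char array such as {'A','B','C'} instead of a hand-written pattern: a minimal sketch, using a hypothetical helper named SplitKeepingChars, that escapes each character with Regex.Escape and joins them into one capturing alternation. Using an alternation rather than a character class sidesteps characters such as - and ] that are only special inside [...]. It assumes the array contains at least one delimiter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split on any of the given characters, keeping each delimiter as its own element.
// Assumes delimiters is non-empty (an empty alternation "()" would match between every character).
static string[] SplitKeepingChars(string input, char[] delimiters)
{
    // Escape each delimiter so characters like + * ( . are matched literally,
    // then wrap the alternation in a capturing group so Regex.Split returns the delimiters too.
    string pattern = "(" + string.Join("|", delimiters.Select(c => Regex.Escape(c.ToString()))) + ")";
    return Regex.Split(input, pattern);
}

string[] parts = SplitKeepingChars("3+4*2-1/5", new[] { '+', '-', '*', '/' });
Console.WriteLine(string.Join(" ", parts)); // 3 + 4 * 2 - 1 / 5
```

As with any Regex.Split call, a delimiter at the very start or end of the input produces an empty element there (for example "A1B2C" split on A, B, C starts and ends with "").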

Building on BFree's answer: I had the same goal, but I wanted to split on an array of characters like the original Split method does, and I also have multiple splits per string:

public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
    int start = 0, index;

    while ((index = s.IndexOfAny(delims, start)) != -1)
    {
        if(index-start > 0)
            yield return s.Substring(start, index - start);
        yield return s.Substring(index, 1);
        start = index + 1;
    }

    if (start < s.Length)
    {
        yield return s.Substring(start);
    }
}
ahsteele
esac

Just in case anyone wants this answer as well...

Instead of string[] parts = Regex.Split(originalString, @"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, @"(?=yourmatch)"), where yourmatch is whatever your separator is.

Supposing the original string was

777 - cat
777 - dog
777 - mouse
777 - rat
777 - wolf

Regex.Split(originalString, @"(?=777)") would return

777 - cat
777 - dog

and so on, with each element keeping its leading 777 (a plain Split on "777" would remove it).
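A small sketch of the look-ahead split on input like the example above, assuming the lines are separated by \n. Because the look-ahead also matches at position 0, Regex.Split returns an empty first element, which the sketch filters out; each remaining element keeps the line break that preceded the next 777.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string originalString = "777 - cat\n777 - dog\n777 - mouse";

// Zero-width look-ahead: split *before* every "777" without consuming it.
string[] raw = Regex.Split(originalString, @"(?=777)");

// The match at index 0 yields a leading empty string; drop it.
string[] entries = raw.Where(e => e.Length > 0).ToArray();

foreach (string entry in entries)
    Console.WriteLine(entry.TrimEnd('\n')); // 777 - cat, then 777 - dog, then 777 - mouse
```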

Conrad Clark
  • How would you specify a list of delimiters? i.e. "777, 666, etc." – Thomas Sep 27 '16 at 18:45
  • @Thomas if I'm not mistaken you could use the `|` token to specify alternatives. So it would be like: `(?=777|666)` – Conrad Clark Sep 28 '16 at 14:46
  • I wanted delimit by `\r\n` so string `first line\r\nsecond line` will turn into `[0] - first line\r\n [1] - second line`. After using your solution I get `[0] - first line [1] - \r\nsecond line`. After changing your solution `@"(?=\r\n)"` into `@"(?<=\r\n)"` solved it. Thanks a lot :) +1 – Gondil Feb 20 '17 at 13:39
  • Thanks for the suggestion. I wanted to split the string by the '[' delimiter. `Regex.Split(originalString, @"(?=\[)")` worked for me. – Rajaraman Subramanian Nov 27 '19 at 08:19
  • I don't see any difference between original and result. – The incredible Jan Sep 21 '22 at 09:21

This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.

/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns>The substrings and delimiters, in their original order.</returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
    var parts = new List<string>();
    if (!string.IsNullOrEmpty(s))
    {
        int iFirst = 0;
        do
        {
            int iLast = s.IndexOfAny(delimiters, iFirst);
            if (iLast >= 0)
            {
                if (iLast > iFirst)
                    parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
                parts.Add(new string(s[iLast], 1));//the delimiter
                iFirst = iLast + 1;
                continue;
            }

            //No delimiters were found, but at least one character remains. Add the rest and stop.
            parts.Add(s.Substring(iFirst, s.Length - iFirst));
            break;

        } while (iFirst < s.Length);
    }

    return parts;
}

Some unit tests:

var text = "[a link|http://www.google.com]";
var result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.AreEqual(5, result.Count);
Assert.AreEqual("[", result[0]);
Assert.AreEqual("a link", result[1]);
Assert.AreEqual("|", result[2]);
Assert.AreEqual("http://www.google.com", result[3]);
Assert.AreEqual("]", result[4]);
Ron
  • This is a good solution. How about if my delimiter is a string? Can you please provide an implementation of that as well? – nishantvodoo Jun 30 '17 at 16:01
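For the question about string delimiters, here is a minimal sketch in the same spirit as this answer (no Regex, no LINQ), under these assumptions: the helper name SplitAndKeepStrings is hypothetical, matching is ordinal, and when several delimiters match at the same position the longest one wins.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: like SplitAndKeepDelimiters, but each delimiter may be a whole string.
static IList<string> SplitAndKeepStrings(string s, params string[] delimiters)
{
    var parts = new List<string>();
    if (string.IsNullOrEmpty(s))
        return parts;

    int start = 0; // start of the current non-delimiter run
    int i = 0;     // current scan position
    while (i < s.Length)
    {
        // Find the longest delimiter that starts at position i (ordinal comparison).
        string? match = null;
        foreach (string d in delimiters)
        {
            if (!string.IsNullOrEmpty(d)
                && string.CompareOrdinal(s, i, d, 0, d.Length) == 0
                && (match == null || d.Length > match.Length))
                match = d;
        }

        if (match == null)
        {
            i++;
            continue;
        }

        if (i > start)
            parts.Add(s.Substring(start, i - start)); // text before the delimiter
        parts.Add(match);                             // the delimiter itself
        i += match.Length;
        start = i;
    }

    if (start < s.Length)
        parts.Add(s.Substring(start)); // trailing text after the last delimiter

    return parts;
}

var result = SplitAndKeepStrings("a<=b<c", "<=", "<");
Console.WriteLine(string.Join(" | ", result)); // a | <= | b | < | c
```

Preferring the longest match is a design choice: without it, "<" would win over "<=" whenever it is listed first.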

A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.

public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
    var rows = new List<string>() { s };
    foreach (string delim in delims)//delimiter counter
    {
        for (int i = 0; i < rows.Count; i++)//row counter
        {
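            // Ordinal search; only split if something follows the whole delimiter.
            int index = rows[i].IndexOf(delim, StringComparison.Ordinal);
            if (index > -1
                && rows[i].Length > index + delim.Length)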
            {
                string leftPart = rows[i].Substring(0, index + delim.Length);
                string rightPart = rows[i].Substring(index + delim.Length);
                rows[i] = leftPart;
                rows.Insert(i + 1, rightPart);
            }
        }
    }
    return rows;
}
maxp

This seems to work, but it's not been tested much.

public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
    List<string> splitValues = new List<string>();
    int itemStart = 0;
    for (int pos = 0; pos < value.Length; pos++)
    {
        for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
        {
            if (separators[sepIndex] == value[pos])
            {
                // add the section of string before the separator 
                // (unless it's empty and we are discarding empty sections)
                if (itemStart != pos || splitOptions == StringSplitOptions.None)
                {
                    splitValues.Add(value.Substring(itemStart, pos - itemStart));
                }
                itemStart = pos + 1;

                // add the separator
                splitValues.Add(separators[sepIndex].ToString());
                break;
            }
        }
    }

    // add anything after the final separator 
    // (unless it's empty and we are discarding empty sections)
    if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
    {
        splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
    }

    return splitValues.ToArray();
}
Sprotty

To keep each delimiter at the end of its piece instead of moving it to the start of the next one, try this:

 string[] substrings = Regex.Split(input,@"(?<=[-])");
Mohit S
  • And to add the separator to the beginning of the next line use this: `(?=[-])` – Apr 11 '20 at 22:16

I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.

0xCAFEBABE
  • This only works if you have 1 delimiter. If I want to split on spaces *and* newlines, I won't know which to add. – thomas88wp May 31 '15 at 15:36
  • @thomas88wp Just split twice? The first list gets the first delimiter added to every item except the last, and then every split list item (except the last) gets the second. The "flattened tree" is moved into a new list... – The incredible Jan Sep 21 '22 at 09:29

Recently I wrote an extension method to do this:

public static class StringExtensions
{
    public static IEnumerable<string> SplitAndKeep(this string s, string separator)
    {
        string[] obj = s.Split(new string[] { separator }, StringSplitOptions.None);

        for (int i = 0; i < obj.Length; i++)
        {
            // Re-append the separator to every piece except the last one.
            string result = i == obj.Length - 1 ? obj[i] : obj[i] + separator;
            yield return result;
        }
    }
}
BFree
string[] result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
    result[i] += separator;

(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)

(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)

mqp

Iterate through the string character by character (which is what regex does anyway). When you find a splitter, spin off a substring.

using System;
using System.Collections.Generic;

string toSplit = "a,b;c";          // example input
char[] splitChars = { ',', ';' };  // example split characters
var afterSplit = new List<string>();
int hold = 0;

for (int counter = 0; counter < toSplit.Length; counter++)
{
    if (Array.IndexOf(splitChars, toSplit[counter]) >= 0)
    {
        if (counter > hold)
            afterSplit.Add(toSplit.Substring(hold, counter - hold)); // text before the splitter
        afterSplit.Add(toSplit[counter].ToString());                // the splitter itself
        hold = counter + 1;
    }
}
if (hold < toSplit.Length)
    afterSplit.Add(toSplit.Substring(hold));                        // text after the last splitter

Obviously, choose names that fit your code. That will do what you're asking.

DevinB

veggerby's answer modified to

  • have no empty string items in the list
  • have a fixed string as delimiter, like "ab", instead of a single character

var delimiter = "ab";
var text = "ab33ab9ab";
var parts = Regex.Split(text, $@"({Regex.Escape(delimiter)})")
                 .Where(p => p != string.Empty)
                 .ToList();

// parts = "ab", "33", "ab", "9", "ab"

The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like * or parentheses), which then have to be escaped.

Welcor

I wanted to do a multiline string like this but needed to keep the line breaks, so I did this:

string x = 
@"line 1 {0}
line 2 {1}
";

foreach(var line in string.Format(x, "one", "two")
    .Split("\n") 
    .Select(s => s.Contains('\r') ? s + '\n' : s) // Split removed the '\n'; put it back after each '\r'
    .AsEnumerable()
) {
    Console.Write(line);
}

yields

line 1 one
line 2 two
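A minimal alternative sketch for keeping line breaks that works whether the text uses \r\n or \n: split with a look-behind on \n so every line keeps its own terminator. The sample text is made up; if the text ends with a line break, Regex.Split also returns a trailing empty string.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "line 1 one\r\nline 2 two\nlast line";

// Split *after* every '\n'; a preceding '\r' stays attached, so "\r\n" and "\n" both survive.
string[] lines = Regex.Split(text, @"(?<=\n)");

foreach (string line in lines)
    Console.Write(line); // writes the text back out unchanged
```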

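I came across the same problem but with multiple delimiters. Here's my solution:

    public static string[] SplitLeft(this string @this, char[] delimiters, int count = int.MaxValue)
    {
        // Every piece after the first starts with the delimiter that preceded it.
        var splits = new List<string>();
        int next;
        // Search from index 1 so the delimiter at the start of the remainder is not matched again.
        while (splits.Count + 1 < count
               && @this.Length > 1
               && (next = @this.IndexOfAny(delimiters, 1)) >= 0)
        {
            splits.Add(@this.Substring(0, next)); // text before the delimiter
            @this = @this.Substring(next);        // remainder, starting with the delimiter
        }
        splits.Add(@this);
        return splits.ToArray();
    }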

Sample that separates CamelCase variable names:

var variableSplit = variableName.SplitLeft(
    Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
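If the goal is only CamelCase splitting, a regex look-ahead does the same job; this is a swapped-in technique, not the answer's method. The sketch assumes an uppercase letter starts each word and uses (?<!^) so a leading capital does not produce an empty first element. Like the char-array approach, runs of capitals such as "ID" end up as single letters.

```csharp
using System;
using System.Text.RegularExpressions;

// Split before every uppercase letter (\p{Lu}) except at the very start of the string.
static string[] SplitCamelCase(string name) => Regex.Split(name, @"(?<!^)(?=\p{Lu})");

Console.WriteLine(string.Join(" ", SplitCamelCase("myVariableName"))); // my Variable Name
Console.WriteLine(string.Join(" ", SplitCamelCase("HttpRequestId")));  // Http Request Id
```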
Aleksandar Toplek
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApplication9
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = @"This;is:a.test";
            char sep0 = ';', sep1 = ':', sep2 = '.';
            string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
            Regex regex = new Regex(pattern);
            MatchCollection matches = regex.Matches(input);
            List<string> parts=new List<string>();
            foreach (Match match in matches)
            {
                parts.Add(match.ToString());
            }
        }
    }
}
Øyvind Skaar

I wrote this code to split and keep delimiters:

private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
    var tokens = new List<string>();
    int idx = 0;
    for (int i = 0; i < toSplit.Length; ++i)
    {
        if (delimiters.Contains(toSplit[i]))
        {
            tokens.Add(toSplit.Substring(idx, i - idx));  // token found
            tokens.Add(toSplit[i].ToString());            // delimiter
            idx = i + 1;                                  // start idx for the next token
        }
    }

    // last token
    tokens.Add(toSplit.Substring(idx));

    if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
    {
        tokens = tokens.Where(token => token.Length > 0).ToList();
    }

    return tokens.ToArray();
}

Usage example:

string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
    Console.WriteLine(token);
}
Shadi Serhan

In Java, you keep the delimiters by passing true as the returnDelims argument of StringTokenizer:

(this is Java code): new StringTokenizer(originalString, ";,:.-", true)

If you want the same in C#, use this:

string originalString = "10:11:12,13";
string[] parts = Regex.Split(originalString, @"(?<=[;,:.-])|(?=[;,:.-])");

The output would be 10 : 11 : 12 , 13
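A runnable sketch of the combined look-behind/look-ahead pattern. Because both alternatives are zero-width, each delimiter ends up as its own element without a capturing group, and, unlike the "(,)" capture approach, two adjacent delimiters do not produce an empty string between them. The second input is made up to show that case.

```csharp
using System;
using System.Text.RegularExpressions;

// Split just after and just before every delimiter; nothing is consumed.
const string Pattern = @"(?<=[;,:.-])|(?=[;,:.-])";

string[] parts = Regex.Split("10:11:12,13", Pattern);
Console.WriteLine(string.Join(" ", parts)); // 10 : 11 : 12 , 13

// Adjacent delimiters: the capture-group pattern inserts an empty string, this one does not.
string[] withCapture = Regex.Split("a,,b", "(,)");
string[] zeroWidth = Regex.Split("a,,b", Pattern);
Console.WriteLine(withCapture.Length); // 5
Console.WriteLine(zeroWidth.Length);   // 4
```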

Jebjosh

I've come up with a slightly more versatile version that allows splitting by string, char, or both combined. The trick to searching by string or character is to use parentheses and square brackets appropriately: (?=(STRING|STRING|[abc])), where a, b, and c are individual characters to match on. The function also takes optional parameters:

  • keepContentAfterDelimiter: Determines if the line value is the data before or after the delimiter.
  • clearEmptyValues: Determines if empty values should be removed from the result.
  • clearLeftoverUndelimitedValues: Determines what to keep if a split causes data to be chunked without a delimiter. For example, splitting the string "My dogs name is Jugs" with a delimiter of name and keepContentAfterDelimiter set to true would produce the following results:
      1. My dogs
      2. name is Jugs

clearLeftoverUndelimitedValues determines if the first result without a delimiter ("My dogs") should be kept. Here is my string extension.

public static class StringExtensions
{
    /// <summary>
    /// Split that keeps delimiters with the result lines.
    /// </summary>
    /// <param name="inputString">The string to split.</param>
    /// <param name="delimiters">A list of strings or characters to split on.</param>
    /// <param name="keepContentAfterDelimiter">Determines if the line value is the data before or after the delimiter. </param>
    /// <param name="clearEmptyValues">Determines if empty values should be removed from the result.</param>
    /// <param name="clearLeftoverUndelimitedValues">Determines what to keep if a split causes data to be chunked without a delimiter.</param>
    /// <returns></returns>
    public static List<string> SplitAndKeepDelimiters(this string inputString, List<string> delimiters, bool keepContentAfterDelimiter = true, bool clearEmptyValues = true, bool clearLeftoverUndelimitedValues = true)
    {
        if (string.IsNullOrEmpty(inputString))
            return null;

        string splitBeforeOrAfter = !keepContentAfterDelimiter ? "<" : string.Empty;

        List<string> regexDelimiters = new List<string>();

        // We need to convert single characters into a regex array check [].
        string characterDelimiters = string.Join(string.Empty, delimiters.Where(d => d.Length == 1));
        if (!string.IsNullOrEmpty(characterDelimiters))
        {
            regexDelimiters = delimiters.Where(d => d.Length != 1).ToList(); // remove any single character delimiters.
            regexDelimiters.Add($"[{characterDelimiters}]"); // move single character delimiters into a char array check [].
        }
        else
            regexDelimiters = delimiters;

        bool clearDelimiterOnlyValues = true;
        return Regex.Split(inputString, $@"(?{splitBeforeOrAfter}=({string.Join("|", regexDelimiters)}))")
            .Where(s =>
                (clearEmptyValues ? !string.IsNullOrEmpty(s) : true) // Remove any empty elements.
                && (clearDelimiterOnlyValues ? !delimiters.Contains(s) : true) // Remove delimiter-only elements (the inner group captures, so Regex.Split also returns each delimiter).
                && (clearLeftoverUndelimitedValues ? delimiters.Any(d => !keepContentAfterDelimiter ? s.EndsWith(d) : s.StartsWith(d)) : true)).ToList();
    }
}

Use it like so;

string originalString = "VALUE1=5VALUE2=6.HelloWorld";
List<string> parts = originalString.SplitAndKeepDelimiters(new() { "VALUE1=", "VALUE2", "." });
// [0] VALUE1=5
// [1] VALUE2=6
// [2] .HelloWorld

parts = originalString.SplitAndKeepDelimiters(new() { "5", "6", "HelloWorld" }, false);
// [0] VALUE1=5
// [1] VALUE2=6
// [2] .HelloWorld
clamchoda