
Background

I am working with a delimited string and was using String.Split to put each substring into an array when I noticed that the last element in the array was "". It was throwing off my results, since I was looking for a specific substring at the last index of the array. I eventually came across a post explaining that all strings end with string.Empty.

Example

The following shows this behavior in action. When I split my sentence and write each substring to the console, the last element turns out to be the empty string:

using System;

public class Program
{
    static void Main(string[] args)
    {
        const string mySentence = "Hello,this,is,my,string!";
        // Split on both "," and "!", keeping empty entries.
        var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);

        foreach (var word in wordArray)
        {
            var message = word;
            if (word == string.Empty) message = "Empty string";
            Console.WriteLine(message);
        }

        Console.ReadKey();
    }
}

Question & "Fix"

I understand conceptually that there are empty strings between every pair of characters, but why does String behave like this even at the end of a string? It seems confusing that "ABC" is equivalent to "ABC" + "" or "ABC" + "" + "" + "", so why not treat the string literally as only "ABC"? There is a "fix" to get the "true" substrings I wanted:

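using System;
using System.Collections.Generic;

public class Program
{
    static void Main(string[] args)
    {
        const string mySentence = "Hello,this,is,my,string!";
        var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);

        var wordList = new List<string>(wordArray);
        // Drop only a trailing empty entry. LastIndexOf(string.Empty) returns -1 when
        // there is none (so RemoveAt would throw), and it could remove a valid empty
        // entry from the middle when the string does not end with a delimiter.
        if (wordList.Count > 0 && wordList[wordList.Count - 1] == string.Empty)
            wordList.RemoveAt(wordList.Count - 1);

        foreach (var word in wordList)
        {
            var message = word;
            if (word == string.Empty) message = "Empty string";
            Console.WriteLine(message);
        }

        Console.ReadKey();
    }
}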

But I still don't understand why the end of the string gets the same treatment, since no character follows it that would need an empty string in between. Does it serve some purpose for the compiler?

Ryan Intravia
  • There may be cases where the difference between `"Hello,this,is,my,string!"` and `"Hello,this,is,my,string"` is important. – Matt Jul 25 '16 at 23:28
  • The end of the string is not special. `"!string".Split(new[] {"!"}, StringSplitOptions.None)` would be the same. Use `StringSplitOptions.RemoveEmptyEntries` to suppress empty strings. – Mathias R. Jessen Jul 25 '16 at 23:29
  • You are splitting on "!", so why do you not expect to get an extra entry? – DavidG Jul 25 '16 at 23:30
  • Or just `mySentence.TrimEnd('!').Split(',')` – Slai Jul 25 '16 at 23:31
  • Is it also confusing that 1 + 0 = 1, or 1 + 0 + 0 = 1? Or that 2 x 1 = 2, or 2 x 1 x 1 = 2? Is it confusing that the product of a matrix and an identity matrix (of compatible size) is equal to the first matrix? Many operations have an identity element; for string concatenation, that element is the empty string. – moreON Jul 25 '16 at 23:32
  • Type the following code in and see what it returns. It might give you a better conceptual understanding: `var result = "!!!!!".Split('!');` – Sam I am says Reinstate Monica Jul 25 '16 at 23:32
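The snippets suggested in these comments can be compared side by side in one small program. This is only a sketch: the `CommentDemos` class and its `Show` helper are illustrative names, not part of any library.

```csharp
using System;

public static class CommentDemos
{
    // Formats an array as ["a", "", "b"] so that empty entries stay visible.
    public static string Show(string[] parts) =>
        "[" + string.Join(", ", Array.ConvertAll(parts, p => "\"" + p + "\"")) + "]";

    public static void Main()
    {
        // A leading delimiter yields an empty first element, mirroring the trailing case.
        Console.WriteLine(Show("!string".Split(new[] { "!" }, StringSplitOptions.None))); // ["", "string"]

        // Five delimiters and no other characters: six pieces, all empty.
        Console.WriteLine(Show("!!!!!".Split('!'))); // ["", "", "", "", "", ""]

        // Trimming the trailing "!" first means there is no empty piece after it.
        Console.WriteLine(Show("Hello,this,is,my,string!".TrimEnd('!').Split(',')));
        // ["Hello", "this", "is", "my", "string"]
    }
}
```

Note that `TrimEnd('!')` removes every trailing "!", not just one, which is fine here but worth keeping in mind for other inputs.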

3 Answers


This is happening because you are using StringSplitOptions.None while one of your delimiters occurs at the end of the string. That option is designed to produce exactly the behavior you are observing: it splits a string containing N delimiters into exactly N + 1 pieces.
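The N delimiters to N + 1 pieces rule can be checked directly. This is a minimal sketch using single-character delimiters; `PieceCountDemo`, `PieceCount` and `DelimiterCount` are hypothetical names chosen for illustration.

```csharp
using System;

public static class PieceCountDemo
{
    // Number of pieces Split returns when empty entries are kept.
    public static int PieceCount(string input, params char[] delimiters) =>
        input.Split(delimiters, StringSplitOptions.None).Length;

    // Number of characters in the input that are one of the delimiters.
    public static int DelimiterCount(string input, params char[] delimiters)
    {
        int count = 0;
        foreach (char c in input)
        {
            if (Array.IndexOf(delimiters, c) >= 0) count++;
        }
        return count;
    }

    public static void Main()
    {
        foreach (string s in new[] { "Hello,this,is,my,string!", "a,b,", ",,", "" })
        {
            // With StringSplitOptions.None, pieces == delimiters + 1 for every input.
            Console.WriteLine($"\"{s}\": {DelimiterCount(s, ',', '!')} delimiters, {PieceCount(s, ',', '!')} pieces");
        }
    }
}
```

Even the empty string follows the rule: it contains 0 delimiters and splits into 1 piece, which is itself "".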

To see the behavior you want, use StringSplitOptions.RemoveEmptyEntries:

var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.RemoveEmptyEntries);
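To see the difference between the two options on the question's input, here is a small sketch (the class and method names are illustrative only). It also shows the caveat: RemoveEmptyEntries drops empty fields *between* delimiters too, not just a trailing one.

```csharp
using System;

public static class SplitOptionsDemo
{
    static readonly string[] Delimiters = { ",", "!" };

    public static string[] SplitKeepingEmpty(string s) =>
        s.Split(Delimiters, StringSplitOptions.None);

    public static string[] SplitDroppingEmpty(string s) =>
        s.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);

    public static void Main()
    {
        const string mySentence = "Hello,this,is,my,string!";
        Console.WriteLine(SplitKeepingEmpty(mySentence).Length);   // 6 (the last element is "")
        Console.WriteLine(SplitDroppingEmpty(mySentence).Length);  // 5

        // A blank field in the middle is dropped as well.
        const string withBlankField = "Hello,,is,my,string!";
        Console.WriteLine(SplitKeepingEmpty(withBlankField).Length);   // 6
        Console.WriteLine(SplitDroppingEmpty(withBlankField).Length);  // 4
    }
}
```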

As for why you are seeing this: with StringSplitOptions.None, Split finds every place where a delimiter occurs in the input string and returns the pieces before, between, and after those delimiters. This can be useful, for example, if you're parsing a string that you know has exactly N parts, some of which may be blank. Splitting each of the following lines on a comma yields exactly 3 parts:

a,b,c
a,b,
a,,c
a,,
,b,c
,b,
,,c
,,
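The fixed-field use case above might look like this in practice. This is a sketch, not a recommended parser: `TryParseThreeFields` is a hypothetical helper that accepts a record only if it splits into exactly 3 fields, any of which may be blank.

```csharp
using System;

public static class ThreeFieldDemo
{
    // Hypothetical helper: StringSplitOptions.None keeps blank fields in their positions,
    // so a record with exactly 2 commas always produces exactly 3 fields.
    public static bool TryParseThreeFields(string record, out string[] fields)
    {
        fields = record.Split(new[] { ',' }, StringSplitOptions.None);
        return fields.Length == 3;
    }

    public static void Main()
    {
        string[] samples = { "a,b,c", "a,b,", "a,,c", "a,,", ",b,c", ",b,", ",,c", ",,", "a,b" };
        foreach (string s in samples)
        {
            bool ok = TryParseThreeFields(s, out string[] fields);
            // Blank fields are shown as <empty> so their positions stay visible.
            string shown = string.Join(" | ", Array.ConvertAll(fields, f => f.Length == 0 ? "<empty>" : f));
            Console.WriteLine($"{s,-6} -> {(ok ? "3 fields" : "rejected")}: {shown}");
        }
    }
}
```

All eight lines from the list are accepted; only the last sample, "a,b", is rejected because it has just 2 fields.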

If you want to allow empty values between delimiters, but not at the beginning or end, you can strip off delimiters at the beginning or end of the string before splitting:

var wordArray = Regex
    .Replace(mySentence, "^[,!]|[,!]$", "")
    .Split(new[] {",", "!"}, StringSplitOptions.None);
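A self-contained version of that snippet, as a sketch (the `TrimThenSplitDemo` and `SplitInner` names are illustrative). The pattern `^[,!]|[,!]$` removes at most one delimiter at the very start and one at the very end, so an input such as "a,b,!" would still leave one trailing empty entry.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimThenSplitDemo
{
    // Strips one leading and one trailing delimiter, then splits while keeping
    // empty entries, so blank fields in the middle survive.
    public static string[] SplitInner(string input) =>
        Regex.Replace(input, "^[,!]|[,!]$", "")
             .Split(new[] { ",", "!" }, StringSplitOptions.None);

    public static void Main()
    {
        foreach (string part in SplitInner("Hello,,is,my,string!"))
        {
            Console.WriteLine(part.Length == 0 ? "Empty string" : part);
        }
        // Hello
        // Empty string
        // is
        // my
        // string
    }
}
```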
JLRishe
  • For my actual data, I have empty substrings that are valid, so I can't remove them during splitting; otherwise this answer would've been perfect. The example was only showing the end of the string. The contrast between valid empty strings and the empty string at the end is what led me to wonder about the overall "why" of the end of the string also containing an empty string. – Ryan Intravia Jul 26 '16 at 00:11
  • @RyanIntravia I think you're looking at it the wrong way. It's not about the end of the string "containing" an empty string. The behavior of `.Split()` is to find all the delimiters, and return the parts before and after each delimiter. The part after the final `!` is the empty string. It kinda sounds like you want to have your cake and eat it too. You want to get empty strings when there's nothing _between_ delimiters, but not when there's nothing _after_ the final delimiter. These are manifestations of the same idea and you can't have one without the other. – JLRishe Jul 26 '16 at 00:18
  • You're completely right. It's not that it's a hassle to account for; conceptually I get it now looking at the responses here, but I wasn't approaching it from that angle when I wrote the question :/ – Ryan Intravia Jul 26 '16 at 00:22
  • @RyanIntravia Updated my answer to add a possible solution to your situation. – JLRishe Jul 26 '16 at 00:32

Empty strings are the 0 of strings. There are, in effect, infinitely many of them everywhere.

It's only natural that "ABC" is equivalent to "ABC" + "" or "ABC" + "" + "" + "", just as it's natural that 3 is equivalent to 3 + 0 or 3 + 0 + 0 + 0.
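The identity-element analogy can be checked in a few lines. This is a sketch; `EmptyIdentityDemo` and its `IsIdentity` helper are illustrative names.

```csharp
using System;

public static class EmptyIdentityDemo
{
    // True when appending or prepending "" leaves s unchanged, as adding 0 does for numbers.
    public static bool IsIdentity(string s) => s + "" == s && "" + s == s;

    public static void Main()
    {
        string abc = "ABC";
        Console.WriteLine(abc + "" == abc);                    // True
        Console.WriteLine(abc + "" + "" + "" == abc);          // True
        Console.WriteLine(string.Concat("", abc, "") == abc);  // True
        Console.WriteLine(3 + 0 + 0 + 0 == 3);                 // True
        Console.WriteLine(IsIdentity(abc));                    // True
    }
}
```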

And the fact that you get an empty string after "Hello,this,is,my,string!".Split('!') does mean something: it means that your string ended with a "!".
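One way to see that the trailing "" carries information is a round trip: joining the pieces back with the same delimiter rebuilds the original string only when the empty entries are kept. This is a sketch; `RoundTripDemo` and `RoundTrips` are hypothetical names.

```csharp
using System;

public static class RoundTripDemo
{
    // Splits s on the delimiter, rejoins with the same delimiter, and compares to s.
    public static bool RoundTrips(string s, char delimiter, StringSplitOptions options) =>
        string.Join(delimiter.ToString(), s.Split(new[] { delimiter }, options)) == s;

    public static void Main()
    {
        const string sentence = "Hello,this,is,my,string!";
        Console.WriteLine(RoundTrips(sentence, '!', StringSplitOptions.None));               // True
        Console.WriteLine(RoundTrips(sentence, '!', StringSplitOptions.RemoveEmptyEntries)); // False
    }
}
```

With RemoveEmptyEntries, the result no longer records that the string ended with "!", so the join cannot restore it.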

  • I was approaching it from a completely different perspective, as seen in @JLRishe's post, but this explanation and his comments clear it up. Thank you for the help! – Ryan Intravia Jul 26 '16 at 00:26

"" is the gap in between each pair of letters of Hello,this,is,my,string!, as well as before the first letter and after the last one. So when the string is split by , and !, the result is Hello, this, is, my, string, "". The final "" is the empty string between the ! and the end of the string.

If you replaced "" with a visible character (say #), your string would look like this: #H#e#l#l#o#,#t#h#i#s#,#i#s#,#m#y#,#s#t#r#i#n#g#!#.
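The # picture can be generated programmatically. This is a sketch: `GapDemo` and `MarkGaps` are illustrative names. It writes a marker into each of the n + 1 gaps of a string of length n, and also checks that `IndexOf("")` reports a match at the position just past the last character.

```csharp
using System;
using System.Text;

public static class GapDemo
{
    // Writes '#' into every gap: before the first character, between characters,
    // and after the last one. A string of length n has n + 1 such gaps.
    public static string MarkGaps(string s)
    {
        var sb = new StringBuilder("#");
        foreach (char c in s)
        {
            sb.Append(c).Append('#');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string sentence = "Hello,this,is,my,string!";
        Console.WriteLine(MarkGaps(sentence));
        // #H#e#l#l#o#,#t#h#i#s#,#i#s#,#m#y#,#s#t#r#i#n#g#!#

        // Searching for "" from the end position finds it right there.
        Console.WriteLine(sentence.IndexOf("", sentence.Length, StringComparison.Ordinal) == sentence.Length); // True
    }
}
```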