26

The Substring() method in the .NET System.String class is defined like this:

public string Substring(int startIndex)

Where startIndex is "The zero-based starting character position of a substring in this instance" as per the method definition. If I understand it correctly, it means it will give me a part of the string, starting at the given zero-based index.

Now, if I have the string "ABC" and take substrings at different indexes, I get the following results.

var str = "ABC";
var chars = str.ToArray(); //returns 3 char 'A', 'B', 'C' as expected

var sub2 = str.Substring(2); //[1] returns "C" as expected
var sub3 = str.Substring(3); //[2] returns "" ...!!! Why no exception??
var sub4 = str.Substring(4); //[3] throws ArgumentOutOfRangeException as expected

Why doesn't it throw an exception for case [2]?

The string has 3 characters, so the valid indexes are [0, 1, 2], and even the ToArray() and ToCharArray() methods return 3 characters as expected! Shouldn't it throw an exception if I call Substring() with a starting index of 3?

Arghya C
  • Might be a `\0` character (to mark the end of a string). But I am not sure if .NET uses that. Worth a google though – Stefan Oct 02 '15 at 11:37
  • 12
    line 1246 @ http://referencesource.microsoft.com/#mscorlib/system/string.cs,1246 – Alex K. Oct 02 '15 at 11:40
  • 2
    Thanks @AlexK. and others (answers) for pointing out the implementation and MSDN documentation. I can see this is how the framework team has implemented this, but to me (and few others I guess) this is kind of unexpected! – Arghya C Oct 02 '15 at 11:49
  • @immibis Phylogenesis gives a similar analogy in his answer. – Arghya C Oct 03 '15 at 05:00
  • @ArghyaC : This is exactly what you want a tokenizer to do: while not (empty string remains), suck up and handle the next part of the string. How would you have the tokenizer realize it has reached the end of the string? That's not an *exceptional* thing to have happen. – Eric Towers Oct 03 '15 at 21:17
  • 1
    the quick and dirty answer is: .NET knows what a 0-length string means, it doesn't know what a -1 length string means. – Michael Edenfield Oct 03 '15 at 22:02
  • Though the referred post didn't come up in my search before posting, currently *this* post has much better and well documented answers on the topic I see. And *this* should NOT be a duplicate as per http://meta.stackoverflow.com/questions/252929/which-question-is-the-better-reference-for-a-duplicate and http://meta.stackoverflow.com/questions/251938/should-i-flag-a-question-as-duplicate-if-it-has-received-better-answers – Arghya C Oct 04 '15 at 13:55
  • @ArghyaC: what do you mean "well documented"? The other question (of which this is a duplicate) includes a link to the documentation _in the question_. I.e. the author of that question actually did more research than the one here, and provided the most highly up-voted answer here, **in his question**. I agree this question wound up with a lot more rehash of the basic answer, but "better"? That seems a stretch. – Peter Duniho Oct 04 '15 at 18:33
  • @PeterDuniho If you please read it correctly, I wrote *"well documented answers"*, not *"question"*. Agree, the other question has link to MSDN doc, which I didn't have, but the answers here have better (the word, better, itself is controversial and opinion-based, though) explanation IMHO. Marking this one duplicate will redirect users to the other post only. And, you could have a look at Mr. Atwood's post here maybe? https://blog.stackoverflow.com/2010/11/dr-strangedupe-or-how-i-learned-to-stop-worrying-and-love-duplication/ Thanks! – Arghya C Oct 04 '15 at 19:37
  • _" I wrote "well documented answers", not "question""_ -- yes, you did write that. But that's a red herring and false criticism. When the question itself contains the documentation you complain doesn't exist in the answer, there's no need for the documentation to exist in the answer. – Peter Duniho Oct 04 '15 at 19:41
  • @PeterDuniho The link is not everything about the answer(s). Like the other user, I also *did read the doc and try out the related code* before posting here. I posted here looking for better explanation and reasoning (which I got). I'm just saying, if this post stays here without a redirect (read duplicate), more users will find these answers which are of good quality. I'd refer to Jeff Atwood's very well written blog on duplicate posts again! – Arghya C Oct 04 '15 at 19:53
  • 1
    Whether this question is closed as a duplicate has absolutely no effect on whether users can find the answers here. Indeed, if you were reading Atwood (and other's) comments about duplicates closely, key to the concept of "embracing duplicates" is that questions _are still closed as duplicate_. It's simply that retaining them assists users in finding the answers they want or need. – Peter Duniho Oct 04 '15 at 22:10

8 Answers

51

The documentation is quite explicit about this being correct behaviour:

Return value: a string that is equivalent to the substring that begins at startIndex in this instance, or Empty if startIndex is equal to the length of this instance.

Throws ArgumentOutOfRangeException if startIndex is less than zero or *greater than the length of this instance*.

In other words, taking a substring starting just beyond the final character will give you an empty string.

Your comment that you expected it to give you a part of the string is not incompatible with this. A "part of the string" includes the set of all substrings of zero length as well, as evidenced by the fact that s.Substring(n, 0) will also give an empty string.
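
A minimal sketch of the boundary cases described above, for anyone who wants to try them. It only calls String.Substring; the class and method names (SubstringBoundaryDemo, Describe) are made up for illustration and are not part of the framework.

```csharp
using System;

public static class SubstringBoundaryDemo
{
    // Hypothetical helper: returns the substring wrapped in quotes,
    // or the exception type name if Substring rejects the index.
    public static string Describe(string s, int startIndex)
    {
        try
        {
            return "\"" + s.Substring(startIndex) + "\"";
        }
        catch (ArgumentOutOfRangeException)
        {
            return nameof(ArgumentOutOfRangeException);
        }
    }

    public static void Main()
    {
        // Start indexes below zero, inside the string, equal to Length, and beyond Length.
        foreach (int i in new[] { -1, 0, 2, 3, 4 })
        {
            Console.WriteLine("\"ABC\".Substring(" + i + ") -> " + Describe("ABC", i));
        }

        // A zero-length substring taken from the middle of the string is also empty.
        Console.WriteLine("\"ABC\".Substring(1, 0) -> \"" + "ABC".Substring(1, 0) + "\"");
    }
}
```

Only the indexes -1 and 4 should be reported as ArgumentOutOfRangeException; index 3, which equals the length, yields the empty string.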

paxdiablo
  • I can see this is the "implemented" behavior, but isn't it unexpected and confusing? – Arghya C Oct 02 '15 at 11:43
  • 7
    @ArghyaC, only to some, apparently :-) See my final paragraph. Since a substring can include zero-width entities *between* characters (if you ask for a length of zero), it makes sense that you can also get the zero-width entity after the final character. – paxdiablo Oct 02 '15 at 11:46
  • 1
    With that last paragraph, it kind of makes sense (in some way). Then it goes to `null` vs `string.Empty` direction. But thanks for the explanation :) – Arghya C Oct 02 '15 at 11:57
  • 1
    i was thinking why they decided to return empty string in this case.... `s.substring(n, 0)` answered my question ! – M.kazem Akhgary Oct 02 '15 at 12:42
  • Maybe this comes from C behaviour, where strings are null-terminated (the last character is `\0`)? – Ismael Miguel Oct 02 '15 at 18:09
  • 1
    @IsmaelMiguel Apparently that's not the case with `strings` in `C#`, see the **Strings and embedded null characters** section in this MSDN documentation https://msdn.microsoft.com/en-us/library/system.string.aspx – Arghya C Oct 02 '15 at 18:31
23

There are lots of technical answers here saying how the framework handles the method call, but I'd like to give a reasoning by analogy for why it is like it is.

Consider the string as a fence where the fence panels themselves are the characters, held up with fence posts numbered as shown below:

0   1   2   3
| A | B | C |   "ABC"

0   1   2   3   4   5   6   7   8   9
| M | y |   | S | t | r | i | n | g |   "My String"

In this analogy, string.Substring(n) returns a string of panels starting with fencepost n. Notice that the last character of the string has a fence post after it. Calling the function with this fence post returns a value stating there are no fence panels after this point (i.e. it returns the empty string).

Similarly, string.Substring(n, l) returns a string of l panels starting with fencepost n. This is why something like "ABC".Substring(2, 0) returns "", too.
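
As a rough sketch of the analogy in code: a string with n panels has n + 1 posts, numbered 0 to n, and Substring accepts any of them as a starting point. The names FencepostDemo, IsValidPost and PanelsAfterEachPost are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FencepostDemo
{
    // A string of length n has n panels (characters) and n + 1 posts (0..n).
    public static bool IsValidPost(string s, int post)
    {
        return post >= 0 && post <= s.Length;
    }

    // Collects what Substring returns when started from every post in turn.
    public static List<string> PanelsAfterEachPost(string s)
    {
        var result = new List<string>();
        for (int post = 0; post <= s.Length; post++)
        {
            result.Add(s.Substring(post));
        }
        return result;
    }

    public static void Main()
    {
        // For "ABC" this walks posts 0, 1, 2 and 3; the last one has no panels after it.
        foreach (string tail in PanelsAfterEachPost("ABC"))
        {
            Console.WriteLine("\"" + tail + "\"");
        }
    }
}
```

For "ABC" the list has four entries, one per post, and the final entry is the empty string.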

Phylogenesis
  • +1 I think a lot of pointer- and index-related concepts work best if one regards pointers and indices as identifying the spaces between items rather than identifying items themselves. The first item sits between index 0 and index 1; the second sits between 1 and 2, etc. In the case of strings, sometimes it's useful to regard strings as being followed by an infinite number of fenceposts with nothing between them (so in many versions of BASIC, for example, `mid("Hello",23,1)` will perfectly happily return an empty string). I wish language/framework authors would routinely include both... – supercat Oct 02 '15 at 17:23
  • ...methods which trap whenever there aren't enough fenceposts as well as methods which would happily return shorter or empty strings. Sometimes code wants to say "I need exactly 5 characters starting at index 9", but sometimes what's needed is "I need up to 5 characters starting at index 9, if the string extends that far". Both operations are needed often enough, IMHO, that it's worth having separate methods for both. – supercat Oct 02 '15 at 17:27
  • This is a nice analogy. But, generally `index` behaves as a pointer to one memory location/item in the array, not a location between items. Isn't it? Why it'd behave like this only for `Substring`? If this was the general behavior, then `"ABC".ToArray()[3]` shouldn't throw `IndexOutOfRangeException` IMHO. – Arghya C Oct 03 '15 at 04:57
  • @ArghyaC I guess that's just the difference between indexing an individual `char` (of which there is no concept of the empty character - value types cannot be `null`), and those methods that return a `string` (of which there is). Or, to get back to the analogy, `.toArray()` returns a pile of panels without their supporting fenceposts. – Phylogenesis Oct 05 '15 at 07:37
13

Sometimes looking at the code can be handy:

First, this is called:

public string Substring(int startIndex)
{
    return this.Substring(startIndex, this.Length - startIndex);
}

The length passed on is 0 because of the subtraction this.Length - startIndex:

public string Substring(int startIndex, int length)
{
    if (startIndex < 0)
    {
        throw new ...
    }
    if (startIndex > this.Length)
    {
        throw new ...
    }
    if (length < 0)
    {
        throw new ...
    }
    if (startIndex > (this.Length - length))
    {
         throw new ...
    }
    if (length == 0) // <-- NOTICE HERE
    {
        return Empty;
    }
    if ((startIndex == 0) && (length == this.Length))
    {
        return this;
    }
    return this.InternalSubString(startIndex, length);
}
Royi Namir
  • 1
    This code shows this is pretty much a "decided" behavior. For `Substring(n, 0)` returning `string.Empty` is almost obvious, but for `Substring(lastIndex + 1)`? Not so much, IMHO. But then, it'll be quite an opinionated debate :) – Arghya C Oct 02 '15 at 19:04
  • 1
    I agree. Pretty weird – Royi Namir Oct 02 '15 at 19:06
  • `s.Substring(n)` returns `s.Substring(n, s.Length - n)`. So `s.Substring(lastIndex + 1)` means exactly `s.Substring(lastIndex + 1, 0)`... – Michael Edenfield Oct 03 '15 at 22:02
  • @MikeEdenfield For `s.Substring(lastIndex + 1, 0)` it returns an empty string, that we all know. Now, the method has 2 parameters, `startIndex = lastIndex + 1` and `length = 0`. This behavior is justified by the second parameter. But doesn't it look like it simply disregards the first parameter? Otherwise, why should it throw an exception for `s.Substring(lastIndex + n, 0)` where n > 1? – Arghya C Oct 04 '15 at 05:32
  • I admit that it's less obvious why, but I believe that makes sense. The Framework is *conceptually* doing two checks (in practice, the implementation is short-cutting most of them): 1. is there a string for me to take the substring of, and 2. how long a substring do I take? If you ask for `Substring(length, 0)`, then step 1 says "yes, the empty string", and step two says "the 0-length string". Those are fine. But if you ask for `Substring(length + 1, 0)`, then step one says "no, there's no string here to work with." – Michael Edenfield Oct 04 '15 at 11:29
4

Based on what is written on MSDN:

Return Value - A string that is equivalent to the substring that begins at startIndex in this instance, or Empty if startIndex is equal to the length of this instance.

Exceptions: ArgumentOutOfRangeException - startIndex is less than zero or greater than the length of this instance.

Vasil Indzhev
4

Looking at the String.Substring Method documentation, an empty string will be returned if the start index is equal to the length.

A string that is equivalent to the substring of length length that begins at startIndex in this instance, or Empty if startIndex is equal to the length of this instance and length is zero.

Community
Stuart
2

Substring first checks whether startIndex is greater than the length of the string, and only then does it throw the exception. In your case it is equal (the string length is 3). After that it checks whether the length of the substring is zero and, if it is, returns String.Empty. In your case the length of the substring is the length of the string (3) minus the startIndex (3). This is why the length of the substring is 0 and an empty string is returned.
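
The order of checks described above can be sketched as follows. This is an illustrative re-implementation under my own assumptions, not the framework's code: SubstringSketch and SubstringFrom are hypothetical names, and the final copy uses the public String(char[], int, int) constructor rather than whatever the framework does internally.

```csharp
using System;

public static class SubstringSketch
{
    // Hypothetical stand-in for Substring(startIndex), following the steps in the text.
    public static string SubstringFrom(string s, int startIndex)
    {
        // Step 1: only an index below zero or strictly greater than Length is rejected.
        if (startIndex < 0 || startIndex > s.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        }

        // Step 2: the substring length is whatever remains after startIndex.
        int length = s.Length - startIndex;

        // Step 3: a zero-length result is the empty string.
        if (length == 0)
        {
            return string.Empty;
        }

        // Otherwise copy the remaining characters into a new string.
        return new string(s.ToCharArray(), startIndex, length);
    }

    public static void Main()
    {
        Console.WriteLine("[" + SubstringFrom("ABC", 2) + "]");
        Console.WriteLine("[" + SubstringFrom("ABC", 3) + "]");
    }
}
```

With "ABC" and startIndex 3, step 1 passes because 3 is not greater than 3, step 2 computes a length of 0, and step 3 returns the empty string.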

DimitarD
1

All strings in C# in the end have String.Empty.

Here is a good answer to this question.

From MSDN - String Class (System):

In the .NET Framework, a String object can include embedded null characters, which count as a part of the string's length. However, in some languages such as C and C++, a null character indicates the end of a string; it is not considered a part of the string and is not counted as part of the string's length.

Community
teo van kot
  • 6
    This is just so wrong to say `all strings in the end have ""`. It's like saying what all arrays in the end have one more item, which is array (!) and contains no items (!). Linked answer uses word **matches**, not **have**. What `Substring` does when 0 length string is requested is obvious - empty string is returned, but it's not because it's in the end of string or something like this. – Sinatr Oct 02 '15 at 11:53
  • @Sinatr we know this only after we decompile library – teo van kot Oct 02 '15 at 11:54
  • 3
    Nope, we know for sure what string doesn't have `""` at the end. In `C#` this is true:`"some string" == "some string" + ""`, but it's not because `""` is added (and ignored during comparison) or exists at the end. It's because **nothing happens** when you operate with `""`. `String.Empty` is a special case and will be returned by string operating methods when 0 length string is a result of operation. – Sinatr Oct 02 '15 at 12:01
  • @Sinatr your comments explain some good points. Why don't you add as an answer, will be good if someone comes to this post in future. – Arghya C Oct 02 '15 at 12:18
  • 2
    @ArghyaC, answer of Royi Namir is already the perfect answer to your question. My comments here are related to this answer, which is wrong in my opinion (and should be deleted imho with all my comments addressed to the author). – Sinatr Oct 02 '15 at 12:25
1

To supplement other answers, Mono also correctly implements this behavior.

public String Substring (int startIndex)
{
    if (startIndex == 0)
        return this;
    if (startIndex < 0 || startIndex > this.length)
        throw new ArgumentOutOfRangeException ("startIndex");

    return SubstringUnchecked (startIndex, this.length - startIndex);
}

// This method is used by StringBuilder.ToString() and is expected to
// always create a new string object (or return String.Empty). 
internal unsafe String SubstringUnchecked (int startIndex, int length)
{
    if (length == 0)
        return String.Empty;

    string tmp = InternalAllocateStr (length);
    fixed (char* dest = tmp, src = this) {
        CharCopy (dest, src + startIndex, length);
    }
    return tmp;
}

As you can see, it returns String.Empty if the length is equal to zero.

Furkan Omay
  • 1
    That's a nice compact implementation in `Mono`. And yes, it is functionally similar to the `FCL` implementations. – Arghya C Oct 02 '15 at 18:49