286

I am splitting a string based on whitespace as follows:

string myStr = "The quick brown fox jumps over the lazy dog";

char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);

It's irksome to define the char[] array everywhere in my code where I want to do this. Is there a more efficient way that doesn't require creating the character array (which is prone to error if copied to different places)?

John Saunders
  • Does `myStr.Split(' ')` not work? – woolagaroo May 24 '11 at 13:41
  • If I understand this correctly, this will only search for a space, not generic whitespace. –  May 24 '11 at 13:45
  • See also this possible duplicate, although the later answers here cover StringSplitOptions: http://stackoverflow.com/questions/1562981/splitting-a-string-at-all-whitespace – goodeye Feb 01 '16 at 01:25

11 Answers

537

If you just call:

string[] ssize = myStr.Split(null); //Or myStr.Split()

or:

string[] ssize = myStr.Split(new char[0]);

then white-space characters are assumed to be the delimiters. From the documentation page for the string.Split(char[]) method:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

Always, always, always read the documentation!
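
A minimal sketch of that documented default, assuming C# 6 or later (for the expression-bodied member); the class and method names are illustrative, not part of any library. No separator is passed, so tabs, newlines and the no-break space (U+00A0) should all act as delimiters, since Char.IsWhiteSpace returns true for each of them.

```csharp
public static class WhitespaceSplitSketch
{
    // No separator is passed, so String.Split falls back to treating every
    // char for which Char.IsWhiteSpace returns true as a delimiter: space,
    // tab, newline, no-break space (U+00A0), and so on.
    public static string[] SplitOnAnyWhitespace(string input) => input.Split();
}
```

For example, `WhitespaceSplitSketch.SplitOnAnyWhitespace("The\tquick\nbrown\u00A0fox")` should return "The", "quick", "brown" and "fox".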

AZ_
jason
  • The trouble with splitting by whitespace is that if you have to put the string together again, you don't know which whitespace character to put back. – Ross Presser Nov 06 '12 at 21:45
  • `(char[])null` is slightly better, as it avoids creating a new object. (You can't use a bare `null` with any of the `options` overloads.) – Artfunkel Jul 07 '13 at 12:24
  • @RossPresser: Putting a string back together is a completely different problem, so I wouldn't call it a problem here. But if all you need is to put the string back together exactly as it was, then perhaps it is better to just keep the original. – stakx - no longer contributing Aug 24 '13 at 07:56
  • Not necessarily EXACTLY as it was. Something like "capitalize the first character of each word, unless it matches this stopword list" is a good example of where you need the original split characters. – Ross Presser Aug 24 '13 at 19:26
  • Stupid question, but if you use `null`, do you still need to specify `StringSplitOptions.RemoveEmptyEntries`, or are empty entries ignored by default? – yu_ominae Nov 11 '13 at 20:50
  • `StringSplitOptions.None` is the default. – mhand Aug 23 '14 at 00:46
  • @RossPresser: Since String.Split does not provide any mechanism for keeping track of the characters used to split the string, your observation is not relevant here: you cannot achieve what you seek with String.Split, so that requires a different Q&A. – ToolmakerSteve Oct 05 '15 at 23:58
  • Thank you for that. In some of my strings the whitespace apparently was not a ' ' character, so the strings were not split correctly. With this method all whitespace was recognized correctly. – Eugen Timm Apr 21 '16 at 10:54
  • That's not an answer. What do you pass to `IndexOf` or `TrimStart` then? – v.oddou Jul 06 '16 at 06:24
  • `string[] ssize = myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)` will remove empty entries too. – schauhan Oct 07 '16 at 13:27
  • Can we use the above somehow to split on whitespace and other characters, such as a comma? – Muds Jan 29 '18 at 17:20
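
A small sketch of the options overload discussed in these comments, with illustrative class and method names. A bare `null` would not compile there, because it converts equally well to the `char[]` and `string[]` overloads, so the cast selects the `char[]` one; `StringSplitOptions.None` keeps the empty strings.

```csharp
using System;

public static class NullSeparatorSketch
{
    // (char[])null means "split on white space" without allocating an array;
    // RemoveEmptyEntries drops the empty strings produced by runs of spaces.
    public static string[] SplitDroppingEmpties(string input) =>
        input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    // StringSplitOptions.None is the behavior you get when no options are given,
    // so adjacent delimiters still yield empty strings.
    public static string[] SplitKeepingEmpties(string input) =>
        input.Split((char[])null, StringSplitOptions.None);
}
```

`SplitDroppingEmpties("a  b")` should return "a" and "b", whereas `SplitKeepingEmpties("a  b")` should return "a", "" and "b".
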
256

Yes, there is a need for one more answer here!

All the solutions thus far address the rather limited domain of canonical input, to wit: a single whitespace character between elements (though a tip of the hat to @cherno for at least mentioning the problem). But I submit that in all but the most obscure scenarios, splitting all of these should yield identical results:

string myStrA = "The quick brown fox jumps over the lazy dog";
string myStrB = "The  quick  brown  fox  jumps  over  the  lazy  dog";
string myStrC = "The quick brown fox      jumps over the lazy dog";
string myStrD = "   The quick brown fox jumps over the lazy dog";

String.Split (in any of the flavors shown throughout the other answers here) simply does not work well unless you attach the RemoveEmptyEntries option with either of these:

myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
myStr.Split(new char[] {' ','\t'}, StringSplitOptions.RemoveEmptyEntries)

As the illustration reveals, omitting the option yields four different results (labeled A, B, C, and D) vs. the single result from all four inputs when you use RemoveEmptyEntries:

[Illustration: String.Split vs Regex.Split]
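
The same comparison can be run in code. A sketch, assuming C# 6 or later, with illustrative class and member names; it feeds the four sample strings through Split() with and without RemoveEmptyEntries.

```csharp
using System;
using System.Linq;

public static class CanonicalSplitSketch
{
    // The four sample inputs: single spaces, double spaces,
    // one long run of spaces, and leading spaces.
    public static readonly string[] Inputs =
    {
        "The quick brown fox jumps over the lazy dog",
        "The  quick  brown  fox  jumps  over  the  lazy  dog",
        "The quick brown fox      jumps over the lazy dog",
        "   The quick brown fox jumps over the lazy dog",
    };

    // Without RemoveEmptyEntries, runs of white space and leading white space
    // add empty strings, so the token counts differ from input to input.
    public static int[] TokenCountsWithoutOption() =>
        Inputs.Select(s => s.Split().Length).ToArray();

    // With RemoveEmptyEntries, every input should yield the same nine words.
    public static bool AllInputsAgreeWithOption()
    {
        string[] expected = Inputs[0].Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
        return Inputs.All(s => s
            .Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
            .SequenceEqual(expected));
    }
}
```

`TokenCountsWithoutOption()` should report 9 tokens for the first string and more for each of the others (17 for the double-spaced one, for instance), while `AllInputsAgreeWithOption()` should return true.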

Of course, if you don't like using options, just use the regex alternative :-)

Regex.Split(myStr, @"\s+").Where(s => s != string.Empty)
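
A self-contained version of that one-liner, as a sketch with illustrative names; it needs System.Linq for Where and System.Text.RegularExpressions for Regex, and ToArray() is added so the result is a string[] like the one String.Split returns.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSplitSketch
{
    // \s+ consumes each run of white space as a single delimiter. Leading or
    // trailing white space still produces an empty first or last element,
    // which the Where clause filters out.
    public static string[] SplitWords(string input) =>
        Regex.Split(input, @"\s+").Where(s => s != string.Empty).ToArray();
}
```

`RegexSplitSketch.SplitWords("   The  quick\tfox ")` should return "The", "quick" and "fox".
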
Michael Sorens
  • As I mentioned in a previous comment, sometimes you want to put the string back together again after acting on the non-whitespace elements. In such a case it is usually WRONG to treat successive whitespace as a single delimiter. – Ross Presser Aug 26 '13 at 20:45
  • I think, @RossPresser, that that is covered by my qualifier "in all but the most obscure scenarios", because even when wanting to recombine the elements I would be hard-pressed to find a case where I care about multiple spaces. I would want a canonical form: one space between each. So I respectfully disagree; it would be "rarely wrong" rather than "usually wrong". – Michael Sorens Aug 26 '13 at 21:23
  • `CapitalizeEveryWord("This is line one.\n \nThis is line three.")` – Ross Presser Aug 27 '13 at 18:49
  • If you truly think that this is obscure, then I guess we'll have to agree to disagree, but if I left this function out of my software I'd lose my job. Users like their content to look the way they want it to look. – Ross Presser Aug 27 '13 at 18:55
  • Ah, well, I was a bit too focused on "space" rather than "whitespace"; thanks for the nudge, @Ross. Though, since my comment just above does use "space", I stand by that as valid :-)... but as you point out it is _not_ as valid for whitespace in general. – Michael Sorens Aug 27 '13 at 19:31
  • This should be the accepted answer, since it is much more complete. – Dennis Apr 24 '15 at 17:09
  • Absolutely clutch response. Thank you. – Iofacture Dec 12 '18 at 00:23
  • I am wondering why you added `.Where(s => s != string.Empty)` to the Regex. Since you specify `\s+` (one or more whitespace characters), there can be no empty item in between. – Jack Miller Feb 19 '19 at 06:25
  • @JackMiller That removes the items generated when leading and/or trailing whitespace is present. – Michael Sorens Feb 26 '19 at 21:54
  • I frequently find myself dealing with code that cares about the nth string of the split input. Without RemoveEmptyEntries I would get an empty string where I was expecting non-space characters. – MickeyfAgain_BeforeExitOfSO May 01 '19 at 17:50
  • I really liked your answer (thanks!); nevertheless, @RossPresser's observation (in the first comment) is quite relevant. – luizfls Sep 20 '20 at 04:11
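
On the recurring point about putting the string back together: String.Split cannot do it, but Regex.Split with a capturing group includes the captured separators in the result array. A sketch with illustrative names; CapitalizeEveryWord here is a hypothetical helper modeled on Ross Presser's example, and the stop-word list he mentions is left out.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PreservingSplitSketch
{
    // Because the pattern is wrapped in a capturing group, Regex.Split includes
    // each matched run of white space in the result, so string.Concat over the
    // pieces rebuilds the original string exactly.
    public static string[] SplitKeepingWhitespace(string input) =>
        Regex.Split(input, @"(\s+)");

    // Hypothetical helper: upper-cases the first character of each word and
    // leaves the white-space pieces (such as "\n \n") untouched.
    public static string CapitalizeEveryWord(string input) =>
        string.Concat(SplitKeepingWhitespace(input).Select(piece => Capitalize(piece)));

    private static string Capitalize(string piece) =>
        piece.Length == 0 || char.IsWhiteSpace(piece[0])
            ? piece
            : char.ToUpperInvariant(piece[0]) + piece.Substring(1);
}
```

`SplitKeepingWhitespace("one.\n \nthree.")` should return "one.", "\n \n" and "three.", and `CapitalizeEveryWord("this is line one.\n \nthis is line three.")` should keep the blank line intact. ToUpperInvariant is used so that the result does not depend on the current culture.
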
48

According to the documentation:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

So just call myStr.Split(). There's no need to pass anything in, because separator is a params array.
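
A sketch of what the params array means in practice, with illustrative names: calling Split() with no arguments passes an empty char[], which the documentation treats the same as a null array, so the three forms below should agree.

```csharp
using System.Linq;

public static class ParamsSplitSketch
{
    // Split is declared as Split(params char[] separator), so Split() passes an
    // empty char[]. An empty array and a null array are both documented to mean
    // "split on white space", so these three calls should agree.
    public static bool NoArgumentFormsAgree(string input)
    {
        string[] implicitEmpty = input.Split();
        string[] explicitEmpty = input.Split(new char[0]);
        string[] nullArray = input.Split((char[])null);
        return implicitEmpty.SequenceEqual(explicitEmpty)
            && explicitEmpty.SequenceEqual(nullArray);
    }
}
```

`NoArgumentFormsAgree("a b\tc")` should return true.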

ageektrapped
13

Why don't you use:

string[] ssizes = myStr.Split(' ', '\t');
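
A sketch of the difference, with illustrative names: the params overload lets you list the separators directly, but only those characters split the string, whereas Split() with no arguments also splits on newlines and other white space.

```csharp
public static class ExplicitSeparatorsSketch
{
    // Split(' ', '\t') binds to Split(params char[] separator), so the compiler
    // builds the separator array for you. Only the listed characters split the
    // string; a newline, for example, stays inside a token.
    public static string[] SplitSpaceOrTab(string input) => input.Split(' ', '\t');
}
```

`SplitSpaceOrTab("a b\tc\nd")` should return "a", "b" and "c\nd", whereas `"a b\tc\nd".Split()` should return four elements.
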
Renatas M.
3

Note that adjacent whitespace will NOT be treated as a single delimiter, even when using String.Split(null). If any of your tokens are separated with multiple spaces or tabs, you'll get empty strings returned in your array.

From the documentation:

Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.
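
A sketch, with illustrative names, that reports where those empty strings land when no separators are passed.

```csharp
using System.Linq;

public static class EmptyEntrySketch
{
    // Returns the indexes of the empty strings that Split() (no separators, so
    // white space) produces: one for a leading delimiter, one for each pair of
    // adjacent delimiters, and one for a trailing delimiter.
    public static int[] EmptyEntryPositions(string input)
    {
        string[] parts = input.Split();
        return Enumerable.Range(0, parts.Length)
                         .Where(i => parts[i].Length == 0)
                         .ToArray();
    }
}
```

`EmptyEntryPositions("\tred\t\tgreen ")` should return 0, 2 and 4, because the split yields "", "red", "", "green" and "".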

cherno
2

You can use

var FirstString = YourString.Split().First();

to split a string and get the text before its first whitespace character. (First() is a LINQ extension method, so this needs using System.Linq.)
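
One caveat, shown as a sketch with an illustrative name: if the string starts with white space, Split().First() returns an empty string. Removing empty entries first returns the first actual word, and FirstOrDefault() returns null instead of throwing when there is no word at all.

```csharp
using System;
using System.Linq;

public static class FirstWordSketch
{
    // Split().First() returns whatever precedes the first white-space character,
    // which is "" when the input starts with white space. Dropping empty entries
    // first yields the first actual word, and FirstOrDefault returns null
    // (rather than throwing) when the input contains no words at all.
    public static string FirstWord(string input) =>
        input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).FirstOrDefault();
}
```

`"  quick brown".Split().First()` should return "", while `FirstWordSketch.FirstWord("  quick brown")` should return "quick".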

SaeX
Haxer
2

So don't copy and paste! Extract a function to do your splitting and reuse it.

public static string[] SplitWhitespace (string input)
{
    char[] whitespace = new char[] { ' ', '\t' };
    return input.Split(whitespace);
}

Code reuse is your friend.

Tim Rogers
1

Why don't you just do this:

var ssizes = myStr.Split(" \t".ToCharArray());

It seems there is a method String.ToCharArray() in .NET 4.0!

EDIT: As VMAtm has pointed out, the method already existed in .NET 2.0!

Daren Thomas
  • This method is in .NET 2.0!!! http://msdn.microsoft.com/en-us/library/ezftk57x(VS.80).aspx – VMAtm May 24 '11 at 13:45
1

Can't you do it inline?

var sizes = subject.Split(new char[] { ' ', '\t' });

Otherwise, if you do this exact thing often, you could always store that char array in a shared static readonly field.

As others have noted, according to the documentation you can also pass null or an empty array; white-space characters are then used as the delimiters automatically.

var sizes = subject.Split(null);
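
A sketch of keeping the separators in one shared field; the Separators class and its members are hypothetical names. C# does not allow a const array (a const of a reference type other than string can only be null), so a private static readonly field is the usual choice.

```csharp
public static class Separators
{
    // readonly protects only the reference, not the array elements,
    // so the field is kept private to stop callers from modifying it.
    private static readonly char[] SpaceAndTab = { ' ', '\t' };

    public static string[] SplitSpaceAndTab(string input) => input.Split(SpaceAndTab);
}
```

`Separators.SplitSpaceAndTab("a b\tc")` should return "a", "b" and "c".
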
Svish
0

If repeating the same code is the issue, write an extension method on the String class that encapsulates the splitting logic.

Xhalent
  • This doesn't really answer the question, sorry. – p.campbell Aug 02 '13 at 15:09
  • p. campbell: Yes it does: the OP asked for a solution that doesn't require copying the character array everywhere. An obvious solution is to create a function to do the task, and this answer points out that such a function could be an extension method. (The answer could be improved by showing the code to do so.) – ToolmakerSteve Oct 06 '15 at 00:07
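
A sketch of such an extension method. The names StringSplitExtensions and SplitOnWhitespace are hypothetical, and dropping empty entries is a design choice you may not want if you need to preserve token positions.

```csharp
using System;

public static class StringSplitExtensions
{
    // Hypothetical extension method: keeps the splitting rule in one place so
    // callers can write myStr.SplitOnWhitespace(). A null char[] separator means
    // "use Char.IsWhiteSpace", and RemoveEmptyEntries collapses runs of white space.
    public static string[] SplitOnWhitespace(this string input)
    {
        if (input == null)
        {
            throw new ArgumentNullException(nameof(input));
        }
        return input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

`"The  quick\tbrown".SplitOnWhitespace()` should return "The", "quick" and "brown".
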
-2

You can just do:

string myStr = "The quick brown fox jumps over the lazy dog";
string[] ssizes = myStr.Split(' ');

MSDN has more examples and references:

http://msdn.microsoft.com/en-us/library/b873y76a.aspx

Tom Gullen