Best way to specify whitespace in a String.Split operation

Question

I am splitting a string based on whitespace as follows:

string myStr = "The quick brown fox jumps over the lazy dog";

char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);

It's irksome to define the char[] array everywhere in my code that I want to do this. Is there a more efficient way that doesn't require creating the character array (which is error-prone if copied to different places)?

If I understand this correctly this will only search for a space, not generic whitespace — , May 24 '11 at 13:45
See also possible duplicate, but these later answers have StringSplitOptions. http://stackoverflow.com/questions/1562981/splitting-a-string-at-all-whitespace — goodeye, Feb 01 '16 at 01:25

score 537 · Accepted Answer · edited Aug 28 '20 at 08:38

If you just call:

string[] ssize = myStr.Split(null); //Or myStr.Split()

or:

string[] ssize = myStr.Split(new char[0]);

then white-space is assumed to be the splitting character. From the documentation page for the string.Split(char[]) method:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

Always, always, always read the documentation!
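
As a minimal sketch of the behavior the documentation describes (the class and method names here are hypothetical, chosen only for illustration), the following compares a null separator with the explicit space-and-tab array from the question:

```csharp
using System;

public static class WhitespaceSplitDemo
{
    // Splits on every character for which Char.IsWhiteSpace returns true.
    // The cast makes it explicit that the null is meant as a char[] separator.
    public static string[] SplitOnAnyWhitespace(string input)
    {
        return input.Split((char[])null);
    }

    // Splits on space and tab only, as in the question.
    public static string[] SplitOnSpaceAndTab(string input)
    {
        return input.Split(new char[] { ' ', '\t' });
    }
}
```

For an input such as "a\tb\u00A0c\nd" (tab, non-breaking space, newline), the first method should yield four tokens, while the second splits only at the tab and yields two.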

edited Aug 28 '20 at 08:38

AZ_

answered May 24 '11 at 13:43

jason

3

The trouble with splitting by whitespace is if you have to put it together again, you don't know which whitespace character to put back. – Ross Presser Nov 06 '12 at 21:45
27

`(char[])null` is slightly better as it avoids creating a new object. (You can't use `null` with any of the `options` overloads). – Artfunkel Jul 07 '13 at 12:24
6

@RossPresser: Putting a string back together is a completely different problem, so I wouldn't say this is a problem here. But if all you need to do is put the string back together exactly how it was before, then perhaps better just keep the original. – stakx - no longer contributing Aug 24 '13 at 07:56
Not necessarily EXACTLY like it was. Something like "capitalize the first character of each word, unless it matches this stopword list" is a good example for where you need the original split characters. – Ross Presser Aug 24 '13 at 19:26
7

Stupid question, but if you use `null`, do you still need to specify the `StringSplitOptions.RemoveEmptyEntries` or are they ignored by default? – yu_ominae Nov 11 '13 at 20:50
1

StringSplitOptions.None is the default – mhand Aug 23 '14 at 00:46
2

@RossPresser: Since String.Split does not provide any mechanism for keeping track of the characters used to split the string, your observation is not relevant: one cannot achieve what you seek using String.Split, so that requires a different Q&A. – ToolmakerSteve Oct 05 '15 at 23:58
Thank you for that. In some of my strings the whitespace apparently was not a ' '-character and thus the strings were not split correctly. Using this method all whitespaces were recognized correctly. – Eugen Timm Apr 21 '16 at 10:54
That's not an answer. What do you pass to `IndexOf` or `TrimStart` then ? – v.oddou Jul 06 '16 at 06:24
2

string[] ssize = myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries) will remove empty entries too. – schauhan Oct 07 '16 at 13:27
Can we somehow use the above to include whitespace and another character, like a comma? – Muds Jan 29 '18 at 17:20

Michael Sorens · Answer 2 · 2013-08-24T07:51:49.573

256

Yes, there is a need for one more answer here!

All the solutions thus far address the rather limited domain of canonical input, to wit: a single whitespace character between elements (though tip of the hat to @cherno for at least mentioning the problem). But I submit that in all but the most obscure scenarios, splitting all of these should yield identical results:

string myStrA = "The quick brown fox jumps over the lazy dog";
string myStrB = "The  quick  brown  fox  jumps  over  the  lazy  dog";
string myStrC = "The quick brown fox      jumps over the lazy dog";
string myStrD = "   The quick brown fox jumps over the lazy dog";

String.Split (in any of the flavors shown throughout the other answers here) simply does not work well unless you attach the RemoveEmptyEntries option with either of these:

myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
myStr.Split(new char[] {' ','\t'}, StringSplitOptions.RemoveEmptyEntries)

As the illustration reveals, omitting the option yields four different results (labeled A, B, C, and D) vs. the single result from all four inputs when you use RemoveEmptyEntries:

[Figure: String.Split vs Regex.Split]

Of course, if you don't like using options, just use the regex alternative :-)

Regex.Split(myStr, @"\s+").Where(s => s != string.Empty)
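
A minimal sketch of the two approaches side by side, assuming hypothetical class and method names; with inputs like the four strings above, both methods should return the same token list regardless of leading or repeated whitespace:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizeDemo
{
    // String.Split with an empty separator array (meaning "any whitespace")
    // plus RemoveEmptyEntries to drop the empty strings between delimiters.
    public static string[] WithOptions(string input)
    {
        return input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex alternative: \s+ treats a run of whitespace as one delimiter;
    // the Where filter drops the empty item produced by leading or trailing whitespace.
    public static string[] WithRegex(string input)
    {
        return Regex.Split(input, @"\s+").Where(s => s != string.Empty).ToArray();
    }
}
```

For example, TokenizeDemo.WithOptions("   The  quick") and TokenizeDemo.WithRegex("   The  quick") both give the two tokens "The" and "quick".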

edited Aug 24 '13 at 07:51

answered Aug 23 '13 at 19:23

Michael Sorens

1

As I mentioned in a previous comment, sometimes you want to put the string back together again after acting on the non-whitespace elements. In such a case it is usually WRONG to treat successive whitespace as a single delimiter. – Ross Presser Aug 26 '13 at 20:45
4

I think, @RossPresser, that that is covered by my qualifier "under all but the most obscure scenarios" because even when wanting to recombine the elements I would be hard-pressed to have a case where I care about multiple spaces. I would want a canonical form--one space between each. So I respectfully disagree--it would be "rarely wrong" rather than "usually wrong". – Michael Sorens Aug 26 '13 at 21:23
1

`CapitalizeEveryWord("This is line one.\n \nThis is line three.")` – Ross Presser Aug 27 '13 at 18:49
4

If you truly think that this is obscure, then I guess we'll have to agree to disagree, but if I left this function out of my software I'd lose my job. Users like their content to look the way they want it to look. – Ross Presser Aug 27 '13 at 18:55
Ah, well, I was a bit too-focused on "space" rather than "whitespace"; thanks for the nudge, @Ross. Though, since my comment just above does use "space" I stand by that as valid :-)... but as you point out it is _not_ as valid for whitespace in general. – Michael Sorens Aug 27 '13 at 19:31
7

This should be an accepted answer, since it is much more complete. – Dennis Apr 24 '15 at 17:09
Absolutely clutch response. Thank you. – Iofacture Dec 12 '18 at 00:23
1

I am wondering why you added `.Where(s => s != string.Empty)` to the Regex. Since you specify `\s+` (any number of spaces) there can be no empty item in between. – Jack Miller Feb 19 '19 at 06:25
1

@JackMiller That removes the items generated when leading and/or trailing whitespace is present. – Michael Sorens Feb 26 '19 at 21:54
I frequently find myself dealing with code that cares about the nth string of the split input. Without the 'RemoveEmptyEntries' I would get a space where I was expecting non-space characters. – MickeyfAgain_BeforeExitOfSO May 01 '19 at 17:50
I really liked your answer (thanks!), nevertheless @RossPresser 's observation (in the first comment) is quite relevant. – luizfls Sep 20 '20 at 04:11
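
For the round-trip case raised in these comments, one possible approach (not something any answer here proposes) is to rely on the documented behavior of Regex.Split that text matched by a capturing group is included in the result. The sketch below uses hypothetical names and only shows that the whitespace runs survive the split:

```csharp
using System.Text.RegularExpressions;

public static class RoundTripDemo
{
    // The parentheses around \s+ form a capturing group, so each run of
    // whitespace is returned as its own array element between the words.
    public static string[] SplitKeepingWhitespace(string input)
    {
        return Regex.Split(input, @"(\s+)");
    }
}
```

Because the delimiters are kept, string.Concat(RoundTripDemo.SplitKeepingWhitespace(text)) reproduces the original text, so the non-whitespace elements can be transformed in place before joining.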

ageektrapped · Answer 3 · 2011-05-24T13:54:31.767

48

According to the documentation:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

So just call myStr.Split(). There's no need to pass anything in, because separator is a params array.

edited May 24 '11 at 13:54

answered May 24 '11 at 13:43

ageektrapped

score 13 · Answer 4 · answered May 24 '11 at 13:42

Why don't you use this?

string[] ssizes = myStr.Split(' ', '\t');
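
The call above works because the separator parameter of Split is declared with the params keyword, so the compiler builds the char[] from the individual arguments. A small sketch with hypothetical names showing the two equivalent spellings:

```csharp
public static class ParamsDemo
{
    // Individual chars: the compiler packs them into a char[] for the
    // params parameter of Split.
    public static string[] SplitWithParams(string input)
    {
        return input.Split(' ', '\t');
    }

    // The same call with the array written out explicitly.
    public static string[] SplitWithArray(string input)
    {
        return input.Split(new char[] { ' ', '\t' });
    }
}
```

Both methods should return identical arrays for any input.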

answered May 24 '11 at 13:42

Renatas M.

2

There is no Split overload that takes two chars. – takrl May 24 '11 at 13:51
2

@takrl: Look [here](http://msdn.microsoft.com/en-us/library/b873y76a%28v=vs.80%29.aspx) public string[] Split (params char[] separator) .NET v2 – Renatas M. May 24 '11 at 14:02
Yes, this takes a character array. Your code snippet passes two single characters. – takrl May 24 '11 at 14:04
19

@takrl: do you know what [params](http://msdn.microsoft.com/en-us/library/w5zay9db%28v=vs.71%29.aspx) keyword is??? – Renatas M. May 24 '11 at 14:06
Pretty cool, +1 for that. Probably the person who downvoted didn't know either. – takrl May 24 '11 at 14:09

score 3 · Answer 5 · answered Jan 30 '13 at 17:34

Note that adjacent whitespace will NOT be treated as a single delimiter, even when using String.Split(null). If any of your tokens are separated with multiple spaces or tabs, you'll get empty strings returned in your array.

From the documentation:

Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.
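
To make the documented behavior concrete, here is a minimal sketch (the helper name is hypothetical) that counts the empty strings produced when delimiters are adjacent or sit at the start of the input:

```csharp
using System.Linq;

public static class AdjacentDelimiterDemo
{
    // Splits on any whitespace without RemoveEmptyEntries and counts
    // how many of the resulting elements are empty strings.
    public static int CountEmptyEntries(string input)
    {
        string[] parts = input.Split((char[])null);
        return parts.Count(p => p.Length == 0);
    }
}
```

With this helper, "a b" gives 0, while "a  b" (two spaces) and " a" (leading space) each give 1.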

score 2 · Answer 6 · edited Dec 27 '21 at 10:38

You can use

var FirstString = YourString.Split().First();

to split a string on whitespace and take the first token, that is, the text before the first whitespace character. (First() requires using System.Linq.)

edited Dec 27 '21 at 10:38

SaeX

answered Dec 31 '19 at 16:10

Haxer

score 2 · Answer 7 · answered May 24 '11 at 13:45

So don't copy and paste! Extract a function to do your splitting and reuse it.

public static string[] SplitWhitespace (string input)
{
    char[] whitespace = new char[] { ' ', '\t' };
    return input.Split(whitespace);
}

Code reuse is your friend.

answered May 24 '11 at 13:45

Tim Rogers

Daren Thomas · Answer 8 · 2011-05-24T15:29:38.697

1

Why don't you just do this:

var ssizes = myStr.Split(" \t".ToCharArray());

It seems there is a method String.ToCharArray() in .NET 4.0!

EDIT: As VMAtm has pointed out, the method already existed in .NET 2.0!

edited May 24 '11 at 15:29

answered May 24 '11 at 13:42

Daren Thomas

This method is in .NET 2.0!!! http://msdn.microsoft.com/en-us/library/ezftk57x(VS.80).aspx – VMAtm May 24 '11 at 13:45

score 1 · Answer 9 · answered May 24 '11 at 13:44

Can't you do it inline?

var sizes = subject.Split(new char[] { ' ', '\t' });

Otherwise, if you do this exact thing often, you could always create a constant or something similar (for example, a static readonly field) containing that char array.

As others have noted, according to the documentation you can also pass null or an empty array. When you do that, whitespace characters are used as the delimiters automatically.

var sizes = subject.Split(null);

score 0 · Answer 10 · answered May 24 '11 at 13:46

If repeating the same code is the issue, write an extension method on the String class that encapsulates the splitting logic.
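
A minimal sketch of what such an extension method might look like. The class name, method name, and the removeEmptyEntries parameter are all hypothetical choices for illustration, not part of the framework:

```csharp
using System;

public static class StringWhitespaceExtensions
{
    // Hypothetical helper: splits on any whitespace character.
    // By default empty entries are dropped, so runs of whitespace
    // behave like a single delimiter.
    public static string[] SplitOnWhitespace(this string input, bool removeEmptyEntries = true)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        StringSplitOptions options = removeEmptyEntries
            ? StringSplitOptions.RemoveEmptyEntries
            : StringSplitOptions.None;

        // The cast is needed here: a bare null is ambiguous between the
        // char[] and string[] overloads that take StringSplitOptions.
        return input.Split((char[])null, options);
    }
}
```

Call sites then reduce to myStr.SplitOnWhitespace(), and the choice of delimiters lives in one place.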

answered May 24 '11 at 13:46

Xhalent

1

This doesn't really answer the question, sorry. – p.campbell Aug 02 '13 at 15:09
p. campbell: Yes it does: OP asked for a solution that doesn't require copying the character array everywhere. An obvious solution is to create a function to do the task. This answer points out that such a function could be an extension method. (The answer could be improved, by showing the code to do so...) – ToolmakerSteve Oct 06 '15 at 00:07

score -2 · Answer 11 · answered May 24 '11 at 13:42

You can just do:

string myStr = "The quick brown fox jumps over the lazy dog";
string[] ssizes = myStr.Split(' ');

MSDN has more examples and references:

http://msdn.microsoft.com/en-us/library/b873y76a.aspx

answered May 24 '11 at 13:42

Tom Gullen
