
With any of the following variants, all whitespace characters are treated as separator characters:

string[] words = phrase.Split(null); 
string[] words = phrase.Split(new char[0]); 
string[] words = phrase.Split((char[])null); 
string[] words = phrase.Split(default(Char[])); 
string[] words = phrase.Split(null as char[]);

But is there also a way to define whitespace AND additional separator characters, such as a comma (,) or a hyphen (-), without nested calls to String.Split and without explicitly listing each of the whitespace characters?

The goal is high performance rather than concise code.

Wolf Garbe
  • Splitting into a newly allocated array is inherently not a very fast way of doing business; that would be leaving the input alone and parsing through it with indexes. Even so, a `Regex` can easily split by whatever you please, including all whitespace (`\s`)/all separators (`\p{Z}`), plus any custom characters. – Jeroen Mostert Jan 21 '20 at 15:19
  • @JeroenMostert: I'm not inclined to call regex "high-performance." – Robert Harvey Jan 21 '20 at 15:20
  • @RobertHarvey: That would depend inherently on the regex and the implementation thereof (compiled regexes can be plenty fast, although I have no idea if the .NET engine in particular is any good). And as always, actual acceptable performance is in the eye of the benchmarker; the OP wanting to go for "high performance" is horribly unspecific. As it stands allocation is probably a far bigger perf killer than the splitting method employed. – Jeroen Mostert Jan 21 '20 at 15:21
  • Enumerating every whitespace character along with the other characters in a `char[]` *could* be the fastest way (fastest in performance, not in typing time); however, I would rather use a simple regex such as `[\s,-]` – Cid Jan 21 '20 at 15:24
  • https://stackoverflow.com/questions/1254577/string-split-by-multiple-character-delimiter – Ben Jan 21 '20 at 15:26

2 Answers


You could use Regex.Split instead, which allows you to split on a Regex pattern:

string[] words = Regex.Split(phrase, @"[\s,-]");

The pattern [\s,-] splits on any whitespace character (\s), a literal comma (,), or a literal hyphen (-).

Be sure to import the System.Text.RegularExpressions namespace:

using System.Text.RegularExpressions;
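
A minimal sketch of how the pattern might be used in practice. It assumes that a run of consecutive separators (for example ", ") should count as one separator and that empty entries are unwanted; neither is stated in the question. The names RegexSplitSketch and SplitWords are hypothetical, and the `+` quantifier and RegexOptions.Compiled are additions to the pattern shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitSketch
{
    // One cached, compiled Regex instance, reused across calls.
    // The "+" collapses a run of separators into a single split point.
    private static readonly Regex Separators =
        new Regex(@"[\s,-]+", RegexOptions.Compiled);

    public static string[] SplitWords(string phrase)
    {
        string[] parts = Separators.Split(phrase);
        // Regex.Split yields an empty string when the input starts
        // or ends with a separator; filter those entries out.
        return Array.FindAll(parts, p => p.Length > 0);
    }
}
```

Calling RegexSplitSketch.SplitWords("one, two-three  four") yields the four words one, two, three, and four. Whether this is fast enough depends on the input; it still allocates the array and one string per word.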
Robert Harvey
Zaelin Goodman

As the official documentation says:

Performance Considerations

The Split methods allocate memory for the returned array object and a String object for each array element. If your application requires optimal performance or if managing memory allocation is critical in your application, consider using the IndexOf or IndexOfAny method, and optionally the Compare method, to locate a substring within a string.

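If you need high performance, you should work on a ReadOnlySpan<char> (obtained with phrase.AsSpan()) and use the methods IndexOf or IndexOfAny together with Slice, which avoids allocating a new string for every token.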
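
The index-based approach could be sketched roughly as follows. This is an illustration under assumptions, not code from the documentation: SplitScanner, IsSeparator, and FindTokens are hypothetical names, the extra separators (comma and hyphen) are taken from the question, and char.IsWhiteSpace stands in for listing every whitespace character explicitly. The sketch skips empty tokens, so it behaves like StringSplitOptions.RemoveEmptyEntries, and it needs a runtime that provides ReadOnlySpan<char>.

```csharp
using System;
using System.Collections.Generic;

public static class SplitScanner
{
    // True for any Unicode whitespace character, plus the two
    // additional separators assumed here (comma and hyphen).
    private static bool IsSeparator(char c) =>
        char.IsWhiteSpace(c) || c == ',' || c == '-';

    // Returns (start, length) ranges of the tokens instead of
    // allocating one substring per token.
    public static List<(int Start, int Length)> FindTokens(ReadOnlySpan<char> text)
    {
        var tokens = new List<(int Start, int Length)>();
        int start = -1; // -1 means "currently not inside a token"
        for (int i = 0; i < text.Length; i++)
        {
            if (IsSeparator(text[i]))
            {
                if (start >= 0)
                {
                    tokens.Add((start, i - start)); // token ended at i
                    start = -1;
                }
            }
            else if (start < 0)
            {
                start = i; // first character of a new token
            }
        }
        if (start >= 0)
        {
            tokens.Add((start, text.Length - start)); // trailing token
        }
        return tokens;
    }
}
```

A caller would pass phrase.AsSpan() and then read each token with phrase.AsSpan(start, length), materializing a string only where one is actually needed. The list of ranges is still an allocation; a caller that processes tokens one at a time could avoid it by handling each range inside the loop instead.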

Community
ganchito55