You should note that much depends on how you define "whitespace". Unicode and the CLR define whitespace quite broadly: char.IsWhiteSpace() returns true for quite a few characters beyond the familiar ASCII ones, for example the no-break space (U+00A0) and the ideographic space (U+3000).
The "classic" definition of whitespace comprises the following characters: HT, LF, VT, FF, CR and SP (some would include BS as well).
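To see how far apart the two definitions are, here's a quick sketch that scans every UTF-16 code unit and lists the ones char.IsWhiteSpace accepts but the classic set doesn't. WhitespaceSurvey and its methods are illustrative names, and the exact list can differ slightly between .NET versions, since it follows the runtime's Unicode tables.

```csharp
using System;
using System.Collections.Generic;

public static class WhitespaceSurvey
{
    // The "classic" set: HT, LF, VT, FF, CR, SP.
    private const string Classic = "\t\n\v\f\r ";

    // Every code unit for which char.IsWhiteSpace is true
    // but which is not in the classic set above.
    public static List<char> ExtraUnicodeWhitespace()
    {
        var extras = new List<char>();
        for (int i = 0; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (char.IsWhiteSpace(c) && Classic.IndexOf(c) < 0)
            {
                extras.Add(c);
            }
        }
        return extras;
    }

    // Prints the extras as U+XXXX code points.
    public static void Print()
    {
        foreach (char c in ExtraUnicodeWhitespace())
        {
            Console.WriteLine($"U+{(int)c:X4}");
        }
    }
}
```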
Myself, I'd probably do something like this:
using System.Text;

public static class StringHelpers
{
    // Removes the "classic" whitespace characters from s.
    public static string StripWhitespace( this string s )
    {
        // The result can never be longer than the input, so pre-size the builder.
        StringBuilder sb = new StringBuilder( s.Length ) ;
        foreach ( char c in s )
        {
            switch ( c )
            {
                //case '\b' : continue ; // U+0008, BS uncomment if you want this
                case '\t' : continue ; // U+0009, HT
                case '\n' : continue ; // U+000A, LF
                case '\v' : continue ; // U+000B, VT
                case '\f' : continue ; // U+000C, FF
                case '\r' : continue ; // U+000D, CR
                case ' '  : continue ; // U+0020, SP
            }
            // `continue` above skips whitespace; anything else is kept.
            sb.Append(c) ;
        }
        string stripped = sb.ToString() ;
        return stripped ;
    }
}
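Since the right answer depends on which definition you pick, another option is to pass the definition in as a predicate. This is a sketch; StripDemo, StripChars and IsClassicWhitespace are hypothetical names of my own, not framework methods.

```csharp
using System;
using System.Text;

public static class StripDemo
{
    // The classic set: HT, LF, VT, FF, CR, SP.
    // Add c == '\b' here if you also want to treat BS as whitespace.
    public static bool IsClassicWhitespace(char c) =>
        c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r' || c == ' ';

    // Removes every character for which isWhitespace returns true,
    // so the caller decides what "whitespace" means.
    public static string StripChars(this string s, Func<char, bool> isWhitespace)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (isWhitespace == null) throw new ArgumentNullException(nameof(isWhitespace));

        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (!isWhitespace(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this, "a\tb\u00A0c d".StripChars(StripDemo.IsClassicWhitespace) keeps the no-break space and yields "ab\u00A0cd", while passing char.IsWhiteSpace strips it too and yields "abcd".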
You could use your approach like this. It's worth reading the documentation here: note the use of the string constructor overload String(char[] value, int startIndex, int length), which lets you initialize a string from a range within a character array:
public static string StripWhitespace( string s )
{
    char[] buf = s.ToCharArray() ;
    int j = 0 ; // target pointer: next write position, never ahead of i
    for ( int i = 0 ; i < buf.Length ; ++i )
    {
        char c = buf[i] ;
        if ( !IsWs(c) )
        {
            buf[j++] = c ; // compact non-whitespace toward the front of buf
        }
    }
    // Only buf[0..j-1] is meaningful; the rest is left over from the input.
    string stripped = new string(buf,0,j) ;
    return stripped ;
}
private static bool IsWs( char c )
{
    bool ws = false ;
    switch ( c )
    {
        //case '\b' : // U+0008, BS uncomment if you want BS as whitespace
        case '\t' : // U+0009, HT
        case '\n' : // U+000A, LF
        case '\v' : // U+000B, VT
        case '\f' : // U+000C, FF
        case '\r' : // U+000D, CR
        case ' '  : // U+0020, SP
            ws = true ;
            break ;
    }
    return ws ;
}
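To see why the range overload matters, here's a sketch (CompactionDemo and Run are illustrative names) that performs the same compaction with the classic set and returns both the whole buffer and the first j characters. Because j only ever trails i, the tail of the buffer still holds stale characters from the original string.

```csharp
using System;

public static class CompactionDemo
{
    // Same in-place compaction as StripWhitespace/IsWs (classic set only).
    // Returns the entire buffer and the j-length prefix side by side.
    public static (string wholeBuffer, string result) Run(string s)
    {
        char[] buf = s.ToCharArray();
        int j = 0; // next write position; always <= i
        for (int i = 0; i < buf.Length; ++i)
        {
            char c = buf[i];
            bool ws = c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r' || c == ' ';
            if (!ws)
            {
                buf[j++] = c;
            }
        }
        // new string(buf) copies the stale tail too; the range overload stops at j.
        return (new string(buf), new string(buf, 0, j));
    }
}
```

For "a b c" it returns ("abc c", "abc"): new string(buf) would pick up the leftover " c", whereas new string(buf, 0, j) stops exactly at j.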
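You could also use LINQ, something like the following (it needs a using System.Linq; directive, and note that it uses char.IsWhiteSpace, i.e. the broader Unicode definition, rather than the classic set):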
public static string StripWhitespace( this string s )
{
    return new string( s.Where( c => !char.IsWhiteSpace(c) ).ToArray() ) ;
}
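Still, I'm willing to bet that the LINQ approach will be significantly slower than the other two: it makes a delegate call per character and builds an intermediate array before creating the string. It's elegant, though.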
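If you'd rather measure than bet, a rough Stopwatch comparison like this sketch is a starting point. Both versions use char.IsWhiteSpace so they do the same work; the input, iteration count and the names StripTiming, ViaLinq, ViaBuffer and Time are arbitrary choices of mine. Single-process timings like this are noisy (JIT, GC, CPU frequency scaling), so for numbers you'd trust, use a benchmarking library such as BenchmarkDotNet.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;

public static class StripTiming
{
    // The LINQ version from above.
    public static string ViaLinq(string s) =>
        new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());

    // The char[] compaction version, using the same predicate for a fair comparison.
    public static string ViaBuffer(string s)
    {
        char[] buf = s.ToCharArray();
        int j = 0;
        for (int i = 0; i < buf.Length; ++i)
        {
            if (!char.IsWhiteSpace(buf[i]))
            {
                buf[j++] = buf[i];
            }
        }
        return new string(buf, 0, j);
    }

    // Times `iterations` calls of f on the same input; returns elapsed milliseconds.
    public static long Time(Func<string, string> f, string input, int iterations)
    {
        f(input); // one warm-up call so JIT compilation isn't included
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            f(input);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Run()
    {
        // Synthetic input: words separated by a mix of spaces, tabs and newlines.
        var sb = new StringBuilder();
        for (int k = 0; k < 10_000; k++)
        {
            sb.Append("lorem ipsum\tdolor\n");
        }
        string input = sb.ToString();

        Console.WriteLine($"LINQ:   {Time(ViaLinq, input, 200)} ms");
        Console.WriteLine($"buffer: {Time(ViaBuffer, input, 200)} ms");
    }
}
```

Call StripTiming.Run() from a Release build to print the two timings.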