0

Hi, I am a beginner in C# and I am trying to remove the whitespace from a string. I use the following code:

public String RemoveSpace(string str1)
{

    char[] source = str1.ToCharArray();

    int oldIndex = 0;
    int newIndex = 0;
    while (oldIndex < source.Length)
    {
        if (source[oldIndex] != ' ' && source[oldIndex] != '\t')
        {
            source[newIndex] = source[oldIndex];
            newIndex++;
        }
        oldIndex++;
    }
    source[oldIndex] = '\0';
    return new String(source);

}

But the problem I'm facing is that when I give the input string "H e l", the output shows "Hel l". This is because in the last iteration arr[2] is replaced by arr[4], while the original last character 'l' at arr[4] is left in place. Can someone point out the mistake? Note: Regex, trim, and replace functions must not be used. Thanks.

Cᴏʀʏ
user1561245

4 Answers

6

There's a String constructor overload, String(char[] value, int startIndex, int length), which lets you control how many characters are used.

So just change the last line to

return new String(source, 0, newIndex);

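Note that .NET strings are not NUL-terminated (they can contain '\0' just like any other character), so writing a terminator does not shorten the string. You can remove source[oldIndex] = '\0'; entirely. As written, it also indexes one past the end of the array, because oldIndex equals source.Length once the loop finishes.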
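Putting both changes together, the method from the question could look roughly like the sketch below. The class name SpaceRemover is only a hypothetical wrapper added so the snippet compiles on its own; the logic is the asker's loop, with the terminator line dropped and the length-limited constructor used for the result.

```csharp
using System;

// Hypothetical wrapper class, added only so the snippet is self-contained.
public static class SpaceRemover
{
    // Same in-place compaction as in the question, with the two fixes applied.
    public static string RemoveSpace(string str1)
    {
        char[] source = str1.ToCharArray();

        int newIndex = 0; // next free slot for a kept character
        for (int oldIndex = 0; oldIndex < source.Length; oldIndex++)
        {
            if (source[oldIndex] != ' ' && source[oldIndex] != '\t')
            {
                source[newIndex] = source[oldIndex];
                newIndex++;
            }
        }

        // No '\0' terminator: build the string from the first newIndex chars only,
        // so the stale characters left at the end of the array are ignored.
        return new string(source, 0, newIndex);
    }
}
```

With this version, SpaceRemover.RemoveSpace("H e l") returns "Hel".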

Ben Voigt
2

Some key learning points:

  • Incrementally concatenating strings is relatively slow. Since you know you're going to do an indeterminate (and possibly large) number of character-by-character operations, use a char array as the working buffer.
  • A fast way to iterate through the characters of a string in C# is to use the built-in string indexer, which avoids copying the input into a separate array first.

If you need to check additional characters besides space, tab, carriage return, and line feed, then add additional conditions in the if statement:

public static string RemoveWhiteSpace(string input) {
    int len = input.Length;
    int ixOut = 0;
    char[] outBuffer = new char[len];
    for(int i = 0; i < len; i++) {
        char c = input[i];
        if(!(c == ' ' || c == '\t' || c == '\r' || c == '\n')) 
            outBuffer[ixOut++] = c;
    } 
    return new string(outBuffer, 0, ixOut);
}
Joshua Honig
  • He's already doing both things mentioned in your bullet points. `Array.Resize` is a workable way to trim the end, but it also will make an extra unneeded copy. – Ben Voigt Apr 02 '14 at 01:11
  • No, he's not doing the second thing. He's calling `ToCharArray`, which makes a new *copy* of all the characters in the input string. I directly index into the string, which returns the character without creating a char array (of the input). – Joshua Honig Apr 02 '14 at 01:14
  • True, but he isn't creating an *extra* array. The result of `ToCharArray()` is his working array. – Ben Voigt Apr 02 '14 at 01:15
  • @BenVoigt Ah, good point. Incidentally your answer regarding the constructor taking the offset and length arguments is the more salient point, and I've upvoted your answer :) – Joshua Honig Apr 02 '14 at 01:17
  • @Joshua Honig: thanks a lot for your explanation and nice points on string manipulation. – user1561245 Apr 02 '14 at 02:26
1

You can use LINQ for that:

var output = new string(input.Where(x => !char.IsWhiteSpace(x)).ToArray());

Your mistake is that you skip the spaces, but your source array still contains the leftover chars at the end. With that logic alone you will never get the correct result, because you are not removing anything; you are just overwriting chars. After your while loop you can try this:

return new String(source.Take(newIndex).ToArray());

The Take method returns only the leading subset of your source array and ignores the rest.

Here is another alternative way of doing this:

var output = string.Concat(input.Split());
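As a rough illustration of why that one-liner works: calling Split() with no separators splits on white-space characters, runs of whitespace produce empty pieces, and string.Concat joins everything back together with nothing in between. The class and method names below are hypothetical and exist only to make the sketch self-contained.

```csharp
using System;

// Hypothetical wrapper class for illustration only.
public static class SplitConcatDemo
{
    public static string StripWhitespace(string input)
    {
        // With no separators given, Split() breaks the string at white-space characters.
        // Adjacent whitespace yields empty strings in the resulting array.
        string[] pieces = input.Split();

        // Concat glues the pieces together; the empty strings contribute nothing.
        return string.Concat(pieces);
    }
}
```

For example, SplitConcatDemo.StripWhitespace(" a  b ") splits into "", "a", "", "b", "" and returns "ab". Note that this allocates an intermediate array of substrings, so it trades some efficiency for brevity.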
Selman Genç
  • This is homework. If OP is learning to code, I don't think a LINQ query will help him. – aloisdg Apr 02 '14 at 00:50
  • @aloisdg: of course it's homework. However we have absolutely no idea what the question is. It might very well be: come up with a way to remove spaces from a string. If so, then this answer and mine provide two alternatives that both use the power of the framework... without using regex, trim or replace – NotMe Apr 02 '14 at 00:52
  • @aloisdg: I see your point, but it's best not to forget that answers on SO aren't solely for the benefit of the OP. An answer can add value for possible future readers too. – Baldrick Apr 02 '14 at 01:11
  • @Baldrick [Well](https://www.google.com/search?q=c%23+remove+whitespace) and we could mark it as [duplicate](http://stackoverflow.com/questions/6219454/efficient-way-to-remove-all-whitespace-from-string). – aloisdg Apr 02 '14 at 01:16
1

You should note that much depends on how you define "whitespace". Unicode and the CLR define whitespace as a rather exhaustive list of characters: char.IsWhiteSpace() returns true for quite a few characters.

The "classic" definition of whitespace is the following set of characters: HT, LF, VT, FF, CR and SP (and some might include BS as well).
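To make the difference concrete, here is a small sketch comparing the two definitions. IsClassicWhitespace and the class name are hypothetical helpers written for this illustration; char.IsWhiteSpace is the framework method.

```csharp
using System;

// Hypothetical helper class for illustration only.
public static class WhitespaceDefinitions
{
    // "Classic" ASCII whitespace: HT (9), LF (10), VT (11), FF (12), CR (13) and SP (32).
    public static bool IsClassicWhitespace(char c)
    {
        return c == ' ' || (c >= '\t' && c <= '\r');
    }
}
```

Both checks agree on a tab or an ordinary space, but they differ on, for example, the no-break space U+00A0: char.IsWhiteSpace('\u00A0') is true, while WhitespaceDefinitions.IsClassicWhitespace('\u00A0') is false. Which behaviour you want depends on where your input comes from.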

Myself, I'd probably do something like this:

using System.Text; // needed for StringBuilder

public static class StringHelpers
{
  public static string StripWhitespace( this string s )
  {
    StringBuilder sb = new StringBuilder() ;
    foreach ( char c in s )
    {
      switch ( c )
      {
    //case '\b' : continue ; // U+0008, BS uncomment if you want this
      case '\t' : continue ; // U+0009, HT
      case '\n' : continue ; // U+000A, LF
      case '\v' : continue ; // U+000B, VT
      case '\f' : continue ; // U+000C, FF
      case '\r' : continue ; // U+000D, CR
      case ' '  : continue ; // U+0020, SP
      }
      sb.Append(c) ;
    }
    string stripped = sb.ToString() ;
    return stripped ;
  }
}

You could also make your own approach work, as follows. It's important to read the documentation, though: note the use of a string constructor overload that lets you specify a range within a char array as the source of the new string:

public static string StripWhitespace( string s )
{
  char[] buf = s.ToCharArray() ;
  int j = 0 ; // target pointer
  for ( int i = 0 ; i < buf.Length ; ++i )
  {
    char c = buf[i] ;
    if ( !IsWs(c) )
    {
      buf[j++] = c ;
    }
  }
  string stripped = new string(buf,0,j) ;
  return stripped ;
}

private static bool IsWs( char c )
{
  bool ws = false ;
  switch ( c )
  {
//case '\b' : // U+0008, BS uncomment if you want BS as whitespace
  case '\t' : // U+0009, HT
  case '\n' : // U+000A, LF
  case '\v' : // U+000B, VT
  case '\f' : // U+000C, FF
  case '\r' : // U+000D, CR
  case ' '  : // U+0020, SP
    ws = true ;
    break ;
  }
  return ws ;
}

You could also use Linq, something like:

    public static string StripWhitespace( this string s )
    {
        return new string( s.Where( c => !char.IsWhiteSpace(c) ).ToArray() ) ;
    }

Though, I'm willing to bet that the Linq approach will be significantly slower than the other two. It's elegant, though.
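If you want to check that guess on your own machine, a crude timing harness along these lines is one way to do it. This is only a sketch: the class and method names are hypothetical, both variants use char.IsWhiteSpace so they are comparable, and a simple Stopwatch loop ignores JIT warm-up and garbage collection effects, so treat the numbers as a rough indication rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical harness for illustration only.
public static class StripTiming
{
    // Char-buffer variant: copy kept characters forward, then build from the prefix.
    public static string StripWithBuffer(string s)
    {
        char[] buf = new char[s.Length];
        int j = 0;
        foreach (char c in s)
        {
            if (!char.IsWhiteSpace(c))
            {
                buf[j++] = c;
            }
        }
        return new string(buf, 0, j);
    }

    // LINQ variant, as in the answer above.
    public static string StripWithLinq(string s)
    {
        return new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    // Runs the given function repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Func<string, string> strip, string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

You could then compare StripTiming.Time(StripTiming.StripWithBuffer, text, 100000) with StripTiming.Time(StripTiming.StripWithLinq, text, 100000) for a representative text. For anything you intend to rely on, a dedicated benchmarking tool would give more trustworthy results.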

Nicholas Carey