4

I have a string in the format: abc def ghi    xyz (with several spaces between ghi and xyz)

I would like to end up with it in the format: abcdefghi xyz

What is the best way to do this? In this particular case, I could just strip off the last three characters, remove spaces, and then add them back at the end, but this won't work for cases in which the multiple spaces are in the middle of the string.

In short, I want to remove every single space, and then replace each run of multiple spaces with a single space. Each of those steps is easy enough by itself, but combining them seems a bit less straightforward.

I'm willing to use regular expressions, but I would prefer not to.

Habib
aaron
  • Using RegEx would be your best bet here. Looks like this has already been answered here: http://stackoverflow.com/questions/1279859/how-to-replace-multiple-white-spaces-with-one-white-space?rq=1 – Michael B Feb 15 '16 at 15:19
  • Multiple whitespaces don't seem to be displaying properly. The first string example has several spaces between "ghi" and "xyz", rather than the actual text "(multiple spaces)". – aaron Feb 15 '16 at 15:20
  • @Michael B: That question isn't quite the same thing. – aaron Feb 15 '16 at 15:22

5 Answers

4

This approach uses regular expressions, but hopefully in a way that's still fairly readable. First, split your input string on runs of multiple spaces:

var pattern = @"  +"; // match two or more spaces
var groups = Regex.Split(input, pattern);

Next, remove the (individual) spaces from each token:

var tokens = groups.Select(group => group.Replace(" ", String.Empty));

Finally, join your tokens with single spaces:

var result = String.Join(" ", tokens); // a string separator works on every .NET version

This example uses a literal space character rather than 'whitespace' (which includes tabs, linefeeds, etc.) - substitute \s for ' ' if you need to split on multiple whitespace characters rather than actual spaces.
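
Putting the three steps together, a minimal sketch might look like the following. The class name `SpaceCollapser` and both method names are hypothetical, invented for this illustration. The second method is one possible reading of the `\s` variant mentioned above, assuming every whitespace character should be treated like a space.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class SpaceCollapser
{
    // Literal-space version: lone spaces vanish, runs of two or more become one.
    public static string Collapse(string input)
    {
        // Step 1: split on runs of two or more spaces.
        var groups = Regex.Split(input, " {2,}");
        // Step 2: remove the remaining (single) spaces inside each piece.
        var tokens = groups.Select(g => g.Replace(" ", string.Empty));
        // Step 3: re-join the pieces with one space each.
        return string.Join(" ", tokens);
    }

    // Whitespace version (assumption: tabs and newlines count like spaces).
    public static string CollapseWhitespace(string input)
    {
        var groups = Regex.Split(input, @"\s{2,}");
        var tokens = groups.Select(g => Regex.Replace(g, @"\s", string.Empty));
        return string.Join(" ", tokens);
    }

    public static void Main()
    {
        Console.WriteLine(Collapse("abc def ghi    xyz")); // abcdefghi xyz
    }
}
```

Note that a run of spaces at the very start or end of the input produces an empty piece, so the result keeps one leading or trailing space there; call `Trim()` on the result if that is not wanted.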

Dylan Beattie
2

Well, Regular Expressions would probably be the fastest here, but you could implement some algorithm that uses a lookahead for single spaces and then replaces multiple spaces in a loop:

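// Remove single spaces: deleting the last space of every run removes
// lone spaces entirely and shortens longer runs by one.
for (int i = 0; i < sourceString.Length; i++)
{
    if (sourceString[i] == ' ') // comparison, not assignment
    {
        if (i < sourceString.Length - 1 && sourceString[i+1] != ' ')
          sourceString = sourceString.Remove(i, 1); // string has no Delete method
    }
}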

// Replace multiple whitespaces
while (sourceString.Contains("  ")) // Two spaces here!
  sourceString = sourceString.Replace("  ", " ");

But hey, that code is pretty ugly and slow compared to a proper regular expression...

Thorsten Dittmar
2

For a Non-REGEX option you can use:

string str = "abc def ghi         xyz";
var result = str.Split(); // Splits on whitespace: single spaces disappear, each extra space in a run yields an empty entry
StringBuilder sb = new StringBuilder();
bool ifMultipleSpacesFound = false;
for (int i = 0; i < result.Length;i++)
{
    if (!String.IsNullOrWhiteSpace(result[i]))
    {
        sb.Append(result[i]);
        ifMultipleSpacesFound = false;
    }
    else
    {
        if (!ifMultipleSpacesFound)
        {
            ifMultipleSpacesFound = true;
            sb.Append(" ");
        }
    }
}

string output = sb.ToString();

The output would be:

output = "abcdefghi xyz"
Habib
  • Default string.split() would actually remove all of the spaces, so I don't believe this works. – aaron Feb 15 '16 at 19:20
  • @aaron, it will only remove a single space, not multiple spaces. I have tested this code before putting it in the answer – Habib Feb 15 '16 at 19:30
  • Huh. You're right. I had tried something similar, but had a logic error or two that kept me from getting the right result. – aaron Feb 15 '16 at 19:46
1

Here's an approach which uses some fairly subtle logic:

public static string RemoveUnwantedSpaces(string text)
{
    var sb = new StringBuilder();
    char lhs = '\0';
    char mid = '\0';

    foreach (char rhs in text)
    {
        if (rhs != ' ' || (mid == ' ' && lhs != ' '))
            sb.Append(rhs);

        lhs = mid;
        mid = rhs;
    }

    return sb.ToString().Trim();
}

How it works:

We will examine each possible three-character subsequence linearly across the string (in a kind of three-character sliding window). These three characters will be represented, in order, by the variables lhs, mid and rhs.

For each rhs character in the string:

  • If it's not a space we should output it.
  • If it is a space, and the previous character was also a space but the one before that wasn't, then this is the second in a sequence of at least two spaces, and therefore we should output one space.
  • Otherwise, don't output a space: this is either the first space in a sequence or the third (or later) space in a sequence of two or more. If it is the first, a space will be output when the second one comes along (and if no second one comes, it was a single space that should be dropped anyway). If it is the third or later, we've already output a space for the sequence.

The subtlety here is that I've avoided special casing the beginning of the sequence by initialising the lhs and mid variables with non-space characters. It doesn't matter what those values are, as long as they are not spaces, but I made them \0 to indicate that they are special values.
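
To make the sliding window easier to follow, here is a small tracing sketch. It repeats the same loop but logs the three-character window and the keep/drop decision for each character. The class name, the `Traced` method, the `log` parameter and the `Show` helper are all hypothetical additions for illustration; `^` stands for the `\0` start marker and `_` for a space.

```csharp
using System;
using System.Text;

// Hypothetical demo class: same logic as RemoveUnwantedSpaces, plus logging.
public static class SlidingWindowDemo
{
    public static string RemoveUnwantedSpacesTraced(string text, Action<string> log)
    {
        var sb = new StringBuilder();
        char lhs = '\0';
        char mid = '\0';

        foreach (char rhs in text)
        {
            // Keep non-spaces, and keep only the second space of a run.
            bool keep = rhs != ' ' || (mid == ' ' && lhs != ' ');
            if (keep)
                sb.Append(rhs);

            string decision = keep ? "keep" : "drop";
            log($"window [{Show(lhs)}{Show(mid)}{Show(rhs)}] -> {decision}");

            lhs = mid;
            mid = rhs;
        }

        return sb.ToString().Trim();
    }

    // Render the start marker and spaces as visible characters.
    private static char Show(char c) => c == '\0' ? '^' : (c == ' ' ? '_' : c);

    public static void Main()
    {
        string result = RemoveUnwantedSpacesTraced("a b  c", Console.WriteLine);
        Console.WriteLine(result); // ab c
    }
}
```

For the input "a b  c", the only space that is kept is the one seen through the window `[b__]`, that is, the second space of the two-space run.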

Matthew Watson
  • There is a part of me that really appreciates this solution, but another that doesn't want it anywhere near my code repository. – aaron Feb 15 '16 at 19:04
  • @aaron It's not really that complicated. I've renamed the loop variables to make it clearer what's going on. – Matthew Watson Feb 15 '16 at 20:51
1

On second thought, here is a one-line regex solution:

Regex.Replace("abc def ghi    xyz", "( )( )*([^ ])", "$2$3")

The result of this is "abcdefghi xyz".

ORIGINAL ANSWER:

A two-line regex solution:

var tmp = Regex.Replace("abc def ghi    xyz", "( )([^ ])", "$2");

tmp is "abcdefghi   xyz" (three spaces between i and x), then:

var result = Regex.Replace(tmp, "( )+", " ");

result is "abcdefghi xyz"


Explanation:

The first line of code removes every single space and removes one space from each run of multiple spaces (so the four spaces in the input become 3 spaces in tmp between the letters i and x).

The second line just replaces each remaining run of spaces with one space.

In-depth explanation of first line:

We match the input string against a regex that matches one space followed by a non-space character. We also put these two characters in separate groups (using ( ) for anonymous grouping). So for the "abc def ghi    xyz" string we have these matches and groups:

match: " d" group1: " " group2: "d"

match: " g" group1: " " group2: "g"

match: " x" group1: " " group2: "x"

We use the substitution syntax of the Regex.Replace method to replace each match with the content of the second group (which is the non-space character).
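
The matches listed above can be reproduced with a short sketch. The class and method names here are hypothetical, chosen only for this illustration; the two patterns are the ones from the original two-line answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class wrapping the two Regex.Replace calls.
public static class TwoStepRegexDemo
{
    // Step 1: drop the space that directly precedes a non-space character.
    public static string RemoveLastSpaceOfEachRun(string input) =>
        Regex.Replace(input, "( )([^ ])", "$2");

    // Step 2: collapse whatever runs of spaces remain into one space.
    public static string CollapseRuns(string input) =>
        Regex.Replace(input, "( )+", " ");

    public static void Main()
    {
        const string input = "abc def ghi    xyz";

        // Print every match of the first pattern together with its groups.
        foreach (Match m in Regex.Matches(input, "( )([^ ])"))
        {
            Console.WriteLine(
                $"match: \"{m.Value}\" group1: \"{m.Groups[1].Value}\" group2: \"{m.Groups[2].Value}\"");
        }

        string tmp = RemoveLastSpaceOfEachRun(input);
        Console.WriteLine($"[{tmp}]");               // [abcdefghi   xyz]
        Console.WriteLine($"[{CollapseRuns(tmp)}]"); // [abcdefghi xyz]
    }
}
```

The brackets in the output are only there to make the number of spaces visible.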

Mariusz Pawelski