Efficient way to remove ALL whitespace from String?

Question

I'm calling a REST API and am receiving an XML response back. It returns a list of workspace names, and I'm writing a quick IsExistingWorkspace() method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out if a particular workspace is in the list is to remove all whitespace (including newlines) and then do this (XML is the string received from the web request):

XML.Contains("<name>" + workspaceName + "</name>");

I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.
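Here is a minimal sketch of the check described above. It assumes the whole response is already in a string and that workspace names never contain whitespace. The class name `WorkspaceLookup` is hypothetical, and the body is only an illustration of the question's `IsExistingWorkspace()` idea, not the asker's code. It caches the `Regex` so repeated calls don't rebuild it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is illustrative only.
public static class WorkspaceLookup
{
    // Built once and reused; \s matches spaces, tabs, CR/LF and other Unicode whitespace.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Assumes workspace names never contain whitespace, as stated in the question.
    public static bool IsExistingWorkspace(string xml, string workspaceName)
    {
        string compact = Whitespace.Replace(xml, string.Empty);
        // Ordinal comparison keeps the check case-sensitive and culture-independent.
        return compact.IndexOf("<name>" + workspaceName + "</name>", StringComparison.Ordinal) >= 0;
    }
}
```

The string match only works if the `<name>` tags appear exactly in that form. Attributes, a namespace prefix, or a self-closing tag would all make it fail.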

Parsing XML with regex is almost as bad as [parsing HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). — dtb, Jun 02 '11 at 19:47
@henk holterman; See my answer below, regexp doesn't seem to be the fastest in all cases. — Henk J Meulekamp, Jan 29 '13 at 20:05
Regex doesn't seem to be the fastest at all. I have summarized the results from many different ways to remove whitespace from a string. The summary is in an answer below - http://stackoverflow.com/a/37347881/582061 — Stian Standahl, May 23 '16 at 13:22

score 809 · Accepted Answer · edited Jun 22 '21 at 02:52

This is the fastest way I know of, even though you said you didn't want to use regular expressions:

Regex.Replace(XML, @"\s+", "");

Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.

private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement) 
{
    return sWhitespace.Replace(input, replacement);
}
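This sketch folds in the default-parameter suggestion from the comments (`replacement = ""`). It also adds `RegexOptions.Compiled`, which is a real option but not part of this answer. Whether it pays off depends on how many calls you make. `WhitespaceRegex` is an illustrative class name.

```csharp
using System.Text.RegularExpressions;

// Illustrative class name; not from the original answer.
public static class WhitespaceRegex
{
    // Constructed once and reused. RegexOptions.Compiled makes construction slower
    // in exchange for potentially faster matching on later calls.
    private static readonly Regex sWhitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // With the default "" the whitespace is removed; pass another string to substitute it.
    public static string ReplaceWhitespace(string input, string replacement = "")
    {
        return sWhitespace.Replace(input, replacement);
    }
}
```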

edited Jun 22 '21 at 02:52 by NearHuscarl

answered Jun 02 '11 at 19:38 by slandau

I could use a regular expression, I'm just not sure if it's the fastest way. – Corey Ogburn Jun 02 '11 at 19:39
I'm pretty sure it is. At the very least behind the scenes you have to check every character, and this is just doing a linear search. – slandau Jun 02 '11 at 19:39
There isn't a faster way, the only "other" way is to do @"string".Replace(" ", string.Empty) for a million different combinations. Regex will do it all with just that. – Smith3 Jun 02 '11 at 19:41
Side comment: in general Regex is no faster, just neater - it decomposes your expression and also linearly does what you asked it to. – markmnl Sep 07 '12 at 01:51
If you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think. `private static readonly Regex sWhitespace = new Regex(@"\s+"); public static string ReplaceWhitespace(string input, string replacement) { return sWhitespace.Replace(input, replacement); }` – hypehuman Jan 13 '15 at 15:22
Use the split/join combination, which tested as the fastest so far; see KernowCode's answer below. – Jay Byford-Rew Jun 09 '15 at 13:11
`Regex.Replace(XML, @"\p{Zs}", string.Empty)` to also clean whitespace that is not code 32. – Nauzet Jan 10 '17 at 12:36
For those new to RegEx and looking for an explanation as to what this expression means, `\s` means "match any whitespace token", and `+` means "match one or more of the preceding token". Also [RegExr](https://regexr.com/) is a nice website to practice writing RegEx expressions with, if you want to experiment. – jrh Oct 31 '17 at 17:02
This method could default to removing the whitespace if the signature defaults the replacement value. `public static string ReplaceWhitespace(string input, string replacement = "")` – Galactic Aug 12 '22 at 19:23

score 248 · Answer 2 · edited Nov 23 '22 at 08:29

I have an alternative way without regexp, and it seems to perform pretty well. It builds on Brandon Moretz's answer:

using System;
using System.Linq;
public static class StringExtensions
{
    public static string RemoveWhitespace(this string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }
}

I tested it in a simple unit test:

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = input.RemoveWhitespace();
    }
    Assert.AreEqual(expected, s);
}

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = Regex.Replace(input, @"\s+", "");
    }
    Assert.AreEqual(expected, s);
}

For 1,000,000 attempts the first option (without regexp) runs in less than a second (700 ms on my machine), and the second takes 3.5 seconds.

edited Nov 23 '22 at 08:29 by Enrico Campidoglio

answered Jan 29 '13 at 19:58 by Henk J Meulekamp

`.ToCharArray()` is not necessary; you can use `.Where()` directly on a string. – ProgramFOX Jan 01 '14 at 11:26
Just to note here. Regex is slower... on small strings! If you had a digitized version of a volume on US tax law (~million words?), with a handful of iterations, Regex is king, by far! It's not about what is faster, but what should be used in which circumstance. You only proved half the equation here. -1 until you prove the second half of the test so that the answer provides more insight into when what should be used. – Piotr Kula Mar 06 '15 at 16:28
@ppumkin He asked for a single pass removal of whitespace. Not multiple iterations of other processing. I'm not going to make this single pass whitespace removal into an extended post about benchmarking text processing. – Henk J Meulekamp Mar 09 '15 at 15:39
You said it's preferred not to use regex this time but didn't say why. – Piotr Kula Mar 09 '15 at 16:26
@ProgramFOX, in a different question (can't readily find it) I noticed that at least in some queries, using `ToCharArray` is faster than using `.Where()` directly on the string. This has something to do with the overhead of the `IEnumerable<>` in each iteration step, and the `ToCharArray` being very efficient (block-copy) and the compiler optimizing iteration over arrays. Why this difference exists, no one has been able to explain to me, but measure before you remove `ToCharArray()`. – Abel Nov 18 '17 at 23:30
@Abel Oh, that's interesting. Thanks for the comment! – ProgramFOX Nov 19 '17 at 08:34
For other newbies like me: to get rid of "char[] does not contain a definition for 'Where'", add "using System.Linq;" – T4NK3R Jan 16 '20 at 17:37
Using `.Where()` without `.ToCharArray()` is **slower** on my machine. Be careful with that. – picolino Feb 20 '20 at 12:41

score 124 · Answer 3 · edited Feb 18 '16 at 21:47

Try the replace method of the string in C#.

XML.Replace(" ", string.Empty);
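`string.Replace(" ", string.Empty)` only removes U+0020. This sketch also strips tabs and line breaks by chaining calls. It still leaves other whitespace, such as the non-breaking space (U+00A0), in place, and each call makes its own pass over the string. `ReplaceChain` and `RemoveCommonWhitespace` are hypothetical names.

```csharp
// Illustrative helper: chains string.Replace for the common ASCII whitespace characters.
public static class ReplaceChain
{
    public static string RemoveCommonWhitespace(string xml)
    {
        // Each Replace call scans the whole string again, so this makes four passes.
        return xml.Replace(" ", string.Empty)
                  .Replace("\t", string.Empty)
                  .Replace("\r", string.Empty)
                  .Replace("\n", string.Empty);
    }
}
```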

edited Feb 18 '16 at 21:47 by Rudey

answered Jun 02 '11 at 19:43 by Mike_K

Doesn't remove tabs or newlines. If I do multiple removes now I'm making multiple passes over the string. – Corey Ogburn Jun 02 '11 at 19:45
@MattSach why does it not remove ALL whitespace? – Zapnologica Nov 14 '17 at 11:20
@Zapnologica It's only replacing space characters. The OP asked for replacement of newlines as well (which are "whitespace" characters, even though they're not a space character). – Matt Sach Nov 15 '17 at 12:37
Regex.Replace(XML, @"\s+", string.Empty) removes all whitespace. – Robert Smith May 13 '21 at 12:35

Jay Byford-Rew · Answer 4 · 2015-09-07T08:46:49.487 · score 109

My solution is to use Split and Join, and it is surprisingly fast; in fact, it is the fastest of the top answers here.

str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));

Timings for 10,000 loops on a simple string with whitespace, including newlines and tabs:

split/join = 60 milliseconds
linq chararray = 94 milliseconds
regex = 437 milliseconds

Improve this by wrapping it in a method to give it meaning, and make it an extension method while we are at it:

public static string RemoveWhitespace(this string str) {
    return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}
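The `default(string[])` trick works because a null separator tells `Split` to split on whitespace. The `String.Split` documentation says white-space characters are assumed to be the delimiters when the separator is null or empty. This sketch uses the same idea with an explicit `(char[])null` cast to choose the overload, and `string.Concat` as suggested in the comments. `SplitJoin` is an illustrative name.

```csharp
using System;

public static class SplitJoin
{
    public static string RemoveWhitespace(string str)
    {
        // The cast picks the Split(char[], StringSplitOptions) overload; a null
        // separator array means "split on whitespace".
        string[] parts = str.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        // Concatenate the non-whitespace pieces without any separator.
        return string.Concat(parts);
    }
}
```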

answered Jun 09 '15 at 13:02

I really like this solution, I've been using a similar one since pre-LINQ days. I'm actually impressed with LINQ's performance, and somewhat surprised by regex. Maybe the code was not as optimal as it could have been for regex (you'll have to cache the regex object for example). But the crux of the problem is that the "quality" of the data will matter a lot. Maybe with long strings the regex will outperform the other options. It will be a fun benchmark to perform... :-) – Loudenvier Jul 29 '15 at 14:08
How does default(string[]) == a list of all whitespace characters? I see it working, but I am not understanding how? – Jake Drew Aug 27 '15 at 22:49
Split needs a valid array and null will not do, so default(type), which in this case is a string[], returns the correct default for the function. – Jay Byford-Rew Aug 28 '15 at 08:19
Actually `default(string[])` returns `null`. This is just a special mode of this `String.Split(...)` overload that splits by whitespaces if you pass `null` in. – Frank J Nov 05 '15 at 21:29
But there is a difference between just 'null' and the default. It will not work with null, hence the use of default. – Jay Byford-Rew Nov 06 '15 at 08:58
@kernowcode You mean the ambiguity between the 2 overloads with `string[]` and `char[]`? You just have to specify which one you want e.g.: `string.Join("", str.Split((string[])null, StringSplitOptions.RemoveEmptyEntries));`. That is actually what your call to `default` does in this case since it returns `null` as well: it helps the compiler to decide which overload to pick. Hence my comment because the statement in your comment "Split needs a valid array and null will not do ..." is false. No big deal, just thought worth mentioning since Jake Drew asked how this worked. +1 for your answer – Frank J Nov 09 '15 at 20:22
Cool idea ... but I would do it as follows: `string.Concat("H \ne llo Wor ld".Split())` – michaelkrisper Feb 05 '16 at 14:05
michaelkrisper solution is very readable. I did a test and 'split/join' (162 milliseconds) performed better than 'split/concat' (180 milliseconds) for 10,000 iterations of the same string. – Jay Byford-Rew Sep 05 '16 at 07:56
Just a heads up, using only the Split part will only remove whitespace from the first match – StefanJM Apr 03 '19 at 21:53
It's _fascinating_ to me that this doesn't work if you use empty string `""` instead of `default(string[])`. Using `default(char[])` works equally well, as does `(char[])null`, since they both lead to the compiler using the versions of the function which accept `string[]`/ `char[]` as their first parameter... even `"".ToCharArray()` works like this!! But alas, not empty string `""` directly, for some reason. Would love to know why the decision was made to make the implementation of the String version of the function different from the rest. – zcoop98 May 26 '23 at 23:20
Oh!! `str.Split(null)` splits on whitespace apparently (learned from [this answer](https://softwareengineering.stackexchange.com/a/372304))! That means you don't even need to use the `StringSplitOptions.RemoveEmptyEntries` option, because the whitespace characters will be removed when splitting on them. – zcoop98 May 26 '23 at 23:25

Stian Standahl · Answer 5 · 2019-05-06T20:56:35.273

Building on Henk's answer, I have created some test methods with his answer and some additional, more optimized methods. I found that the results differ based on the size of the input string, so I have tested with two result sets. For the fastest method, the linked source has an even faster way, but since it is characterized as unsafe I have left it out.

Long input string results:

InPlaceCharArray: 2021 ms (Sunsetquest's answer) - (Original source)
String split then join: 4277 ms (Kernowcode's answer)
String reader: 6082 ms
LINQ using native char.IsWhitespace: 7357 ms
LINQ: 7746 ms (Henk's answer)
ForLoop: 32320 ms
RegexCompiled: 37157 ms
Regex: 42940 ms

Short input string results:

InPlaceCharArray: 108 ms (Sunsetquest's answer) - (Original source)
String split then join: 294 ms (Kernowcode's answer)
String reader: 327 ms
ForLoop: 343 ms
LINQ using native char.IsWhitespace: 624 ms
LINQ: 645 ms (Henk's answer)
RegexCompiled: 1671 ms
Regex: 2599 ms

Code:

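using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class RemoveWhitespace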
{
    public static string RemoveStringReader(string input)
    {
        var s = new StringBuilder(input.Length); // pre-size to the input length to avoid buffer growth
        using (var reader = new StringReader(input))
        {
            int i = 0;
            char c;
            for (; i < input.Length; i++)
            {
                c = (char)reader.Read();
                if (!char.IsWhiteSpace(c))
                {
                    s.Append(c);
                }
            }
        }

        return s.ToString();
    }

    public static string RemoveLinqNativeCharIsWhitespace(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveLinq(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveRegex(string input)
    {
        return Regex.Replace(input, @"\s+", "");
    }

    private static readonly Regex compiled = new Regex(@"\s+", RegexOptions.Compiled);
    public static string RemoveRegexCompiled(string input)
    {
        return compiled.Replace(input, "");
    }

    public static string RemoveForLoop(string input)
    {
        for (int i = input.Length - 1; i >= 0; i--)
        {
            if (char.IsWhiteSpace(input[i]))
            {
                input = input.Remove(i, 1);
            }
        }
        return input;
    }

    public static string StringSplitThenJoin(string str)
    {
        return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }

    public static string RemoveInPlaceCharArray(string input)
    {
        var len = input.Length;
        var src = input.ToCharArray();
        int dstIdx = 0;
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    continue;
                default:
                    src[dstIdx++] = ch;
                    break;
            }
        }
        return new string(src, 0, dstIdx);
    }
}

Tests:

using System;
using System.Diagnostics;
using NUnit.Framework;

[TestFixture]
public class Test
{
    // Short input
    //private const string input = "123 123 \t 1adc \n 222";
    //private const string expected = "1231231adc222";

    // Long input
    private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
    private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";

    private const int iterations = 1000000;

    [Test]
    public void RemoveInPlaceCharArray()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveInPlaceCharArray(input);
        }

        stopwatch.Stop();
        Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveStringReader()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveStringReader(input);
        }

        stopwatch.Stop();
        Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinqNativeCharIsWhitespace()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinq()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinq(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegex()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegex(input);
        }

        stopwatch.Stop();
        Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegexCompiled()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegexCompiled(input);
        }

        stopwatch.Stop();
        Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveForLoop()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveForLoop(input);
        }

        stopwatch.Stop();
        Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void StringSplitThenJoin()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.StringSplitThenJoin(input);
        }

        stopwatch.Stop();
        Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }
}

Edit: Tested a nice one-liner from Kernowcode.

score 31 · Answer 6 · edited Nov 01 '20 at 14:01

Just an alternative because it looks quite nice :) NOTE: Henk's answer is the quickest of these.

input.ToCharArray()
 .Where(c => !Char.IsWhiteSpace(c))
 .Select(c => c.ToString())
 .Aggregate(string.Empty, (a, b) => a + b); // seed avoids an exception on all-whitespace input

Testing 1,000,000 loops on "This is a simple Test"

This method = 1.74 seconds
Regex = 2.58 seconds
new String (Henks) = 0.82 seconds

edited Nov 01 '20 at 14:01 by Foxfire

answered Nov 28 '13 at 05:28 by BlueChippy

Why was this downvoted? It's perfectly acceptable, meets the requirements, works faster than the RegEx option and is very readable? – BlueChippy Mar 10 '15 at 10:11
because it can be written a lot shorter: new string(input.Where(c => !Char.IsWhiteSpace(c)).ToArray()); – Bas Smit Mar 17 '15 at 14:06
Might be true - but the answer still stands, is readable, faster than regex and produces the desired result. Many of the other answers are AFTER this one... therefore a downvote does not make sense. – BlueChippy Mar 18 '15 at 12:12
Is there a unit for "0.82"? Or is it a relative measure (82%)? Can you edit your answer to make it more clear? – Peter Mortensen Jul 01 '17 at 13:29

SunsetQuest · Answer 7 · 2021-05-27T05:33:28.503 · score 26

I found a nice write-up on this on CodeProject by Felipe Machado (with help from Richard Robertson).

He tested ten different methods. This one is the fastest safe version...

public static string TrimAllWithInplaceCharArray(string str) {

    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;

    for (int i = 0; i < len; i++) {
        var ch = src[i];

        switch (ch) {

            case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':

            case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':

            case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':

            case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':

            case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

            default:
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}
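The hard-coded `case` list is meant to cover the characters that `char.IsWhiteSpace` treats as whitespace. Rather than take that on trust, this small sketch compares the 25 code points in the switch against `char.IsWhiteSpace` for every UTF-16 code unit on whatever runtime you are using. `WhitespaceTable` is a hypothetical name. If `Mismatches()` returns an empty list, the switch and `char.IsWhiteSpace` agree on that runtime.

```csharp
using System.Collections.Generic;

public static class WhitespaceTable
{
    // The 25 code points hard-coded in the switch statement above.
    public static readonly char[] SwitchCases =
    {
        '\u0020', '\u00A0', '\u1680', '\u2000', '\u2001', '\u2002', '\u2003', '\u2004',
        '\u2005', '\u2006', '\u2007', '\u2008', '\u2009', '\u200A', '\u202F', '\u205F',
        '\u3000', '\u2028', '\u2029', '\u0009', '\u000A', '\u000B', '\u000C', '\u000D',
        '\u0085'
    };

    // Returns every UTF-16 code unit on which the switch and char.IsWhiteSpace disagree.
    public static List<char> Mismatches()
    {
        var inSwitch = new HashSet<char>(SwitchCases);
        var result = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (inSwitch.Contains(c) != char.IsWhiteSpace(c))
            {
                result.Add(c);
            }
        }
        return result;
    }
}
```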

And the fastest unsafe version (with some improvements by Sunsetquest, 5/26/2021):

public static unsafe void RemoveAllWhitespace(ref string str)
{
    fixed (char* pfixed = str)
    {
        char* dst = pfixed;
        for (char* p = pfixed; *p != 0; p++)
        {
            switch (*p)
            {
                case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
                case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
                case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
                case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
                case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                    continue;

                default:
                    *dst++ = *p;
                    break;
            }
        }

        uint* pi = (uint*)pfixed;
        ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
        pi[-1] = (uint)len;
        pfixed[len] = '\0';
    }
}

There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that also show how Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.

answered May 21 '16 at 21:25

I've tried translating this to C++ but am a little stuck. Any ideas why my port might be failing? http://stackoverflow.com/questions/42135922/why-do-i-get-error-c3851-a-universal-character-name-cannot-designate-a-characte – Jon Cage Feb 09 '17 at 11:48
I can't resist. Look in the comments section of the article you refer to. You will find me as "Basketcase Software". He and I worked on this together for a while. I had completely forgotten about this when this problem came back up again. Thanks for good memories. :) – Richard Robertson May 15 '17 at 23:12
And what if you want to remove extra WS only? What about this https://stackoverflow.com/questions/17770202/remove-extra-whitespace-from-a-string-in-c/61550714#61550714 mod? – Jan May 01 '20 at 21:02
Fastest is a bit slower ;-) String as container performs better here (in app 4:15 to 3:55 => 8.5% less, but when left string 3:30 => 21.4% less and profiler shows around 50% spent in this method). So in real life string should be around 40% faster compared to the (slow) array conversion used here. – Jan May 05 '20 at 22:37
The original string will be changed by the unsafe version! – Motlicek Petr May 23 '21 at 18:52
@Motlicek Petr - nice catch on the original value being changed. I changed it up a bit so it is clear to the programmer that it is changed. Also, I think it is even faster now because no "new string". – SunsetQuest May 27 '21 at 05:32
So many years later I find this reference to that article again :-) By the way Loudenvier (me) is Felipe Machado :-) – Loudenvier Dec 30 '22 at 18:25
Hi @Loudenvier - I often run into my own code also. Except this is your code =\ (That I feel like you should get StackOverflow credit for by the way) Hopefully, a lot of people upvoted your codeproject from here. BTW - I have posted a lot on Codeproject also. (see Sunsetquest) – SunsetQuest Jan 01 '23 at 21:29
@SunsetQuest I was really flattered to see your post here referencing and using "my" code from the Codeproject article. I do post it to SHARE it! In fact seeing the code posted here from someone else is even more satisfying! Thank you! (BTW, sometimes I start reading an answer for a question only to realize I'm reading one of my own answers from years ago. It's so funny) – Loudenvier Jan 02 '23 at 11:35

score 15 · Answer 8 · edited Jul 01 '17 at 13:31

If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip whitespace from the beginning and end of the string, string.Trim() is your ultimate function.

If you need to strip all whitespace from a string, the following method is the fastest of all the methods posted here:

    public static string RemoveWhitespace(this string input)
    {
        int j = 0, inputlen = input.Length;
        char[] newarr = new char[inputlen];

        for (int i = 0; i < inputlen; ++i)
        {
            char tmp = input[i];

            if (!char.IsWhiteSpace(tmp))
            {
                newarr[j] = tmp;
                ++j;
            }
        }
        return new String(newarr, 0, j);
    }

edited Jul 01 '17 at 13:31 by Peter Mortensen

answered Dec 31 '13 at 13:08 by JHM

I'd be curious to know the details of your benchmarkings--not that I am skeptical, but I'm curious about the overhead involved with Linq. How bad was it? – Mark Meuer Dec 17 '14 at 22:39
I haven't re-run all the tests, but I can remember this much: Everything that involved Linq was a lot slower than anything without it. All the clever usage of string/char functions and constructors made no percentual difference if Linq was used. – JHM Jan 16 '15 at 10:57

score 12 · Answer 9 · edited Jul 01 '17 at 14:15

Regex is overkill; just use an extension method on string (thanks, Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:

public static partial class Extension
{
    public static string RemoveWhiteSpace(this string self)
    {
        return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
    }
}

edited Jul 01 '17 at 14:15 by Peter Mortensen

answered Oct 18 '14 at 00:11 by Maksood

this is basically an unnecessary answer (regex is overkill, but is a quicker solution than given one - and it is already accepted?) – W1ll1amvl Oct 18 '14 at 00:41
How can you use Linq extension methods on a string? Can't figure out which using I am missing others than `System.Linq` – GGirard Feb 18 '16 at 23:02
Ok looks like this is not available in PCL, IEnumerable is conditional in [Microsoft String implementation](http://referencesource.microsoft.com/#mscorlib/system/string.cs,48)... And I am using Profile259 which does not support this :) – GGirard Feb 18 '16 at 23:09
@GGirard strings are collections of char, so linq should work by default. – Sinaesthetic Dec 17 '20 at 17:59

score 7 · Answer 10 · answered Jan 12 '22 at 11:05

I think a lot of people come here just to remove spaces:

string s = "my string is nice";
s = s.Replace(" ", "");

answered Jan 12 '22 at 11:05 by larsemil

The problem with this is that a space can be written in many different ways, as mentioned in other answers. This replace will work for ~90% of the cases or so. – Niels Lucas Feb 28 '22 at 10:28
It's s.Replace() – Karthic Srinivasan Aug 12 '22 at 12:48
I don't know how you'd have any way of knowing of how people find this question, but that's not important because all that matters is the question specifically asks how to remove **all whitespace**, which this answer fails to do. – Lance U. Matthews Sep 30 '22 at 21:24

score 4 · Answer 11 · answered Oct 05 '14 at 00:42

I needed to replace whitespace in a string with spaces, but without duplicate spaces. For example, I needed to convert something like the following:

"a b   c\r\n d\t\t\t e"

to

"a b c d e"

I used the following method

private static string RemoveWhiteSpace(string value)
{
    if (value == null) { return null; }
    var sb = new StringBuilder();

    var lastCharWs = false;
    foreach (var c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            if (lastCharWs) { continue; }
            sb.Append(' ');
            lastCharWs = true;
        }
        else
        {
            sb.Append(c);
            lastCharWs = false;
        }
    }
    return sb.ToString();
}
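For comparison, here is a sketch of the same collapse-to-one-space behavior using a cached `Regex`. In .NET, `\s` should match the same characters as `char.IsWhiteSpace`, so both approaches should give the same result for the example input. Another answer on this page reports regex being slow for this task, so measure before choosing. `CollapseWhitespace` is an illustrative name.

```csharp
using System.Text.RegularExpressions;

public static class CollapseWhitespace
{
    // Built once; matches one or more consecutive whitespace characters.
    private static readonly Regex Runs = new Regex(@"\s+");

    // Replaces each run of whitespace with a single space, like the loop above.
    public static string Collapse(string value)
    {
        return value == null ? null : Runs.Replace(value, " ");
    }
}
```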

score 4 · Answer 12 · edited Jul 01 '17 at 13:25

Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.

static string RemoveWhitespace(string input)
{
    StringBuilder output = new StringBuilder(input.Length);

    for (int index = 0; index < input.Length; index++)
    {
        if (!Char.IsWhiteSpace(input, index))
        {
            output.Append(input[index]);
        }
    }
    return output.ToString();
}
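This rough sketch shows one way to do the benchmarking this answer suggests. It warms each method up once and then times a fixed number of calls with `Stopwatch`. The class and method names are hypothetical. For numbers you can trust, a dedicated harness such as BenchmarkDotNet handles JIT warm-up, GC noise and statistics more carefully than a hand-rolled loop.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class WhitespaceBenchmark
{
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Same loop as the StringBuilder answer above.
    public static string WithStringBuilder(string input)
    {
        var output = new StringBuilder(input.Length);
        for (int index = 0; index < input.Length; index++)
        {
            if (!char.IsWhiteSpace(input, index))
            {
                output.Append(input[index]);
            }
        }
        return output.ToString();
    }

    public static string WithRegex(string input) => Whitespace.Replace(input, string.Empty);

    // Calls the method once to warm up (JIT compilation), then times `iterations` calls.
    public static TimeSpan Time(Func<string, string> method, string input, int iterations)
    {
        method(input);
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Call it as `Console.WriteLine(WhitespaceBenchmark.Time(WhitespaceBenchmark.WithRegex, text, 100000));`, and likewise for `WithStringBuilder`, where `text` is a representative input. Short and long inputs can rank differently, as the benchmarks in another answer here show.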

score 2 · Answer 13 · answered Jun 20 '16 at 14:59

We can use:

    public static string RemoveWhitespace(this string input)
    {
        if (input == null)
            return null;
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }

answered Jun 20 '16 at 14:59 by Tarik BENARAB

This is almost exactly the same as Henk's answer above. The only difference is that you check for `null`. – Corey Ogburn Jun 20 '16 at 15:06
Yes, the check for null is important – Tarik BENARAB Jun 20 '16 at 15:10
Maybe this should have just been a comment on his answer. I am glad you brought it up though. I didn't know extension methods could be called on null objects. – Corey Ogburn Jun 20 '16 at 15:11

score 2 · Answer 14 · answered Jun 02 '11 at 19:40

I assume your XML response looks like this:

var xml = @"<names>
                <name>
                    foo
                </name>
                <name>
                    bar
                </name>
            </names>";

The best way to process XML is to use an XML parser, such as LINQ to XML:

var doc = XDocument.Parse(xml);

var containsFoo = doc.Root
                     .Elements("name")
                     .Any(e => ((string)e).Trim() == "foo");
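Here is the same idea written as the question's `IsExistingWorkspace` check, as a sketch. `XmlWorkspaceLookup` is hypothetical. It assumes the `<name>` elements carry no XML namespace; if the real response declares a default namespace, the element name would have to include it. `Descendants` is used instead of `Root.Elements` so the lookup doesn't depend on how deeply `<name>` is nested.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlWorkspaceLookup
{
    // True if any <name> element anywhere in the document has text equal to
    // workspaceName after trimming surrounding whitespace (case-sensitive).
    public static bool IsExistingWorkspace(string xml, string workspaceName)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("name")
                  .Any(e => string.Equals(e.Value.Trim(), workspaceName, StringComparison.Ordinal));
    }
}
```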

answered Jun 02 '11 at 19:40

dtb

Once I verify that a particular tag has the proper value, I'm done. Wouldn't parsing the document have some overhead? – Corey Ogburn Jun 02 '11 at 19:42

Sure, it has some overhead. But it has the benefit of being correct. A solution based e.g. on regex is much more difficult to get right. If you determine that a LINQ to XML solution is too slow, you can always replace it with something faster. But you should avoid hunting for the most efficient implementation before you know that the correct one is too slow. – dtb Jun 02 '11 at 19:45
This is going to run on my employer's backend servers, so lightweight is what I'm looking for. I don't want something that merely "just works"; I want something optimal. – Corey Ogburn Jun 02 '11 at 19:47

LINQ to XML is one of the most lightweight ways to correctly work with XML in .NET – dtb Jun 02 '11 at 19:49

score 2 · Answer 15 · answered Aug 04 '20 at 11:07

Using LINQ, you can write a readable method this way:

    public static string RemoveAllWhitespaces(this string source)
    {
        return string.IsNullOrEmpty(source) ? source : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
    }

score 1 · Answer 16 · answered Jun 18 '15 at 19:49

Here is yet another variant:

public static string RemoveAllWhitespace(string aString)
{
  return String.Join(String.Empty, aString.Where(aChar => !Char.IsWhiteSpace(aChar)));
}

As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.

score 1 · Answer 17 · answered Aug 02 '22 at 23:06

A straightforward way to remove all whitespace from a string, where "example" is your initial string:

String.Concat(example.Where(c => !Char.IsWhiteSpace(c)))

answered Aug 02 '22 at 23:06

J S

hvanbrug · Answer 18 · answered Feb 04 '15 at 21:01

I found different results. I was trying to replace each run of whitespace with a single space, and the regex was extremely slow.

return( Regex::Replace( text, L"\\s+", L" " ) );

What worked best for me (in C++/CLI) was:

String^ ReduceWhitespace( String^ text )
{
  String^ newText;
  bool    inWhitespace = false;
  Int32   posStart = 0;
  Int32   pos      = 0;
  for( pos = 0; pos < text->Length; ++pos )
  {
    wchar_t cc = text[pos];
    if( Char::IsWhiteSpace( cc ) )
    {
      if( !inWhitespace )
      {
        if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
        inWhitespace = true;
        newText += L' ';
      }
      posStart = pos + 1;
    }
    else
    {
      if( inWhitespace )
      {
        inWhitespace = false;
        posStart = pos;
      }
    }
  }

  if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );

  return( newText );
}

I first tried the routine above handling each character separately, but had to switch to copying substrings for the non-whitespace sections. When applied to a 1,200,000-character string:

the routine above finished in 25 seconds
the routine above with per-character handling took 95 seconds
the regex was aborted after 15 minutes
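One likely contributor to the 25 seconds is that newText += ... on an immutable String allocates a new string and copies everything built so far on every append, so the cost grows quadratically with the number of appends. Below is a sketch of the same algorithm in C# (not the poster's code and not measured) that keeps the copy-whole-substrings idea but appends each slice to a StringBuilder through StringBuilder.Append(string, int, int), which avoids both the Substring allocation and the repeated re-copying.

```csharp
using System;
using System.Text;

// Collapse each run of whitespace into a single ' ', copying each
// non-whitespace run as one slice (same idea as ReduceWhitespace above).
string ReduceWhitespace(string text)
{
    var sb = new StringBuilder(text.Length);
    int runStart = 0;           // start index of the pending non-whitespace run
    bool inWhitespace = false;
    for (int pos = 0; pos < text.Length; pos++)
    {
        if (char.IsWhiteSpace(text[pos]))
        {
            if (!inWhitespace)
            {
                sb.Append(text, runStart, pos - runStart); // flush the pending run
                sb.Append(' ');                            // one space per whitespace run
                inWhitespace = true;
            }
        }
        else if (inWhitespace)
        {
            runStart = pos;     // a new non-whitespace run starts here
            inWhitespace = false;
        }
    }
    if (!inWhitespace)
        sb.Append(text, runStart, text.Length - runStart); // trailing run, if any
    return sb.ToString();
}

string sample = "  a \t\r\n b  c ";
Console.WriteLine("[" + ReduceWhitespace(sample) + "]"); // [ a b c ]
```

This produces the same result as Regex.Replace(text, @"\s+", " ") in a single linear pass.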

score 0 · Answer 19 · answered May 26 '23 at 23:40

It's arguably not as expressive as using Regex or using Char.IsWhiteSpace, but the following is by far the most concise version of this:

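public static string RemoveWhitespace(this string input)
{
   // Cast so the call binds to Split(params char[]) and not another overload
   return string.Concat(input.Split((char[])null));
}

This leverages the Split(Char[]) overload of Split(), which accepts null for its only parameter and interprets that value as "split on all whitespace" (the same outcome as passing an empty char array or default(char[]) instead). Newer .NET versions also have a Split(string, StringSplitOptions) overload with an optional second parameter, which can make a bare Split(null) ambiguous, so the null is cast to char[] here.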

Internally, it uses Char.IsWhiteSpace to make the determination of whether it should split on a given character:

If the separator argument is null or contains no characters, the method treats white-space characters as the delimiters. White-space characters are defined by the Unicode standard, and the Char.IsWhiteSpace method returns true if a white-space character is passed to it.
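A quick check of that claim, as a sketch with a made-up sample string that includes a no-break space (U+00A0), which Char.IsWhiteSpace treats as whitespace: splitting with a null char[] and with an empty char[] should give the same result as filtering with Char.IsWhiteSpace directly. The null is cast to char[] so the call cannot bind to the string-based Split overload.

```csharp
using System;
using System.Linq;

string input = " foo\tbar\r\n baz\u00A0qux ";

// Both forms select the "split on whitespace" behaviour of Split(params char[]).
string viaNull  = string.Concat(input.Split((char[])null));
string viaEmpty = string.Concat(input.Split(Array.Empty<char>()));

// Reference result using Char.IsWhiteSpace, as in the LINQ answers above.
string viaLinq  = string.Concat(input.Where(c => !char.IsWhiteSpace(c)));

Console.WriteLine(viaNull);                                   // foobarbazqux
Console.WriteLine(viaNull == viaEmpty && viaNull == viaLinq); // True
```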
