
I'm trying to fetch multiple email addresses, separated by "," within a single string, from a database table, but the result also contains whitespace, and I want to remove that whitespace quickly.

The following code does remove the whitespace, but it becomes slow when I fetch a large number of email addresses in one string (around 30000) and then try to remove the whitespace between them. Removing those spaces takes more than four to five minutes.

Regex MultipleSpaces =
        new Regex(@"\s+", RegexOptions.Compiled);
txtEmailID.Text = MultipleSpaces.Replace(emailaddress, "");

Could anyone please tell me how I can remove the whitespace in under a second, even for a large number of email addresses?

Sam Holder
Joe
  • Whitespace != spaces (the former is broader and includes e.g. line breaks). –  Mar 05 '11 at 11:49
  • http://stackoverflow.com/q/1120198/102112 – Oleks Mar 05 '11 at 12:37
  • I have a question... Are you removing spaces from the whole string (i.e. the one containing the comma-separated emails), or from each single address one by one? – digEmAll Mar 05 '11 at 13:18
  • http://stackoverflow.com/questions/3501721/how-to-remove-leading-and-trailing-spaces-from-a-string – manoj Oct 26 '16 at 08:52
  • [How to remove leading and trailing spaces from a string](http://stackoverflow.com/questions/3501721/how-to-remove-leading-and-trailing-spaces-from-a-string): you can find the best answer there. Visit it to see the solution. – manoj Oct 26 '16 at 08:54

13 Answers


I would build a custom extension method using StringBuilder, like:

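using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class StringExtensions
{
    // Copies every char of str that is not in toExclude.
    public static string ExceptChars(this string str, IEnumerable<char> toExclude)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            // Enumerable.Contains uses HashSet<char>.Contains when given a HashSet
            if (!toExclude.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}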

Usage:

var str = s.ExceptChars(new[] { ' ', '\t', '\n', '\r' });

Or, to be even faster:

var str = s.ExceptChars(new HashSet<char>(new[] { ' ', '\t', '\n', '\r' }));

With the HashSet version, a string of 11 million chars takes less than 700 ms (and I'm in debug mode).

EDIT:

The previous code is generic and lets you exclude any char, but if you just want to remove blanks in the fastest possible way, you can use:

// Requires using System.Text; declare inside a static class
public static string ExceptBlanks(this string str)
{
    StringBuilder sb = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        char c = str[i];
        switch (c)
        {
            // Skip the four common blanks...
            case '\r':
            case '\n':
            case '\t':
            case ' ':
                continue;
            // ...and copy everything else
            default:
                sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}

EDIT 2:

As correctly pointed out in the comments, the correct way to remove all the blanks is to use the char.IsWhiteSpace method:

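// Requires using System.Text; declare inside a static class
public static string ExceptBlanks(this string str)
{
    StringBuilder sb = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        char c = str[i];
        // char.IsWhiteSpace covers all Unicode white space, not only the four above
        if (!char.IsWhiteSpace(c))
            sb.Append(c);
    }
    return sb.ToString();
}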
digEmAll
  • You can create a lightspeed lookup table for this solution: `byte[] hash = new byte[128];` If you want to exclude `\t` you do `hash['\t'] = 1` and then check the same way (for chars below 128). But it will only work for ASCII :) – Andrey Mar 05 '11 at 12:19
  • Yes, that would be really fast. Otherwise, if you just want to remove blanks from a string, you can directly use a switch in the function and skip IEnumerable.Contains :) – digEmAll Mar 05 '11 at 12:29
  • Hmmm...nice solution, better than mine :) – Evgeny Gavrin Mar 05 '11 at 12:48
  • Even better would be to use `StringBuilder sb = new StringBuilder(str.Length);` – Chris Ward Jan 03 '13 at 08:23
  • Char.IsWhiteSpace(c) works better than the switch statement, as does initializing the StringBuilder with the length of the input string – katbyte May 31 '13 at 04:21
  • @Katbyte: well `char.IsWhiteSpace(c)` works better in the sense that you don't have to specify the blanks character manually, but it's slower than the switch (I've tested it). – digEmAll May 31 '13 at 08:22
  • @digEmAll: The switch is faster because `char.IsWhiteSpace(c)` checks for more white space characters than the 4 you test in the switch statement; [msdn lists them](http://msdn.microsoft.com/en-us/library/t809ektx.aspx). So while a switch tuned to a specific subset of white space may be better for a high-performance task, for a general-purpose function `char.IsWhiteSpace(c)` is probably the better choice, as it will match a larger set of white space. – katbyte Jun 03 '13 at 22:27
  • Another thing to consider: if the given string does not contain any of those white space chars, then we don't need to append each char to a `StringBuilder` at all! We can use a **flag** to detect the **first white space char** and only then start adding to a StringBuilder; otherwise we can just return the input string itself. This can improve performance, especially when the given strings usually do not contain the searched chars. – S.Serpooshan Dec 22 '13 at 11:49
  • @saeedserpooshan: That's true, but I expect one uses this method when at least 80% of the strings contain blanks to remove; therefore, the speed gain you would get on the remaining 20% of cases with the specific code won't reduce the total execution time much... – digEmAll Dec 22 '13 at 12:07
  • You could add a small test at the end to keep the same string if a new one is not required: `if (sb.Length == str.Length) return str;` I think I've read somewhere it may be better for the GC to keep the initial string. – Simon Mourier May 29 '14 at 17:42
  • @SimonMourier: the previous comment also suggested something like that. Well, yes, it would probably be slightly better... but these are micro-optimisations, and we should profile and have clear evidence of memory issues before doing them. Furthermore, this is sample code, and I prefer to keep it as short and readable as possible... – digEmAll May 30 '14 at 06:59
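Two ideas from the comments above, a lookup table indexed by char code (Andrey) and returning the input unchanged when there is nothing to remove (S.Serpooshan, Simon Mourier), can be combined roughly as below. This is a minimal sketch, not code from any of the commenters: the class name `BlankStripper`, the method name `ExceptBlanksFast`, the 128-entry table and the four blank characters are assumptions, and every char at or above 128 is simply kept.

```csharp
using System.Text;

public static class BlankStripper
{
    // Hypothetical lookup table: true for the ASCII blanks we want to drop.
    private static readonly bool[] IsBlank = BuildTable();

    private static bool[] BuildTable()
    {
        var table = new bool[128];
        foreach (char c in " \t\r\n")
            table[c] = true;
        return table;
    }

    public static string ExceptBlanksFast(this string str)
    {
        // Find the first blank; if there is none, hand back the original string.
        int first = 0;
        while (first < str.Length && !(str[first] < 128 && IsBlank[str[first]]))
            first++;
        if (first == str.Length)
            return str;

        // Copy the blank-free prefix in one call, then filter the rest.
        var sb = new StringBuilder(str.Length);
        sb.Append(str, 0, first);
        for (int i = first + 1; i < str.Length; i++)
        {
            char c = str[i];
            if (c >= 128 || !IsBlank[c])
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `"a, b,\tc".ExceptBlanksFast()` gives `"a,b,c"`, while a string without blanks comes back as the very same instance, so no new string is allocated in that case.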

Given that the implementation of string.Replace is written in C++ and is part of the CLR runtime, I'm willing to bet that

email.Replace(" ","").Replace("\t","").Replace("\n","").Replace("\r","");

will be the fastest implementation. If you need every type of whitespace, you can supply the hex value of the Unicode equivalent.
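To extend the chained `Replace` calls to other kinds of whitespace, one option is to loop over a list of strings, writing the extra characters as Unicode escapes. The list below (which adds the no-break space U+00A0 and the ideographic space U+3000) and the names `ReplaceStripper`/`StripWithReplace` are assumptions for illustration. Each iteration still allocates a new string, which is the cost digEmAll raises in the comments.

```csharp
public static class ReplaceStripper
{
    // Hypothetical list of whitespace to remove; extend it as needed.
    private static readonly string[] Whitespace =
    {
        " ",      // space
        "\t",     // tab
        "\n",     // line feed
        "\r",     // carriage return
        "\u00A0", // no-break space
        "\u3000", // ideographic space
    };

    public static string StripWithReplace(string s)
    {
        // Each Replace call returns a new string (strings are immutable).
        foreach (string w in Whitespace)
            s = s.Replace(w, "");
        return s;
    }
}
```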

Chris S
  • Yes, that's really fast, but this creates 4 strings instead of 1. That slows things down a bit for long strings; a custom implementation using StringBuilder is faster than this. – digEmAll Mar 05 '11 at 12:50
  • @digEmAll It's an email address though, so not really memory intensive. I'd agree if it was a large 1k text file – Chris S Mar 05 '11 at 12:59
  • As far as I understood it, it's a single string with a lot of comma-separated emails... but to be honest, I'm not sure... – digEmAll Mar 05 '11 at 13:14

With LINQ you can do it simply:

// requires using System.Linq;
emailaddress = new String(emailaddress
    .Where(x => x != ' ' && x != '\r' && x != '\n')
    .ToArray());

I didn't compare it with the StringBuilder approaches, but it is much faster than string-based approaches, because it does not create many copies of the string (strings are immutable, so modifying them directly causes dramatic memory and speed problems). So it won't use a lot of memory and won't slow things down (apart from one extra pass through the string at first).

Saeed Amiri
  • I really doubt that it is **fast** – Andrey Mar 05 '11 at 12:17
  • @Andrey: It should have linear running time, and construct the array only once. Common Regex problems involve non-linear running time, and common string replacement problems involve repeatedly copying the string. Why wouldn't this solution be fast, compared to a Regex w/ string replace? The only thing I can think of would be function call overhead. Without profiling both, it's speculation. – Merlyn Morgan-Graham Mar 05 '11 at 12:24
  • @digEmAll, yes, I've fixed it :) Funny mistake. – Saeed Amiri Mar 05 '11 at 12:32
  • @Andrey, I don't know whether it's faster or not, but because it uses `yield return`, it doesn't create the string many times. One more thing: I think it's better to use StringBuilder rather than string, but that may require too many changes for the OP, so I'd suggest this. Personally, for large strings I prefer StringBuilder over string in most cases. – Saeed Amiri Mar 05 '11 at 12:35
  • @Andrey: yes, it's really fast. The only little problem is that it needs to pass through a throw-away array. – digEmAll Mar 05 '11 at 12:39
  • @digEmAll StringBuilder has a capacity constructor, likely creating an internal array of sufficient size. Not much different than the linq example. – B2K Jun 18 '14 at 14:28
  • As mentioned elsewhere, `x => !Char.IsWhiteSpace(x)` is preferred. This LINQ command is my chosen solution for a similar problem. Thanks! – B2K Jun 18 '14 at 14:37
emailaddress = emailaddress.Replace(" ", string.Empty);
Harsh Baid

You should try String.Trim(). It removes whitespace from the start and the end of a string, but not from the middle.
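To make the difference concrete: `Trim()` only strips whitespace at the two ends of the string, so it will not touch the blanks between comma-separated addresses. The small sketch below contrasts it with removing every whitespace character; the class and method names are just for illustration.

```csharp
using System.Linq;

public static class TrimDemo
{
    // Trim() removes leading and trailing whitespace only.
    public static string TrimEnds(string s) => s.Trim();

    // Filtering with char.IsWhiteSpace removes whitespace everywhere.
    public static string RemoveAll(string s) =>
        new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
}
```

For `"  a@x.com, b@y.com  "`, `TrimEnds` returns `"a@x.com, b@y.com"` (the inner space survives), while `RemoveAll` returns `"a@x.com,b@y.com"`.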

Or you can try this method from the linked topic: [link]

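    // Requires compiling with /unsafe (AllowUnsafeBlocks).
    // stackalloc puts len chars on the stack, so very long strings can overflow it.
    public static unsafe string StripTabsAndNewlines(string s)
    {
        int len = s.Length;
        char* newChars = stackalloc char[len];
        char* currentChar = newChars;

        for (int i = 0; i < len; ++i)
        {
            char c = s[i];
            switch (c)
            {
                // Drop carriage returns, line feeds and tabs (spaces are kept)
                case '\r':
                case '\n':
                case '\t':
                    continue;
                default:
                    *currentChar++ = c;
                    break;
            }
        }
        return new string(newChars, 0, (int)(currentChar - newChars));
    }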
Community
Evgeny Gavrin
  • Well, you should have very serious reasons for introducing unsafe code into safe code. Cleaning a string is definitely not one of them. – Andrey Mar 05 '11 at 12:20
  • I think that 4-5 minutes to perform such a simple action is unacceptable. It can be much faster. – Evgeny Gavrin Mar 05 '11 at 12:45
  • There's no need to use pointers; also, Char.IsWhiteSpace(c) is a better solution than a switch. – katbyte May 31 '13 at 04:22

You should consider replacing the spaces in the record set within your stored procedure or query using the REPLACE() function if possible, and, even better, fix your DB records, since a space in an email address is invalid anyway.

As mentioned by others, you would need to profile the different approaches. If you are using Regex, you should at a minimum make it a class-level static variable:

public static Regex MultipleSpaces = new Regex(@"\s+", RegexOptions.Compiled);

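new string(emailAddress.Where(x => x != ' ').ToArray()) is likely to have function-call overhead, although Microsoft could optimize it to inline the lambda; again, profiling will give you the answer. (Note that calling .ToString() on the result of Where() would return the type name, not the filtered text.)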

The most efficient method would be to allocate a buffer and copy character by character to a new buffer and skip the spaces as you do that. C# does support pointers so you could use unsafe code, allocate a raw buffer and use pointer arithmetic to copy just like in C and that is as fast as this can possibly be done. The REPLACE( ) in SQL will handle it like that for you.
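The buffer-and-copy idea described above does not strictly require unsafe code; a plain `char[]` works too and avoids the stack-size limit of `stackalloc`. This is a minimal sketch under that assumption: the name `BufferStripper.RemoveWhitespace` is hypothetical, and `char.IsWhiteSpace` is used as the test instead of checking only for spaces. It allocates one buffer the size of the input and then the result string.

```csharp
public static class BufferStripper
{
    public static string RemoveWhitespace(string input)
    {
        // One buffer as large as the input: the result can only be shorter.
        char[] buffer = new char[input.Length];
        int count = 0;

        // Copy every character that is not whitespace, skipping the rest.
        foreach (char c in input)
        {
            if (!char.IsWhiteSpace(c))
                buffer[count++] = c;
        }

        // Build the final string from the filled part of the buffer only.
        return new string(buffer, 0, count);
    }
}
```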

Roman Marusyk
Matthew Erwin

You can use the TrimEnd() method of the String class, but note that it only removes trailing whitespace, not the spaces between the addresses. You can find a great example here.
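If the goal is only to clean up the blanks around each address (the case digEmAll asks about in the comments under the question), `Trim` can be applied per address after splitting on commas. A sketch, assuming the addresses are separated by `,` and that empty entries should be dropped; `AddressCleaner.TrimEach` is a hypothetical name.

```csharp
using System.Linq;

public static class AddressCleaner
{
    public static string TrimEach(string emails)
    {
        // Split on commas, trim each address, drop entries that end up empty.
        var parts = emails
            .Split(',')
            .Select(a => a.Trim())
            .Where(a => a.Length > 0);
        return string.Join(",", parts);
    }
}
```

Unlike removing every whitespace character, this only touches the ends of each entry, so any whitespace inside a single address is left alone.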

Roman Marusyk
Dun

There are many different ways, some faster than others:

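using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class WhitespaceExtensions
{
    // StringBuilder (fast)
    public static string StripWhitespaceSb(this string str)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            if (!char.IsWhiteSpace(str[i]))
                sb.Append(str[i]);
        }
        return sb.ToString();
    }

    // LINQ (faster?)
    public static string StripWhitespaceLinq(this string str) =>
        new string(str.Where(c => !char.IsWhiteSpace(c)).ToArray());

    // Regex (slow)
    public static string StripWhitespaceRegex(this string str) =>
        Regex.Replace(str, @"\s+", "");
}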
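Since the answers and comments here disagree about which approach is fastest, the safest course is to measure on your own data. Below is a rough timing sketch using `System.Diagnostics.Stopwatch`; the input size, the sample address and all class and method names are assumptions. It is not a rigorous benchmark (JIT warm-up and GC effects are included in the numbers), and no timings are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class WhitespaceTiming
{
    // Hypothetical input: `count` addresses separated by ", ".
    public static string BuildInput(int count) =>
        string.Join(", ", Enumerable.Repeat("someone@example.com", count));

    public static string ViaStringBuilder(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            if (!char.IsWhiteSpace(c)) sb.Append(c);
        return sb.ToString();
    }

    public static string ViaLinq(string s) =>
        new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());

    // Reused, compiled Regex instance (built once, not per call).
    private static readonly Regex Spaces = new Regex(@"\s+", RegexOptions.Compiled);
    public static string ViaRegex(string s) => Spaces.Replace(s, "");

    // Runs one approach once and prints the elapsed time.
    public static string Time(string name, Func<string, string> f, string input)
    {
        var sw = Stopwatch.StartNew();
        string result = f(input);
        sw.Stop();
        Console.WriteLine($"{name}: {sw.ElapsedMilliseconds} ms");
        return result;
    }
}
```

Typical use: build the input once with `BuildInput(30000)`, pass it to `Time` for each of the three methods, and check that all three results are equal before comparing the printed times.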
katbyte
string str = "Hi!! this is a bunch of text with spaces";

MessageBox.Show(new String(str.Where(c => c != ' ').ToArray()));
Roman Marusyk
Senagi

I haven't done performance testing on this, but it's simpler than most of the other answers.

var s1 = "\tstring \r with \t\t  \nwhitespace\r\n";
var s2 = string.Join("", s1.Split());

The result is

stringwithwhitespace
qxn
string input = Yourinputstring; // placeholder: your own input string
string[] strings = input.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
string newline = "";
foreach (string value in strings)
{
    // Trim each line and keep it only if something is left
    string newv = value.Trim();
    if (newv.Length > 0)
        newline += newv + "\r\n";
}
Taryn
string s = " Your Text ";

string result = s.Replace(" ", string.Empty);

// Output:
// "YourText"
Butzke

This is the fastest and most general way to do it (line terminators and tabs are processed as well). Regex's powerful facilities aren't really needed to solve this problem, and Regex can decrease performance.

new string(stringToRemoveWhiteSpaces
    .Where(c => !char.IsWhiteSpace(c))
    .ToArray())