I have a string in C# and I would like to filter out (throw away) all characters except the digits 0 to 9. For example, if I have a string like "5435%$% r3443_+_+**╥╡←", then the output should be 54353443. How can this be done using a regular expression or something else in C#?

Thanks

Varun Sharma

6 Answers

Here is an example without regular expressions:

// Requires: using System.Linq;
var str = "5435%$% r3443_+_+**╥╡←";
var result = new string(str.Where(o => char.IsDigit(o)).ToArray());
// Or, slightly more compact, pass the method group directly
// (renamed so both declarations compile in the same scope):
var compact = new string(str.Where(char.IsDigit).ToArray());

This selects every character of the string that is a digit and creates a new string from that selection.

As for speed:

var sw = new Stopwatch();
var str = "5435%$% r3443_+_+**╥╡←";
sw.Start();
for (int i = 0; i < 100000; i++)
{       
    var result = new string(str.Where(o => char.IsDigit(o)).ToArray());
}
sw.Stop();

Console.WriteLine(sw.ElapsedMilliseconds); // Takes nearly 107 ms 

sw.Reset();
sw.Start();
for (int i = 0; i < 100000; i++)
{
    var s = Regex.Replace(str, @"\D", "");
}
sw.Stop();

Console.WriteLine(sw.ElapsedMilliseconds); //Takes up to 600 ms


sw.Reset();
sw.Start();
for (int i = 0; i < 100000; i++)
{
    var newstr = String.Join("", str.Where(c => Char.IsDigit(c)));
}
sw.Stop();

Console.WriteLine(sw.ElapsedMilliseconds); //Takes up to 109 ms

So the regular expression implementation is, predictably, the slowest. Join and new string give very similar results, although this may vary depending on the use case. I did not test an implementation that loops over the string manually; I believe it might give the best results.

Update. There is also the RegexOptions.Compiled option for regular expressions; it was left out of the example intentionally. For completeness, though, a compiled regular expression in the example above gives a performance boost of nearly 150 ms, which is still pretty slow (about 4 times slower than the others).
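
The manual loop mentioned above was not benchmarked in this answer. As a rough sketch of what it could look like (the class name DigitFilter and the method name KeepDigits are hypothetical, and no timing is claimed for it):

```csharp
using System;

public static class DigitFilter
{
    // Hypothetical helper, not part of the original answer.
    public static string KeepDigits(string input)
    {
        // In the worst case every character is a digit,
        // so a buffer of input.Length is always large enough.
        var buffer = new char[input.Length];
        var count = 0;

        foreach (var c in input)
        {
            if (char.IsDigit(c))
            {
                buffer[count++] = c;
            }
        }

        // Build the result from the filled part of the buffer only.
        return new string(buffer, 0, count);
    }

    public static void Main()
    {
        Console.WriteLine(KeepDigits("5435%$% r3443_+_+**╥╡←")); // prints 54353443
    }
}
```

Whether this actually beats the LINQ versions would need to be measured with the same Stopwatch loop as the other variants.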

Dmytro

You don't need a regex for this:

 var newstr = String.Join("", str.Where(c => Char.IsDigit(c)));
I4V

CODE:

using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Diagnostics;

public class Foo
{
    public static void Main()
    {
        string s = string.Empty;
        TimeSpan e;
        var sw = new Stopwatch();

        //REGEX        
        sw.Start();
        for(var i = 0; i < 10000; i++)
        {
            s = "123213!¤%//)54!!#¤!#%13425";
            s = Regex.Replace(s, @"\D", "");
        }
        sw.Stop();
        e = sw.Elapsed;

        Console.WriteLine(s);
        Console.WriteLine(e);

        sw.Reset();

        //NON-REGEX
        sw.Start();
        for(var i = 0; i < 10000; i++)
        {
            s = "123213!¤%//)54!!#¤!#%13425";
            s = new string(s.Where(c => char.IsDigit(c)).ToArray());
        }
        sw.Stop();
        e = sw.Elapsed;

        Console.WriteLine(s);
        Console.WriteLine(e);
    }
}

OUTPUT:

1232135413425
00:00:00.0564964
1232135413425
00:00:00.0107598

Conclusion: This clearly favors the non-regex method for this problem.

furier

What have you tried?

static Regex rxNonDigits = new Regex( @"[^\d]+");
public static string StripNonDigits( string s )
{
  return rxNonDigits.Replace(s,"") ;
}

Or the probably more efficient

public static string StripNonDigits( string s )
{
  StringBuilder sb = new StringBuilder(s.Length) ; // requires using System.Text;
  foreach ( char c in s )
  {
    if ( !char.IsDigit(c) ) continue ;
    sb.Append(c) ;
  }
  return sb.ToString() ;
}

Or the equivalent one-liner:

public static string StripNonDigits( string s )
{
  return new StringBuilder(s.Length)
         .Append( s.Where(char.IsDigit).ToArray() )
         .ToString()
         ;
}

Or, if you don't care about other cultures' digits and only care about ASCII decimal digits, you could save a [perhaps] expensive lookup and do two compares:

public static string StripNonDigits( string s )
{
  return new StringBuilder(s.Length)
         .Append( s.Where( c => c >= '0' && c <= '9' ).ToArray() )
         .ToString()
         ;
}
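
To make the difference between the two predicates concrete, here is a small sketch. The class and method names are hypothetical; the input mixes ASCII digits with the Arabic-Indic digits U+0663 and U+0664, which char.IsDigit accepts but the ASCII range check rejects:

```csharp
using System;
using System.Linq;

public static class DigitComparison
{
    // Keeps every Unicode decimal digit (what char.IsDigit reports).
    public static string KeepUnicodeDigits(string s) =>
        new string(s.Where(char.IsDigit).ToArray());

    // Keeps only the ASCII digits '0' through '9'.
    public static string KeepAsciiDigits(string s) =>
        new string(s.Where(c => c >= '0' && c <= '9').ToArray());

    public static void Main()
    {
        // "12", then Arabic-Indic three and four, then two letters.
        var input = "12\u0663\u0664ab";

        Console.WriteLine(KeepUnicodeDigits(input).Length); // prints 4
        Console.WriteLine(KeepAsciiDigits(input));          // prints 12
    }
}
```

If the result is later passed to something that expects only 0 to 9, the ASCII version is the safer choice.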

It should be noted that the LINQ solutions almost certainly require constructing an intermediate array (something that's not required when appending to a StringBuilder directly). You could also use LINQ aggregation:

s.Where( char.IsDigit ).Aggregate(new StringBuilder(s.Length), (sb,c) => sb.Append(c) ).ToString()

There's More Than One Way To Do It!

Nicholas Carey

You could simply do the following. The caret (^) inside a character class [ ] is the negation operator.

var pattern = @"[^0-9]+";
var replaced = Regex.Replace("5435%$% r3443_+_+**╥╡←", pattern, "");

Output:

54353443
hwnd

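Inside a character class, ^ negates the class. Use it with \d, which matches decimal digits (in .NET this includes non-ASCII Unicode digits unless RegexOptions.ECMAScript is set), and replace every match with nothing. Note that the pattern needs a verbatim string (@"...") so that \d is not treated as a C# escape sequence.

var cleanString = Regex.Replace("123abc,.é", @"[^\d]", "");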
igelineau