130

Quick add-on requirement in our project: a field in our DB that holds a phone number is set to allow only 10 characters. So, if I get passed "(913)-444-5555" or anything else, is there a quick way to run the string through some kind of replace function that takes a set of characters to allow?

Regex?

Vadim Kotov
Matt Dawdy

9 Answers

262

Definitely regex:

string CleanPhone(string phone)
{
    Regex digitsOnly = new Regex(@"[^\d]");   
    return digitsOnly.Replace(phone, "");
}

or within a class to avoid re-creating the regex all the time:

private static Regex digitsOnly = new Regex(@"[^\d]");   

public static string CleanPhone(string phone)
{
    return digitsOnly.Replace(phone, "");
}

Depending on your real-world inputs, you may want some additional logic there to do things like strip out leading 1's (for long distance) or anything trailing an x or X (for extensions).
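As a rough sketch of that additional logic: the helper below drops anything from the first x or X onward, strips the remaining non-digits, and then removes a leading 1 only when that leaves exactly 10 digits. The class name `PhoneCleaner`, the method name `NormalizeUsPhone`, and the 11-digit rule are assumptions for illustration, so adjust them to your real inputs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneCleaner
{
    // Matches any character that is not an ASCII digit 0-9.
    private static readonly Regex NonDigits = new Regex("[^0-9]");

    // Hypothetical helper; the extension and leading-1 rules are assumptions.
    public static string NormalizeUsPhone(string phone)
    {
        if (phone == null) return null;

        // Drop everything from the first 'x' or 'X' onward (assumed extension marker).
        int ext = phone.IndexOfAny(new[] { 'x', 'X' });
        if (ext >= 0) phone = phone.Substring(0, ext);

        // Remove every remaining non-digit character.
        string digits = NonDigits.Replace(phone, "");

        // Strip a leading long-distance 1 only when that leaves exactly 10 digits.
        if (digits.Length == 11 && digits[0] == '1')
            digits = digits.Substring(1);

        return digits;
    }
}
```

With these rules, `PhoneCleaner.NormalizeUsPhone("1 (913) 444-5555 x123")` returns "9134445555".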

Chris Cudmore
Joel Coehoorn
  • That's perfect. This is only used a couple of times, so we don't need to create a class, and as far as the leading 1, not a bad idea. But I think I'd rather handle that on a case by case basis, at least in this project. Thanks again -- if I could upvote again, I would. – Matt Dawdy Nov 04 '08 at 17:01
  • 1
    I'm waiting for someone to post an extension method version of this for the string class :) – Joel Coehoorn Nov 04 '08 at 17:33
  • @Joel I added the extension method version below. Guess the comments don't support markdown. – Aaron Oct 21 '11 at 18:07
  • 13
    Note `[^\d]` can be simplified to `\D` – p.s.w.g Jul 15 '14 at 18:24
  • Combined this answer (caching the regex in the class) with the extension method one below :) – Vincent Vancalbergh Feb 09 '15 at 09:23
  • I would suggest using `string.Empty` instead of ""; it looks a lot cleaner, in my opinion. – Ronan Aug 03 '16 at 10:03
  • This is a fine answer but be aware if this is a hot path in your code and depending on the length of the string regex performance can be poor and quite impactful on CPU. – MrRoboto Jan 16 '20 at 23:10
79

You can do it easily with regex:

string subject = "(913)-444-5555";
string result = Regex.Replace(subject, "[^0-9]", ""); // result = "9134445555"
Christian C. Salvadó
  • 2
    Upvoted for being a great answer, but Joel beat you out. Thanks for the answer though -- I really like to see confirmation from multiple sources. – Matt Dawdy Nov 04 '08 at 17:04
  • @JoSmo To be fair, Joel's can be converted to a one-liner pretty trivially. (But I also upvoted :D) – Mage Xy Mar 17 '16 at 18:27
45

You don't need to use Regex.

phone = new String(phone.Where(c => char.IsDigit(c)).ToArray());
Usman Zafar
  • 3
    Nice Answer, why add more reference to RegularExpressions namespace – Biniam Eyakem Mar 17 '14 at 12:46
  • 1
    @BTE because it's a short-hand that's simply utilizing `system.linq;` – Eric Milliot-Martinez Dec 21 '15 at 19:07
  • 1
    How well does this perform compared with the Regex solution? – Shavais Dec 16 '16 at 01:58
  • 3
    Adding a test to @Max-PC's benchmark code for the LINQ solution results in -- StringBuilder: 273ms, Regex: 2096ms, LINQ: 658ms. Slower than StringBuilder but still significantly faster than Regex. Given that this is benchmarking 1,000,000 replacements, the effective difference between the StringBuilder and LINQ solutions for most scenarios is probably negligible. – Chris Pratt May 15 '18 at 23:45
  • @ChrisPratt for the regex, did you create a new regex each time, or re-use an existing one? That could have a big impact on performance. – carlin.scott Apr 27 '20 at 17:19
  • @EricMilliot-Martinez - Huh? System.Text.RegularExpressions existed in .Net Framework 1.1. System.Linq was added in Framework 3.5. RegEx is its own syntax and implementation. It is based on the concept of "regular expressions" developed mathematically in 1950s, and popularized in 1970s in Unix text processors. LINQ is something different - it is not a regular expression evaluator. – ToolmakerSteve May 27 '20 at 00:12
  • Huh? @ToolmakerSteve, .Net Framework 1.1? Also, I never said it was a regular expression evaluator. What I said still holds true, it's a short-hand that's simply utilizing `system.linq`. @Usman Zafar's answer is good because it offers the OP the option to avoid using Reg Ex to resolve their original problem. – Eric Milliot-Martinez May 28 '20 at 20:12
23

Here's the extension method way of doing it.

public static class Extensions
{
    public static string ToDigitsOnly(this string input)
    {
        Regex digitsOnly = new Regex(@"[^\d]");
        return digitsOnly.Replace(input, "");
    }
}
Aaron
10

Using the Regex methods in .NET, you should be able to match any non-digit character using \D, like so:

phoneNumber = Regex.Replace(phoneNumber, "\\D", String.Empty);
Wes Mason
  • 5
    This isn't quite right. You need a @ or "\\D" to escape the \ in the regex. Also, you should use String.Empty instead of "" – Bryan Aug 20 '12 at 19:34
5

How about an extension method that doesn't use regex?

If you do stick with one of the Regex options, at least use RegexOptions.Compiled on the static variable.

public static string ToDigitsOnly(this string input)
{
    return new String(input.Where(char.IsDigit).ToArray());
}

This builds on Usman Zafar's answer, with the lambda converted to a method group.
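For anyone who does stay with Regex, here is a minimal sketch of the RegexOptions.Compiled suggestion combined with an extension method. The names `RegexDigitExtensions` and `ToDigitsOnlyRegex` are made up for this example. Note that in .NET, `\d` (and so `\D`) is Unicode-aware by default, so digits from other scripts are kept as well.

```csharp
using System.Text.RegularExpressions;

public static class RegexDigitExtensions
{
    // Created once and compiled, so repeated calls reuse the same Regex instance.
    private static readonly Regex NonDigit = new Regex(@"\D", RegexOptions.Compiled);

    // Hypothetical extension method name, chosen for this sketch.
    public static string ToDigitsOnlyRegex(this string input)
    {
        return NonDigit.Replace(input, string.Empty);
    }
}
```

Usage would look like `"(913)-444-5555".ToDigitsOnlyRegex()`, which yields "9134445555".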

Michael Lang
4

For the best performance and lower memory consumption, try this:

using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public class Program
{
    private static Regex digitsOnly = new Regex(@"[^\d]");

    public static void Main()
    {
        Console.WriteLine("Init...");

        string phone = "001-12-34-56-78-90";

        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 1000000; i++)
        {
            DigitsOnly(phone);
        }
        sw.Stop();
        Console.WriteLine("Time: " + sw.ElapsedMilliseconds);

        var sw2 = new Stopwatch();
        sw2.Start();
        for (int i = 0; i < 1000000; i++)
        {
            DigitsOnlyRegex(phone);
        }
        sw2.Stop();
        Console.WriteLine("Time: " + sw2.ElapsedMilliseconds);

        Console.ReadLine();
    }

    public static string DigitsOnly(string phone, string replace = null)
    {
        if (replace == null) replace = "";
        if (phone == null) return null;
        var result = new StringBuilder(phone.Length);
        foreach (char c in phone)
            if (c >= '0' && c <= '9')
                result.Append(c);
            else
            {
                result.Append(replace);
            }
        return result.ToString();
    }

    public static string DigitsOnlyRegex(string phone)
    {
        return digitsOnly.Replace(phone, "");
    }
}

The result on my computer is:
Init...
Time: 307
Time: 2178

Max-PC
  • +1 for showing benchmarks. Interesting that the loop with StringBuilder outperforms RegEx, although I guess it makes sense when RegEx probably has to wade through a lot of rules to decide what to do. – Steve In CO Jul 13 '17 at 14:13
3

I'm sure there's a more efficient way to do it, but I would probably do this:

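// Requires: using System.Text;
string getTenDigitNumber(string input)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        int junk;
        // int.TryParse takes a string and an out parameter, so convert the char first.
        if (int.TryParse(input[i].ToString(), out junk))
            sb.Append(input[i]);
    }
    return sb.ToString();
}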
Jon Norton
  • That was my first instinct, and was also why I asked here. RegEx seems like a much better solution to me. But thanks for the answer! – Matt Dawdy Nov 04 '08 at 17:03
-1

try this

public static string cleanPhone(string inVal)
        {
            char[] newPhon = new char[inVal.Length];
            int i = 0;
            foreach (char c in inVal)
                if (c.CompareTo('0') > 0 && c.CompareTo('9') < 0)
                    newPhon[i++] = c;
            return newPhon.ToString();
        }
Charles Bretana
  • `return newPhon.ToString();` will return "System.Char[]". I think you meant `return new string(newPhon);`. But this is also filtering out the digits 0 and 9 because of the `>` and `<` instead of `>=` and `<=`. And even then the string will have trailing null characters because the `newPhon` array is longer than it needs to be. – juharr Sep 02 '15 at 18:25
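Applying the points from the comment above, a corrected version of the char-array approach might look like the sketch below. The class name `PhoneFilter` and the null handling are assumptions added for the example. It uses inclusive comparisons and builds the string from only the filled part of the buffer.

```csharp
public static class PhoneFilter
{
    public static string CleanPhone(string inVal)
    {
        if (inVal == null) return null;

        // The buffer can never need more slots than the input has characters.
        char[] buffer = new char[inVal.Length];
        int count = 0;

        foreach (char c in inVal)
        {
            // Inclusive bounds so that '0' and '9' are kept.
            if (c >= '0' && c <= '9')
                buffer[count++] = c;
        }

        // Use only the first `count` characters, leaving out the unused tail.
        return new string(buffer, 0, count);
    }
}
```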