1

I have to compare phone numbers in Linq to entities. I am using Regex so that I can compare only numbers while ignoring format.

As we can not use Regex in where clause so I get list of phone numbers and then applied loop to compare the phone numbers.

Here is the code:

    /// <summary>
    /// Check Phone number exist or Not
    /// </summary>
    /// <param name="accountNumber"></param>
    /// <returns></returns>
    public bool CheckPhoneNumberExist(string phoneNumber)
    {
        LP= new LPEntities();

        // In order to compare phone number, we need to use regex so that we can compare only numbers and not any specific format
        phoneNumber = Regex.Replace(phoneNumber, "[^0-9]", "");

        var phoneList = (from act in LP.Accounts
                         select act).ToList();
        if (phoneList.Count > 0)
        {
            foreach (var item in phoneList)
            {
                if (item.Telephone != null)
                {
                    if (Regex.Replace(item.Telephone, "[^0-9]", "") == phoneNumber)
                    {
                        return true;
                    }
                }
            }
        }

        return false;
      }

The code is working fine but there is a performance issue. We have 20 thousands records in the database which keep on increasing so its slow to make a list and then iterate through that list.

How might we optimize the above? Is there some other solution like if we can do it with stored procedure?

Thanks,

clarkitect
  • 1,720
  • 14
  • 23
Gaurav123
  • 5,059
  • 6
  • 51
  • 81
  • One of the option i see to use the CLR stored procedure http://msdn.microsoft.com/en-us/magazine/cc163473.aspx . it will have the capability of C# + it will be run inside the SQL Server – Devesh May 20 '14 at 05:04

3 Answers3

4

Note that your regex technique won't identify that these numbers are actually the same:

+441234567890
+44 (0)1234 567890
01234 567890

My first choice would be to store the number as a varchar without formatting and only format it for display. That means you get rid of the regex on retrieval and you can add an index to speed it up even more.

My second choice would be to add a column so that you store both and keep them synchronised.

See the answers here:

Is it better to store telephone numbers in some canonical format or "as entered"?

Lastly you might get some improvement if you alter your linq to this:

bool exists = (from act in LP.Accounts
               where act.Telephone != null //move the null check to the database
               select act.Telephone)  //select only the telephone number
              .AsEnumerable()  //using linq-to-objects
                               //after retrieving the results from the database
                               //lets you use Regex 
                               //- but it does get them all into memory
              .Any(pn => Regex.Replace(pn, "[^0-9]", "") == phoneNumber) 
              //Any stops at the first one 
              //better than Counting them all then checking `> 0`

As an aside. Adding the check if (phoneList.Count > 0) is redundant in front of foreach (var item in phoneList) because the body of the loop is not executed if there are no items in the list.

Community
  • 1
  • 1
Colin
  • 22,328
  • 17
  • 103
  • 197
  • yes storing simple number in database is good, but its existing functionality to store the format so i can not change. Linq expression works well, Thanks a lot !! Can you please let me know how "AsEnumerable and Any" works – Gaurav123 May 20 '14 at 11:20
  • Try looking on stackoverflow e.g. http://stackoverflow.com/q/5311034/150342 and http://stackoverflow.com/q/305092/150342 – Colin May 20 '14 at 11:22
2

I have tried one new approach where I am replacing the extra format characters with '' as mentioned below

select account, telephone from account where 
Replace(Replace(Replace(Replace(Replace(telephone, '-',''), '(',''),')',''),' ',''),'+',''
    )='9084336325'

but it will work only when we are clear with different characters to replace.

Gaurav123
  • 5,059
  • 6
  • 51
  • 81
0

First off, you're loading the whole entity-list you're performing two regex replacements for every check you make. Something along the lines of this should be more efficient:

public bool CheckPhoneNumberExist(string newPhoneNumber)
{
    LP = new LPEntities();

    var phoneList = (from act in LP.Accounts
                     select act).ToList();

    foreach (string phoneNumber in phoneList)
        Add(phoneNumber);

    return Add(newPhoneNumber);
}

private HashSet<string> _uniquePhoneNumbers = new HashSet<string>();

public bool Add(string phoneNumber)
{
    // Only use Regex if necessary
    if (!IsNumeric(phoneNumber))
        phoneNumber = RemoveNonNumericCharacters(phoneNumber);

    // Returns false if string already exists in HashSet
    return !_uniquePhoneNumbers.Add(phoneNumber);
}

private static readonly char[] _forbiddenChars = new char[] { ',', '.', '.', ' ', '+', '-' };

private static bool IsNumeric(string s)
{
    foreach(char c in _forbiddenChars)
        if(s.IndexOf(c) >= 0)
            return false;

    return true;
}

private static string RemoveNonNumericCharacters(string s)
{
    return Regex.Replace(s, "[^0-9]", "");
}

Another approach is to make sure that numbers are saved in a standardised format from the start.