
I have a function that is meant to remove items from a collection when a certain field fails a validation check (the field is either an email address or a phone number, but that's not important in this context). The problem is that regular-expression matching is relatively slow, and my lists contain over a million items.

My function:

public HashSet<ListItemModel> RemoveInvalid(HashSet<ListItemModel> listItems)
{
    // phoneOrEmail is set via the config file.
    bool isEmail = this.phoneOrEmail == "email";

    string pattern = isEmail
        // RFC 5322 compliant email regex. See http://www.regular-expressions.info/email.html
        ? @"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
        // North American phone number regex. See http://stackoverflow.com/questions/12101125/regex-to-allow-only-digits-hypens-space-parentheses-and-should-end-with-a-dig
        : @"(?:\d{3}(?:\d{7}|\-\d{3}\-\d{4}))|(?:\(\d{3}\)(?:\-\d{3}\-)|(?: \d{3} )\d{4})";

    Regex re = new Regex(pattern);

    // Keep only the items whose relevant field matches the pattern.
    return isEmail
        ? new HashSet<ListItemModel>(listItems.Where(x => re.IsMatch(x.Email, 0)))
        : new HashSet<ListItemModel>(listItems.Where(x => re.IsMatch(x.Tel, 0)));
}

This takes far too long to execute. Is there a faster way to return a subset that contains only the valid emails or phone numbers?

I need to come up with something that is lightning quick. My other operations usually take only a couple of seconds on 700k+ items, but this method takes far longer than that. I will be experimenting with a series of LINQ .Contains(x,y,z) checks, but in the meantime I'd like some input from people who are smarter than me.
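A minimal sketch of one direction to try, not a measured fix: build the Regex once as a static field with RegexOptions.Compiled instead of on every call, reject obviously invalid strings with a cheap character check before the regex runs, and spread the work across cores with PLINQ. The names FastValidation, LooksLikeEmail, and KeepValidEmails are hypothetical, and ListItemModel here is only a stand-in for the real model. Whether this is actually faster on a given data set is an assumption that would need benchmarking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the model used in the question.
public class ListItemModel
{
    public string Email { get; set; }
    public string Tel { get; set; }
}

public static class FastValidation
{
    // Constructed once and reused for every call. RegexOptions.Compiled
    // costs more at startup but usually matches faster afterwards.
    private static readonly Regex EmailRegex = new Regex(
        @"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Cheap necessary condition for the pattern above: some '@' that is not
    // the first character, and a '.' at least two positions after it.
    // Strings that fail this cannot match, so the regex is skipped for them.
    public static bool LooksLikeEmail(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        int at = s.IndexOf('@', 1);
        return at > 0 && s.LastIndexOf('.') > at + 1;
    }

    // Returns only the items whose Email passes both checks.
    // Regex instances are safe to share across threads for matching,
    // so the filter can run in parallel.
    public static HashSet<ListItemModel> KeepValidEmails(IEnumerable<ListItemModel> items)
    {
        return new HashSet<ListItemModel>(
            items.AsParallel()
                 .Where(x => LooksLikeEmail(x.Email) && EmailRegex.IsMatch(x.Email)));
    }
}
```

The pre-check also filters out null values, which Regex.IsMatch would otherwise reject with an ArgumentNullException. A phone-number variant would follow the same shape with its own static Regex and a pre-check such as a minimum digit count.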

Captain Kenpachi
