I'm new to Stack Overflow and regex, so please bear with me. An individual is posting false phone numbers on our forum. The forum has a content filter that uses regular expressions, and a colleague updated the filter before going on paternity leave. Now I need to update the expression to account for new patterns. If the expression matches, the phone number is replaced with [Redacted]. Is it possible to write an expression that matches the numbers below but excludes our support line? (The support line number can stick to a single format.)

  1. 1(864) yuppie 361-8969
  2. 1(868) yuppie 751 1556
  3. 1(878) 761-1655
  4. 1(864) 391-8999
  5. 1(865) 446 4830

Support Line 1-866-9789

So I really have two questions: 1) Can I use a negative lookahead to match all phone numbers and formats except for our support line? 2) Can I match phone numbers when text is entered between the digit groups?

The tricky part is that I can't ban all numbers or combinations, because the forum is used to discuss finances and numbers appear in posts on a regular basis.

Thanks in advance!

  • What are your rules around text in the middle of a phone number? If we are too loose with our regex you might end up accidentally censoring large pieces of text. Also, what about newlines? – John Nov 07 '18 at 23:05
  • Are you only matching those 5 numbers exactly? – Tim Nov 07 '18 at 23:05
  • Possible duplicate of [A comprehensive regex for phone number validation](https://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation) – Poul Bak Nov 07 '18 at 23:07
  • Any sort of black list will get circumvented. It's easy to disguise phone numbers as other data, e.g. `call this: 1234567`. – Cameron Nov 07 '18 at 23:08
  • Right now we don't have any rules to identify text in the middle of a phone number. I didn't know where to begin for those ones @john – Eggomcnugget Nov 07 '18 at 23:11
  • @Tim If I can match these for now it would be great, but these phone numbers actually lead to scam support services. So I'm expecting this to be an ongoing issue, and we will have to monitor it and update the filter over time. – Eggomcnugget Nov 07 '18 at 23:13

1 Answer

I'm against strict formats for phone numbers (see e.g. this), so please don't use this to validate phone numbers.

I'm also certain that blacklisting certain phone number formats is an unending arms race which is impossible to win (short of banning all numbers, and even then there are ways to circumvent it).

Having said that, try a regex along the lines of this:

\b((\d[-\s]*)?\(?\d{2,3}\)?[-\s]*(\S+[-\s]*)?)?\d{3}[-\s]+\d{4}\b

To whitelist your support line, simply check that the matched string is not "1-855-700-6000".
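
As a rough sketch of that check, assuming the filter can run a replacement callback (Python is only my choice for illustration; your forum software may use a different regex engine, and `redact` and `SUPPORT_LINE` are hypothetical names):

```python
import re

# The pattern from above, unchanged.
PHONE_RE = re.compile(
    r"\b((\d[-\s]*)?\(?\d{2,3}\)?[-\s]*(\S+[-\s]*)?)?\d{3}[-\s]+\d{4}\b"
)

# Assumption: the support line is always written in exactly this format.
SUPPORT_LINE = "1-855-700-6000"


def redact(text):
    """Replace phone-like matches with [Redacted], except the support line."""

    def _replace(match):
        # Leave the whitelisted number untouched, redact everything else.
        if match.group(0) == SUPPORT_LINE:
            return match.group(0)
        return "[Redacted]"

    return PHONE_RE.sub(_replace, text)


print(redact("Call 1(864) yuppie 361-8969 now"))
print(redact("Support: 1-855-700-6000"))
```

Comparing the whole match in a callback seems safer than putting a negative lookahead at the start of the pattern: with a lookahead, the engine just retries from a later position and can still match (and redact) the tail `855-700-6000`.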

Cameron
  • I'm against formal validation of email and website addresses as well. Not only because there will be "false negatives", but also because it's NEVER a validation. It only ensures a valid-looking format, but `www.aaaaaaaaaaa.com` looks valid as well... – dognose Nov 07 '18 at 23:33
  • Thanks Cameron! I'll give this a try. It won't be used for validating phone numbers, but rather finding and replacing the numbers with another value. I know it's going to be a never ending battle, but I have to try and slow them down as much as possible. I appreciate the help! – Eggomcnugget Nov 07 '18 at 23:43