I need to hide email addresses and phone numbers in a string. Replacing well-formatted emails and numbers is easy with a regex, but what about other formats? Here is an example:
Input:
Email addresses like
email@example.com
or
email AT example DOT com
should be replaced. Phone numbers like
347 323 4567
or
three four seven, three two three four five six seven
should also be replaced.
Output:
Email addresses like
(email hidden)
or
(email hidden)
should be replaced. Phone numbers like
(phone hidden)
or
(phone hidden)
should also be replaced.
Airbnb's messaging system is really good at this. Apparently it used to work like this:
It looks for @ symbols, spellings of “this is me AT whatever DOT com” and series of numbers with at least 7 digits (telephone number) with some sensitivity to separators.
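For reference, a naive regex-only version of the heuristic quoted above might look like the sketch below. The names `EMAIL_RE`, `PHONE_RE` and `hide_contacts` are made up for illustration, and the details (only uppercase AT/DOT, only the digit words zero to nine, the set of allowed separators) are my own assumptions, not Airbnb's actual rules.

```python
import re

# Literal addresses, or "name AT domain DOT tld" spellings.
# Assumption: only uppercase AT / DOT surrounded by whitespace count.
EMAIL_RE = re.compile(
    r"[\w.+-]+(?:\s*@\s*|\s+AT\s+)[\w-]+(?:(?:\.|\s+DOT\s+)[\w-]+)+"
)

# A "digit" is either a numeral or a spelled-out digit word (assumption: zero..nine only).
DIGIT_WORDS = "zero|one|two|three|four|five|six|seven|eight|nine"
TOKEN = rf"(?:\d|{DIGIT_WORDS})"
# Separators tolerated between digits: whitespace, commas, dots, parentheses, hyphens.
SEP = r"[\s,.()-]*"
# At least 7 digits in a row, ignoring separators.
PHONE_RE = re.compile(rf"\b{TOKEN}(?:{SEP}{TOKEN}){{6,}}\b", re.IGNORECASE)


def hide_contacts(text: str) -> str:
    """Replace email-like and phone-like spans with placeholders."""
    # Emails first, so digits inside an address are not treated as a phone number.
    text = EMAIL_RE.sub("(email hidden)", text)
    return PHONE_RE.sub("(phone hidden)", text)


if __name__ == "__main__":
    sample = (
        "Write to email AT example DOT com or call "
        "three four seven, three two three four five six seven"
    )
    print(hide_contacts(sample))
```

This handles the formats in my example, but it will also hide any run of seven or more digits (order numbers, long amounts) and it misses misspellings or other obfuscations, which is why I am wondering whether regexes alone are enough.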
What would be the best way to do the same thing? Writing complex regexes? Using a natural language processing library?