How to detect telephone numbers in a text (and replace them)?

Question

I know it can be done for bad words (by checking against an array of preset words), but how do you detect telephone numbers in a long text? I'm building a website in PHP for a client who needs to stop people from putting their mobile phone numbers in the description field (see Craigslist, etc.).

Besides, he's going to need some moderation anyway, but I was wondering if there is a way to block at least the obvious formats like nnn-nnn-nnnn. I'm not asking to block other weird ways of writing numbers, like HeiGHT*/four*/nine.

Whichever option you choose, it might be best to keep an untouched version of the profile but flag the profile. Then you can go and check whether the profile has any phone numbers. If not, you can remove the edits. – Galen, Sep 21 '10 at 22:27
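The keep-and-flag idea in this comment could be sketched as follows. This is only an illustration: the function name `reviewDescription`, the array keys, and the narrow nnn-nnn-nnnn pattern are assumptions, not something given in the thread.

```php
<?php
// Hypothetical helper: keep the original text, build a masked copy, and flag it for review.
// The pattern is an assumption covering only the nnn-nnn-nnnn shape from the question.
function reviewDescription(string $description): array
{
    $pattern = '/\b\d{3}-\d{3}-\d{4}\b/';
    $masked = preg_replace($pattern, '[blocked]', $description, -1, $count);

    return [
        'original' => $description, // untouched version for the moderator
        'masked'   => $masked,      // version to display publicly
        'flagged'  => $count > 0,   // true when at least one number was masked
    ];
}

$result = reviewDescription('Nice bike, call 555-123-4567 after 6pm');
echo $result['masked'], "\n"; // Nice bike, call [blocked] after 6pm
```

A moderator can then compare `original` and `masked` for flagged entries and restore the text if the match was a false positive.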

Tim Fountain · Answer 1 · 2017-12-06T13:57:00.307

6

Welcome to the world of regular expressions. You're basically going to want to use preg_replace to look for (some pattern) and replace with a string.

Here's something to start you off:

$text = preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);

This looks for:

an optional plus symbol, followed by a digit, followed by between 4 and 20 digits, brackets, dashes, plus signs or whitespace characters, followed by a digit

and replaces the match with the string [blocked].

This catches all the obvious combinations I can think of:

012345 123123
+44 1234 123123
+44(0)123 123123
0123456789
Placename 123456 (although this one will leave 'Placename')

However, it will also strip out any run of 6 or more digits, which might not be desirable!
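One possible way to soften that side effect is to let the broad pattern find candidates and then block only those containing enough digits. The sketch below assumes a hypothetical helper `blockPhoneNumbers` and an arbitrary threshold of 9 digits. Neither comes from the answer, and the right threshold depends on the numbering plans you care about.

```php
<?php
// Hypothetical helper: mask phone-like runs only when they contain enough digits.
// The default threshold of 9 is an assumption for illustration.
function blockPhoneNumbers(string $text, int $minDigits = 9): string
{
    $pattern = '/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/';

    return preg_replace_callback($pattern, function (array $m) use ($minDigits): string {
        // Count only the digits in the candidate, ignoring spaces, dashes and brackets.
        $digitCount = preg_match_all('/[0-9]/', $m[0]);

        return $digitCount >= $minDigits ? '[blocked]' : $m[0];
    }, $text);
}

echo blockPhoneNumbers('Call +44 1234 123123 about order 123456'), "\n";
// Call [blocked] about order 123456
```

Short digit runs such as order numbers survive, at the cost of missing short local numbers like "Placename 123456".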

edited Dec 06 '17 at 13:57

answered Sep 21 '10 at 22:14


Getting an error: preg_replace(): Compilation failed: invalid range in character class at offset 16 – always-a-learner Dec 06 '17 at 13:28

There was a typo in my regex pattern which I have corrected - try now. – Tim Fountain Dec 06 '17 at 13:57

score 0 · Answer 2 · answered Sep 21 '10 at 22:19

To do so you must use regular expressions as you may know.

I found this pattern that could be useful for your project:

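<?php
  // Note: the ^ and $ anchors mean this only matches when $yourText as a whole is a phone number
  preg_match("/(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/", $yourText, $matches);
  // $matches[0] will contain the full match; the remaining entries hold the capture groups
?>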

More information about this pattern can be found here http://gskinner.com/RegExr/?2rirv where you can even test it online. It's a great tool to test regular expressions.
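For scanning a longer description for every occurrence, rather than validating a single value, something along these lines may work. It is a sketch only: `findPhoneNumbers` is a hypothetical name, and the pattern is a simplified, unanchored adaptation rather than the exact pattern from this answer.

```php
<?php
// Hypothetical helper: list every phone-like substring found in a longer text.
// The pattern is a simplified, unanchored adaptation (an assumption, not the original).
function findPhoneNumbers(string $text): array
{
    $pattern = '/(?:\+\d{1,3}[ .-]?)?(?:\(?\d{3}\)?[ .-]?)?\d{3}[ .-]?\d{4}/';
    preg_match_all($pattern, $text, $matches);

    return $matches[0]; // full matches only
}

print_r(findPhoneNumbers('Call 555-123-4567 or +1 (555) 987-6543 today'));
// Finds "555-123-4567" and "+1 (555) 987-6543"
```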

score 0 · Answer 3 · answered Sep 21 '10 at 22:20

preg_match($pattern, $subject) will return 1 if the pattern is found in the subject and 0 otherwise (or false if an error occurs).

A pattern to match the example you give might be '/\d{3}-\d{3}-\d{4}/'

However whatever you choose for your pattern will suffer from both false positives and false negatives.

You might also consider looking for words like mob, cell or tel next to the number.
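The keyword idea could look roughly like this. The helper name `hasKeywordNearNumber`, the keyword list, and the 15-character gap allowed between keyword and number are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: does a phone-related keyword appear shortly before a digit run?
// The keyword list and the 15-character gap are assumptions for illustration.
function hasKeywordNearNumber(string $text): bool
{
    $pattern = '/\b(?:mob(?:ile)?|cell|tel(?:ephone)?|phone)\b[^0-9]{0,15}\d[\d\s().-]{4,}\d/i';

    return preg_match($pattern, $text) === 1;
}

var_dump(hasKeywordNearNumber('Tel: 0123 456789'));         // bool(true)
var_dump(hasKeywordNearNumber('Sold 123456 units in May')); // bool(false)
```

Requiring a keyword reduces false positives on order numbers and prices, but it adds false negatives for numbers posted without any label.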

The full details of PHP pattern matching can be found at http://www.php.net/manual/en/reference.pcre.pattern.syntax.php

Ian

p.s. It can't be done for bad words, as the people in Scunthorpe will tell you.

Michele Carino · Answer 4 · 2017-07-09T06:21:19.770

I think that using too tight a regular expression would lead to losing a great number of detections.

You should check for portions of 10 consecutive characters containing more than 5 digits.

So you will likely want an analysis routine that is queued to run after each message insertion, because of the computational weight.

After the 6 or more digits have been isolated, replace them as you prefer, including any other sibling digits. In any case it is better to preserve the original data, so you can test and train your detection algorithm until it works as well as possible.
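The density check described above can be sketched as a sliding window over the text. The function name `hasDenseDigits` is hypothetical, and the defaults simply follow the 10-character and 5-digit figures from this answer. Note that it counts bytes, so multibyte characters take up more than one slot in the window.

```php
<?php
// Hypothetical helper: flag text when any window of $window consecutive characters
// holds more than $maxDigits digits (defaults follow the 10 / 5 figures above).
function hasDenseDigits(string $text, int $window = 10, int $maxDigits = 5): bool
{
    $length = strlen($text);
    $digits = 0;

    for ($i = 0; $i < $length; $i++) {
        // Add the character entering the window.
        if (ctype_digit($text[$i])) {
            $digits++;
        }
        // Drop the character that just left the window.
        if ($i >= $window && ctype_digit($text[$i - $window])) {
            $digits--;
        }
        if ($digits > $maxDigits) {
            return true;
        }
    }

    return false;
}

var_dump(hasDenseDigits('ring 555 123 4567 now'));         // bool(true)
var_dump(hasDenseDigits('2 bikes, 3 helmets and 1 lock')); // bool(false)
```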

Then you can also study your user data to create more complex heuristics, for example for numbers written as letters (case-insensitive), mixed forms, dot-separated digits, etc.

It's not about writing the perfect regex; it's about approaching the problem statistically and dynamically.

And remember, after you take action, users will change their insertion habits as a consequence, so the stats will change and you will need to learn and update your heuristics.
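As one example of such a heuristic, spelled-out digits can be normalised to numerals before any detection runs. This is a minimal sketch under stated assumptions: `wordsToDigits` is a hypothetical name, and only the whole English words zero to nine are handled.

```php
<?php
// Hypothetical helper: rewrite spelled-out digits as numerals before running detection.
// Only whole English words zero..nine are handled; this is an illustrative assumption.
function wordsToDigits(string $text): string
{
    $map = ['zero' => '0', 'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4',
            'five' => '5', 'six' => '6', 'seven' => '7', 'eight' => '8', 'nine' => '9'];
    $pattern = '/\b(' . implode('|', array_keys($map)) . ')\b/i';

    return preg_replace_callback($pattern, function (array $m) use ($map): string {
        return $map[strtolower($m[1])];
    }, $text);
}

echo wordsToDigits('call Five.five.FIVE one two three four'), "\n";
// call 5.5.5 1 2 3 4
```

The word boundaries keep words such as "someone" intact, which also means digits hidden inside other words are not caught.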
