2

I have a MySQL table containing phone numbers that may be formatted in many different ways. For instance:

  • (999) 999-9999
  • 999-999-9999
  • 999.999.9999
  • 999 999 9999
  • +1 999 999 9999
  • 9999999999
  • 019999999999
  • etc.

The field is stored as a string. All I want to do is return the field with anything non-numeric removed. What SQL could be used to do this in MySQL? So the values above would all return either 9999999999, 19999999999, or 019999999999.

Note this is for a query that will not be run regularly, so I don't need to worry about optimization, indexing, or anything else. However, I do need to include this into a fairly complex join with several other tables. At the most basic level, I'm attempting to find duplicate records and want to group by or at least order by phone number.

Tauren
  • 26,795
  • 42
  • 131
  • 167
  • I wonder if it's worth the hassle since I don't think there's anything like this built-in. This isn't something you can do on the front end? – lc. Jun 21 '10 at 05:45
  • @lc: The goal is to just get a quick and dirty report of potential duplicate accounts, so I was just going to export CSV data directly from an SQL command. Wasn't planning to make a front end for it. – Tauren Jun 21 '10 at 05:51

5 Answers5

2

You have to normalize these numbers before inserting them into database.
Make it stored in numbers only.
Otherwise it would be pain in the neck to get it right.

Your Common Sense
  • 156,878
  • 40
  • 214
  • 345
  • Yes, of course if I had control over it. But I'm dealing with legacy data and can't change it in the productions system. – Tauren Jun 21 '10 at 05:50
  • @Tauren: For something like this, you should always store just the numbers and then format it the same whenever it gets displayed. You're relying too heavily on MySQL here, this is a PHP task. I know you *think* you can't do this without MySQL, but if you really *must* do this in MySQL, then the system you're using does not fit your needs. There must be a way that you could do this in PHP. – animuson Jun 21 '10 at 05:56
1

There's a UDF available for regex replacments (referred to in How to do a regular expression replace in MySQL?).

Or you could write your own UDF using a substring and ASCII code check.

Community
  • 1
  • 1
potatopeelings
  • 40,709
  • 7
  • 95
  • 119
  • This is definitely the slickest solution posted that answers my specific question. However, now that I've looked into using the UDF, it really isn't practical for just a quick one time query. It requires mysql source code to be available and the UDF to be compiled. It would be nice if there was an easy package file to install via apt-get, but I haven't found one. Great answer regardless, and something I might use in the future! – Tauren Jun 26 '10 at 09:11
0

You could use the REPLACE command encapsulating the whole string with the different variations in order to remove them.

Neil Knight
  • 47,437
  • 25
  • 129
  • 188
  • @Ardman: I have no idea what variations will have been used, as the data has been entered as free text. I've already attempted the following, but it gets very cumbersome to use and was hoping for a simpler solution: `replace(replace(replace(replace(replace(phone,' ', ''),'.',''),'-',''),'(',''),')','')` – Tauren Jun 21 '10 at 05:48
  • there's a UDF available for regex replacments (referred to in http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql). Or you could write your own UDF with a substring and ascii code check. – potatopeelings Jun 21 '10 at 07:36
  • @potatopeelings: that looks sweet! thanks. you should add an answer to this question so I can select it. – Tauren Jun 21 '10 at 21:04
0

This is the work for small regexp but unfortunately MySQL does not support regexp replace (unless you want to recompile it from sources). So I'd say do the cleaning on the code side, it'll be much easier.

vava
  • 24,851
  • 11
  • 64
  • 79
  • that's the verdict I had come to as well, but figured I'd ask the SO community to see if there were any non-code solutions I wasn't aware of. As I mention in a comment to @Ardman's answer, using multiple nested `replace(...)` commands isn't a great solution. Too bad there isn't a regex replace. – Tauren Jun 21 '10 at 06:35
0

I had this same problem! You can see how I solved it here.

Here's the functions I had to create and use that worked out great:

CREATE FUNCTION IsNumeric (val varchar(255)) RETURNS tinyint 
 RETURN val REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';

CREATE FUNCTION NumericOnly (val VARCHAR(255)) 
 RETURNS VARCHAR(255)
BEGIN
 DECLARE idx INT DEFAULT 0;
 IF ISNULL(val) THEN RETURN NULL; END IF;

 IF LENGTH(val) = 0 THEN RETURN ""; END IF;

 SET idx = REPLACE(val,".","");
 SET idx = LENGTH(val);
  WHILE idx > 0 DO
  IF IsNumeric(SUBSTRING(val,idx,1)) = 0 THEN
   SET val = REPLACE(val,SUBSTRING(val,idx,1),"");
   SET idx = LENGTH(val)+1;
  END IF;
  SET idx = idx - 1;
  END WHILE;
  RETURN val;
END;

CREATE FUNCTION FormatPhone (val VARCHAR(255))
 RETURNS VARCHAR(255)
BEGIN
  SET val = RIGHT(CONCAT("0000000000",NumericOnly(val)),10);
  RETURN CONCAT("(",LEFT(val,3),") ",LEFT(RIGHT(val,7),3),"-",RIGHT(val,4));
END;

Use it like this to get your phone number formatted:

SELECT FormatPhone(PhoneNumberColumn) FROM TableName;

Or like this to only get the digits:

SELECT NumericOnly(PhoneNumberColumn) FROM TableName;
VenerableAgents
  • 645
  • 1
  • 8
  • 16