0

I have a set of characters that are defined as valid characters. Let me define the valid string by the following regexp:

^[a-zA-Z0-9\ .-_]+$

(alphanumeric, space, dot, dash and underscore)

The question is that given a column containing a lot of invalid characters, how I can run an update to convert each invalid character to one space? And then possibly convert consequent spaces to one space?

I cannot run several replace commands because there are a lot of possible invalid characters. So I expect a regexp solution.

Currently, I am doing the task in Java (after exporting the table to tsv format). But I want a MySQL approach.

Alisa
  • 2,892
  • 3
  • 31
  • 44
  • I know the regex for identifying the invalid strings, but I want to convert all the invalid characters in these strings to white space. – Alisa Sep 11 '14 at 03:56
  • http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql – sgeddes Sep 11 '14 at 03:59
  • See [this](http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql). In my experience as it is not possible in `MySQL` you can switch to `Postgres` or pull data into another language (Perl is a good one for this), and covert there and re-import and you are always better off. – MattSizzle Sep 11 '14 at 04:00

1 Answers1

1

If your MySQL version supports it, create a function:

DELIMITER $$
CREATE FUNCTION my_func_1 (str TEXT)
  RETURNS TEXT
BEGIN
  DECLARE ret TEXT DEFAULT '';
  DECLARE chr TEXT DEFAULT '';
  DECLARE i INT DEFAULT 1;
  WHILE i < (LENGTH(str) + 1) DO
    SET chr = SUBSTRING(str, i, 1);
    IF chr REGEXP '[-a-zA-Z0-9\\_.]'
      THEN SET ret = CONCAT(ret, chr);
    ELSE
      SET ret = CONCAT(ret, ' ');
    END IF;
    SET i = i + 1;
  END WHILE;
  WHILE ret LIKE '%  %' DO
    SET ret = REPLACE(ret, '  ', ' ');
  END WHILE;
  RETURN TRIM(ret);
END;
$$
DELIMITER ;

Test it a bit:

SELECT my_func_1('$a-B\\?!=01._%'); > a-B\ 01._

and update with SET col = my_func_1(col)


If not needed anymore:

DROP FUNCTION IF EXISTS my_func_1;

Also I changed your regex a bit as - indicates a range, if between characters in a class or is .-_ intended? Then modify the pattern.

Jonny 5
  • 12,171
  • 2
  • 25
  • 42
  • Perfect! I found some other versions on the Internet, but did have problems in converting some of the characters. In this version, we just have to define the regexp inside the function. – Alisa Sep 11 '14 at 05:26
  • @Alisa Great, it works for your task, glad could have helped :) – Jonny 5 Sep 11 '14 at 11:30