2

In MySQL I need to remove all single characters from a string:

"A Quick Brown B C D Fox" -> "Quick Brown Fox"

Any tips ?

Dylan
  • 9,129
  • 20
  • 96
  • 153

2 Answers2

1

If you can process it outside of mysql (php, .net, perl) you can use regular expressions to do this quite easily. Mysql's regular expression engine, unfortunately, only tells you if there is a match, not what was matched.

TrippyD
  • 735
  • 5
  • 9
0

You want [[:<:]] and [[:>:]] to search for word boundries in MySQL. (It would be \b in other regular expressions, PERL, grep, etc.)

So...

'[[:<:]][a-zA-Z0-9][[:>:]]'

Would find any single character.

OK TRY THIS...

This is based upon the function above, but hardcoded to look for single characters. I don't claim this to be much more than a hack. But if you need to get the job done, and you cant install the external libraries for whatever reason, it won't slow you down on strings that do NOT have any single characters to replace:

delimiter //
DROP FUNCTION IF EXISTS `REMOVE_SINGLE_CHARS`//
CREATE FUNCTION `REMOVE_SINGLE_CHARS`(original varchar(1000)) RETURNS varchar(1000) CHARSET utf8 DETERMINISTIC
BEGIN

DECLARE rxChars VARCHAR(20);
DECLARE temp VARCHAR(1000);
DECLARE ch1 CHAR(1);
DECLARE ch2 CHAR(1);
DECLARE ch3 CHAR(1);
DECLARE i INT;

SET rxChars = '[a-zA-Z0-9]';
SET i = 1;
SET temp = "";

IF original REGEXP CONCAT('[[:<:]]',rxChars,'[[:>:]]') THEN
  loop_label: LOOP
    set ch1 = SUBSTRING(original,i,1);
    if CHAR_LENGTH(original) > i THEN 
      set ch2 = SUBSTRING(original,i+1,1);
    END IF;
    if CHAR_LENGTH(original) > i+1 THEN 
      set ch3 = SUBSTRING(original,i+2,1);
    END IF;
    if(i = 1) THEN
        IF (ch1 NOT regexp rxChars OR ch2 regexp rxChars) THEN
          set temp = CONCAT(temp, ch1);
        END IF;
    END IF;

    IF(ch2 = ' ') THEN
      # Theoretically this is redundant, but for some reason the
      # CONCAT(temp, ch2) below is not working when ch2 = ' '   YMMV
      set temp = CONCAT(temp, ' ');
    ELSE 
      IF(ch2 NOT regexp rxChars OR ch1 regexp rxChars OR (i+1<CHAR_LENGTH(original) AND ch3 regexp rxChars)) THEN
        set temp = CONCAT(temp, ch2);
      END IF;
    END IF;

    IF i+2>CHAR_LENGTH(original) THEN
      LEAVE loop_label;
    END IF;
    SET i=i+1;
  END LOOP;
ELSE
  SET temp = original;
END IF;
RETURN temp;

END//
Bill Heller
  • 250
  • 2
  • 6
  • It's not working :( God, I hate regexes and all those stupid differences on different platforms.... Why on earth does MySQL don't understand NORMAL regexes ?! – Dylan Aug 05 '11 at 21:27
  • Well, I'm guessing it's not working because of the way the function iterates over the string. It's splitting the string up, so I'm betting that it is finding word boundaries at the end of the substring that are not actual word boundaries in your original string. So, while the syntax is undeniably a bit odd, I think the functionality is right for regex, it's the function implementation that is killing us. We just need to work on a way to get around the splitting of the string. I'll ponder it for a bit. – Bill Heller Aug 06 '11 at 00:31
  • I can think of possible fixes, but they are going to get ugly and slow especially on long strings. It seems like this library (http://www.mysqludf.org/lib_mysqludf_preg/) from Derek's original link above would be a better solution than trying to solve it with this type of function. Do you have access to the server to install a UDF library? – Bill Heller Aug 06 '11 at 00:43
  • Ok, that function should do the job if nothing else. It does not seem too slow on initial testing, but use it with caution as it is a bit... umm lets just say "less than elegant." – Bill Heller Aug 06 '11 at 01:29