1

The problem is that i'm trying to make a regex that finds the letter differences in two strings for example given "test" and "test" it will say 0 and "test" and "testr" it will say 1

so far I have for example ([test]) which works nice and simple except if there are multiple instances of the same character for example if you had "[tes]" and "test" it wouldn't pick up anything due to the repeating t

in this case the answer needs to be a regex solution as im trying to use it in mysql...don't ask.

Thanks in advance

After seeing a few comments, I am trying to find the differences in two strings with a solution that doesn't use any loops.

Not sure if its even possible but I'd like to try

Ben Jones
  • 699
  • 4
  • 18
  • 1
    Simple string functions won't work? Ie `$string1 != $string2` ? Also, please provide the prrgramming language used and `[tes]` will work for `t`, `e`, `s` and not for `tes` (unless you specifiy some quantifier, that is). – Jan Nov 08 '16 at 15:38
  • 2
    MySQL has weak regex support, but even if had such support your ask would be difficult. I feel this is something which would better be solved in your application layer. – Tim Biegeleisen Nov 08 '16 at 15:38
  • Possible duplicate of [difference between two strings mysql](http://stackoverflow.com/questions/21931763/difference-between-two-strings-mysql) – Oday Mansour Nov 08 '16 at 15:41
  • im not checking to see if there equal, i am checking to see if there are any differences and what those differences are @Jan – Ben Jones Nov 08 '16 at 15:41
  • @BenJones what you are trying to do is your own implementation of a leveshtein distance calculation. Check this post that already creates a Leveshtein function http://stackoverflow.com/questions/13909885/how-to-add-levenshtein-function-in-mysql that is already done. As far as I know mysql doesn't have leveshtein nor fuzzy select – Federico Piazza Nov 08 '16 at 15:42
  • i agree @TimBiegeleisen but in this case I don't have that luxury haha – Ben Jones Nov 08 '16 at 15:42

1 Answers1

1

I have put a comment but I'm posting this answer to give more details.

What you are trying to do is to calculate a Leveshtein distance but mysql doesn't have this, so you have to create your own function. If you check at this link: http://www.artfulsoftware.com/infotree/qrytip.php?id=552

You will find the code to create this function. Quoting the page you will find this:

CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) ) 
  RETURNS INT 
  DETERMINISTIC 
  BEGIN 
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; 
    DECLARE s1_char CHAR; 
    -- max strlen=255 
    DECLARE cv0, cv1 VARBINARY(256); 
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; 
    IF s1 = s2 THEN 
      RETURN 0; 
    ELSEIF s1_len = 0 THEN 
      RETURN s2_len; 
    ELSEIF s2_len = 0 THEN 
      RETURN s1_len; 
    ELSE 
      WHILE j <= s2_len DO 
        SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; 
      END WHILE; 
      WHILE i <= s1_len DO 
        SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; 
        WHILE j <= s2_len DO 
          SET c = c + 1; 
          IF s1_char = SUBSTRING(s2, j, 1) THEN  
            SET cost = 0; ELSE SET cost = 1; 
          END IF; 
          SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; 
          IF c > c_temp THEN SET c = c_temp; END IF; 
            SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; 
            IF c > c_temp THEN  
              SET c = c_temp;  
            END IF; 
            SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; 
        END WHILE; 
        SET cv1 = cv0, i = i + 1; 
      END WHILE; 
    END IF; 
    RETURN c; 
  END; 

Helper function:

CREATE FUNCTION levenshtein_ratio( s1 VARCHAR(255), s2 VARCHAR(255) ) 
  RETURNS INT 
  DETERMINISTIC 
  BEGIN 
    DECLARE s1_len, s2_len, max_len INT; 
    SET s1_len = LENGTH(s1), s2_len = LENGTH(s2); 
    IF s1_len > s2_len THEN  
      SET max_len = s1_len;  
    ELSE  
      SET max_len = s2_len;  
    END IF; 
    RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100); 
  END; 

Edit: since you edited your question I'll add more details. First of all, your regex is not "correct" since you are using a character class in this way [test], since character classes don't have order nor repetitions, your regex can be written as [est] or [ets] or [tseetetett] (or whatever combination) it is exactly the same.

Additionally, as I said above, you want to calculate a Leveshtein distance but according to your last edit, you want to calculate it without using loops, so I think the right question would be "how to calculate leveshtein distance in mysql without loops". I don't know if this is easily doable, anyway I'll keep this answer in case it is useful

Federico Piazza
  • 30,085
  • 15
  • 87
  • 123