1

My main goal is to search in a column for a specific value (say word). If it doesn't exist, want to find the first that matches word% or wor% or wo% or w%.

In "English", the query would read like: "look for 'word' and return it if it exists. If not, look for the first word that has the maximum same prefix as 'word'".

I can write

SELECT word FROM words WHERE word = 'word' or word LIKE 'word%' or ... LIMIT 1;

I was trying to order by alphabetically, but it won't work (wo comes before wor). Also, can't order in reverse order, or 'wordy' will come before 'word'.

My current idea is just to call the database n times, where n = length(word). But I would like to know if there is any kind of 'short-circuit OR' in SQL -- MySQL/MariaDB, to be precise.

Example

DB has 'w', 'word', 'wording', want to search by 'word' and retrieve 'word' only.

DB has 'z', 'zab', 'zac', 'ze', 'zeb' want to search by 'za' and get 'zab'

Alberto
  • 499
  • 4
  • 23

3 Answers3

1

It sounds like you are looking for a string distance algorithm. The string distance algorithm tells you how many changes are needed to change the current word into the desired word. The idea is that you give all your words a string distance and sort ascending on the distance. An exact match will have 0, a missing or extra letter will have 1.

Not exactly the answer to your question, but I am hoping it is actually what you were looking for. You may also be interested in word stems which will go nicely with this.

EDIT

Extending my answer with a solution to your actual query. Add a funtion:

CREATE FUNCTION `WORDRANK`(`a` VARCHAR(150), `b` VARCHAR(150)) RETURNS INT
BEGIN
    DECLARE rank INT DEFAULT 0;

    WHILE rank < LENGTH(a) DO
        IF rank = 0 AND b = a THEN RETURN rank;
        ELSEIF rank = 0 AND b LIKE CONCAT(a, "%") THEN RETURN rank + 1;
        ELSEIF b LIKE CONCAT(LEFT(a, LENGTH(a) - rank), "%") THEN RETURN rank + 2;
        END IF;
        SET rank = rank + 1;
    END WHILE;

    RETURN rank + 100;
END

Then create a stored procedure:

CREATE PROCEDURE `getClosestMatch`(IN `q` VARCHAR(150))
BEGIN
    SELECT
        word
    FROM words
    WHERE word LIKE CONCAT(LEFT(q, 1),"%")
    ORDER BY WORDRANK(q, word), word
    LIMIT 1;
END

In order to get the desired results, we need to rank each word based on your desired algorithm, which we defined in the WORDRANK function. The stored procedure is so we have a generic way of executing the query.

Community
  • 1
  • 1
OwChallie
  • 14,933
  • 1
  • 10
  • 12
  • Yes, that would be an option, but was trying something simpler :-) – Alberto Apr 16 '17 at 20:11
  • Complex, but seems to work. Just added 'word' to the ORDER BY, in order to have it sorted alphabetically when there are two words with same rank. – Alberto Apr 17 '17 at 08:56
1

Assumption: Words are ASCII, minimum length = 1, maximum value < 'ZZ'. Assumption: VARCHAR input, no trail spaces. Assumption: if 'word' is not there, but 'wording' and 'wordy' are there, you want 'wording' not 'wordy'. Perhaps not simpler, but with a single SELECT statement ...

set @x = 'w'; /* or whatever word you want to search with */
 select * from words
 where word <= concat(@x,'zz')
 and (word like concat(@x,'%') or @x like concat(word,'%'))
 order by length(word) <> length(@x),
 case when length(word) = length(@x) then 1 else 0 end asc,
 case when length(word) > length(@x) then word else 'zz' end asc,
 word desc limit 1;
Peter Gulutzan
  • 465
  • 2
  • 8
0

Try the following query:

SELECT word 
  FROM words
 WHERE word LIKE 'w%'
ORDER BY word;
Hadi
  • 36,233
  • 13
  • 65
  • 124
Teja
  • 13,214
  • 36
  • 93
  • 155