1

I have a database of English words of a certain length. For example table us_6 contains English words with length of 6.

Now, I want to search the table for words that contains only certain letters. For example, I want to search for words that contains letters vleoyl. I did the search using REGEXP. Here's my query:

    SELECT word FROM us_6 WHERE
    word REGEXP 'v' AND
    word REGEXP 'l' AND
    word REGEXP 'e' AND
    word REGEXP 'o' AND
    word REGEXP 'y' AND
    word REGEXP 'l'

The result returns correct words like lovely and volley but it also returns other words. Here's the result of the query:

    lovely
    loveys
    overly
    volley

Can you help me with this? I only want words that contains exactly the letters provide. For example, vleoyl should only return lovely and volley.

Lynnell Neri
  • 473
  • 1
  • 9
  • 21
  • 1
    Possible duplicate of [MySQL - Return matching pattern in REGEXP query](http://stackoverflow.com/questions/5361457/mysql-return-matching-pattern-in-regexp-query) – Norbert Dec 30 '15 at 00:10
  • @NorbertvanNobelen, unfortunately, the question, though similar, but quite different. – Lynnell Neri Dec 30 '15 at 00:13
  • 1
    My goal was to just point out one possible solution for matching. There are many other examples out there. – Norbert Dec 30 '15 at 00:15
  • @NorbertvanNobelen, Thank you. I truly appreciate your effort. Sorry about that. – Lynnell Neri Dec 30 '15 at 00:16
  • I don't think you should attempt this in SQL - but happy to be proven wrong. – Strawberry Dec 30 '15 at 00:22
  • @Strawberry, can you suggest what tool should I use? I chose MySQL because I have over thousands of words in a table and I find it the most efficient tool to handle searching hehe. – Lynnell Neri Dec 30 '15 at 00:23

3 Answers3

4

The problem here is that you're checking for the presence of l twice. That's the same as... "It contains an l. Yep, still contains an l." It's not checking for two of them. Here's an alternative...

SELECT word FROM us_6 WHERE
word REGEXP 'v' AND
word REGEXP 'l.*l' AND
word REGEXP 'e' AND
word REGEXP 'o' AND
word REGEXP 'y'

This should match all words containing a v, two l's, an e, an o, and a y.

So, every other occurrence of the same letter, just append another .*letter to the query. For example, lullaby needs the following query:

    SELECT word FROM us_7 WHERE
    word REGEXP 'l.*l.*l' AND
    word REGEXP 'u' AND
    word REGEXP 'a' AND
    word REGEXP 'b' AND
    word REGEXP 'y'

See how I add 3 *.l because there are 3 occurrences of l in the word lullaby.

The same thing can also be accomplished with LIKE instead of REGEXP. Here's an equivalent query for the original question...

SELECT word FROM us_6 WHERE
word LIKE '%v%' AND
word LIKE '%l%l%' AND
word LIKE '%e%' AND
word LIKE '%o%' AND
word LIKE '%y%'
Patrick Lee
  • 1,990
  • 1
  • 19
  • 24
  • So, if I have 4 v and 5 y, what would my query be like? By the way, I tried the query and it worked. Thumbs up. – Lynnell Neri Dec 30 '15 at 00:29
  • Hey, I tried adding .*v to every occurrence of v and it worked. Wow. Can I edit your answer to append my findings, okay? – Lynnell Neri Dec 30 '15 at 00:34
  • 1
    Glad it worked for you. To match 4 `v`'s, the pattern would be `'v.*v.*v.*v'`. The `.*` means zero or more of any character so it will match any string with at least 4 `v`'s in it regardless of whether or not there are characters in between them. – Patrick Lee Dec 30 '15 at 00:35
  • Sure, go for it. This isn't the most elegant solution, so I'd love to see a better one. Maybe there's an obscure MySQL built-in function I'm not aware of that would help with this. – Patrick Lee Dec 30 '15 at 00:36
  • 1
    Added an equivalent query using `LIKE` instead of `REGEXP`. – Patrick Lee Dec 30 '15 at 00:46
  • is there any possibility of utilizing your solution so that it can accommodate words that are less than the passed letters? For example, I am looking for 4 letters words that can be formed from `vleoly`. Notice that I am looking for 4-letter words out from the passed 6-letter word. – Lynnell Neri Dec 30 '15 at 01:01
  • Hmm, good question. I'd have to think about that and it's probably worthy of its own question. :) – Patrick Lee Dec 30 '15 at 01:10
  • Okay, I'm going to create another question now. Would you standby with it as well? I am very grateful to you and the community :) – Lynnell Neri Dec 30 '15 at 01:11
  • Sure, I'll take a look later. – Patrick Lee Dec 30 '15 at 01:41
3

Think outside the box!

Store lovely and volley as ellovy. That is, sort the letters, then do an exact match.

Build a table that maps

ellovy -> lovely
ellovy -> volley
ellowy -> yellow
elorvy -> overly
elovsy -> loveys 

And have a non-UNIQUE (not PRIMARY) key on the first column. (Probably have the second column be the PRIMARY KEY.

No need for REGEXP, OR, etc. And muuuuuch faster. Furthermore, it could be a single table for all word lengths.

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

I'm not seriously advocating this as a solution - instead I'd use application level code for this task (e.g. a bit of PHP), for which there must be myriad tutorials - but just for fun...

SELECT * FROM strings;
+-----------+--------+
| string_id | string |
+-----------+--------+
|         2 | lovely |
|         3 | volley |
|         1 | yellow |
+-----------+--------+

3 rows in set (0.00 sec)SELECT DISTINCT CONCAT(a.x,b.x,c.x,d.x,e.x,f.x) needle
  FROM 
     ( SELECT 'v' x UNION ALL SELECT 'o' UNION ALL SELECT 'l' UNION ALL SELECT 'l' UNION ALL SELECT 'e' UNION ALL SELECT 'y') a
  JOIN
     ( SELECT 'v' x UNION ALL SELECT 'o' UNION ALL SELECT 'l' UNION ALL SELECT 'l' UNION ALL SELECT 'e' UNION ALL SELECT 'y') b
  JOIN
     ( SELECT 'v' x UNION ALL SELECT 'o' UNION ALL SELECT 'l' UNION ALL SELECT 'l' UNION ALL SELECT 'e' UNION ALL SELECT 'y') c
  JOIN
     ( SELECT 'v' x UNION ALL SELECT 'o' UNION ALL SELECT 'l' UNION ALL SELECT 'l' UNION ALL SELECT 'e' UNION ALL SELECT 'y') d
  JOIN
     ( SELECT 'v' x UNION ALL SELECT 'o' UNION ALL SELECT 'l' UNION ALL SELECT 'l' UNION ALL SELECT 'e' UNION ALL SELECT 'y') e
  JOIN
     ( SELECT 'v' x UNION ALL SELECT 'o' UNION ALL SELECT 'l' UNION ALL SELECT 'l' UNION ALL SELECT 'e' UNION ALL SELECT 'y') f
  JOIN strings s
    ON s.string = CONVERT(CONCAT(a.x,b.x,c.x,d.x,e.x,f.x) USING utf8) ;

+--------+
| needle |
+--------+
| lovely |
| volley |
+--------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57