0

I have a dictionary table 'dictionary', where column 'word' contains the list of all English words.

I want to find all words that contain specific alphabet characters only and exclude the rest of the alphabet. I am able to do so (see the example below), but as you can see, it is downright ugly.

EXAMPLE

Currently to find all words that contain letters 'a', 'b', 'c', 'x', 'y', 'z', but exlude rest of the alphabet letters I do this:

SELECT word 
FROM   dictionary
WHERE  ( 
          word LIKE '%a%' 
          OR word LIKE '%b%' 
          OR word LIKE '%c%' 
          OR word LIKE '%x%' 
          OR word LIKE '%y%' 
          OR word LIKE '%z%' 
       ) AND ( 
             word NOT LIKE '%d%' 
             AND word NOT LIKE '%e%' 
             AND word NOT LIKE '%f%' 
             AND word NOT LIKE '%g%' 
             AND word NOT LIKE '%h%' 
             AND word NOT LIKE '%i%' 
             AND word NOT LIKE '%j%' 
             AND word NOT LIKE '%k%' 
             AND word NOT LIKE '%l%' 
             AND word NOT LIKE '%m%' 
             AND word NOT LIKE '%n%' 
             AND word NOT LIKE '%o%' 
             AND word NOT LIKE '%p%' 
             AND word NOT LIKE '%q%' 
             AND word NOT LIKE '%r%' 
             AND word NOT LIKE '%s%' 
             AND word NOT LIKE '%t%' 
             AND word NOT LIKE '%u%' 
             AND word NOT LIKE '%v%' 
             AND word NOT LIKE '%w%' ) 

Any way to accomplish this task using some form of regex or other optimization? Any tricks or hints would be much appreciated.

jjj
  • 2,594
  • 7
  • 36
  • 57

6 Answers6

2

This is a better approach.

SELECT word 
FROM dictionary
WHERE word REGEXP '.*[abcxyzABCXYZ].*$' 
AND word NOT REGEXP '.*[defghijklmnopqrstuvwDEFGHIJKLMNOPQRSTUVW].*$';
hondvryer
  • 442
  • 1
  • 3
  • 18
  • This is producing SQL errors. I am not sure why, but it won't execute even though I changed 'city' on the third line to 'word'. – jjj Feb 16 '18 at 09:23
2

You can achieve it using REGEXP

SELECT `word `
FROM `dictionary`
WHERE `word` REGEXP '[abcxyzABCXYZ]'
    AND `word` NOT REGEXP '[defghijklmnopqrstuvwDEFGHIJKLMNOPQRSTUVW]'
Mittal Patel
  • 2,732
  • 14
  • 23
  • I was happy way too early... this doesn't support UTF8 alphabet characters :( – jjj Feb 16 '18 at 10:26
  • https://stackoverflow.com/questions/19774618/mysql-regex-utf-8-characters - MySQL regular expression library didn't support the UTF8, However you can use the equivalent HEX to support it – Mittal Patel Feb 16 '18 at 11:52
0

Check This.

        set @1='a'\\
        set @2 ='b'\\
        set @3 ='c'\\
        set @4 ='x'\\
        set @5 ='y'\\
        set @6 ='z'\\


        set  @find=CONCAT(@1,@2,@3,@4,@5,@6)\\


        set  @ALl='abcxyzdefghijklmnopqrstuvw'\\ all alphabets here

        set  @ALl=replace(@ALl,@1,'')\\  replace alphabet that you want
        set  @ALl=replace(@ALl,@2,'')\\
        set  @ALl=replace(@ALl,@3,'')\\
        set  @ALl=replace(@ALl,@4,'')\\
        set  @ALl=replace(@ALl,@5,'')\\
        set  @ALl=replace(@ALl,@6,'')\\

        SELECT word 
        FROM dictionary
        WHERE word RLIKE CONCAT('.*[',@find,'].*$')
        AND word NOT RLIKE CONCAT('.*[',@ALl,'].*$');\\

Check Demo Here

Mr. Bhosale
  • 3,018
  • 1
  • 17
  • 34
0

If you can't use REGEXP for some reason, then this should work too:

SELECT word 
  FROM dictionary
 WHERE Replace(Replace(Replace(Replace(Replace(Replace(
            word, 'a', ''),
                  'b', ''),
                  'c', ''),
                  'x', ''),
                  'y', ''),
                  'z', '') = ''

I doubt it will be very fast though. Then again, I'm not sure anything will be fast for this requirement =)

deroby
  • 5,902
  • 2
  • 19
  • 33
0

The test is simply:

    word REGEXP '^[abcxyz]*$'

That says that everything from start (^) to end ($) must be a string of zero or more (*) of the characters ([]) abcxyz.

I did not include both lower and upper case on the presumption that you are using a case-insensitive collation.

There is no need for the AND .. NOT.

There is a possible problem: If you want to allow notw, do you want to match won't? That is, what should be done about punctuation?

(There may be other edge cases that are not adequately specified in the Question.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
0
WHERE Field not LIKE '%[^a-z0-9 .]%'
Suraj Rao
  • 29,388
  • 11
  • 94
  • 103
Kirk
  • 1
  • 4
    Welcome to Stack Overflow. While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. [How to Answer](https://stackoverflow.com/help/how-to-answer). Kind Regards. – Elletlar Nov 12 '20 at 10:44