2

I'm building a word unscrambler using MySQL, Think about it like the SCRABBLE game, there is a string which is the letter tiles and the query should return all words that can be constructed from these letters, I was able to achieve that using this query:

SELECT * FROM words
WHERE word REGEXP '^[hello]{2,}$'
AND NOT word REGEXP 'h(.*?h){1}|e(.*?e){1}|l(.*?l){2}|l(.*?l){2}|o(.*?o){1}'

The first part of the query makes sure that the output words are constructed from the letter tiles, the second part takes care of the words occurrences, so the above query will return words like: hello, hell, hole, etc..

My issue is when there is a blank tile (a wildcard), so for example if the string was: "he?lo", the "?" Can be replaced with any letter, so for example it will output: helio, helot.

Can someone suggest any modification on the query that will make it support the wildcards and also takes care of the occurrence. (The blank tiles could be up to 2)

macropod
  • 12,757
  • 2
  • 9
  • 21
  • I don't think this is an appropriate use of regexp. Are you sure the second part really handles duplicate tiles? – Barmar Jun 22 '22 at 23:35
  • Yes, check it out: [Demo](https://www.db-fiddle.com/f/iV84ahE9HsNhjMYUCusjK4/2). – Rakan AbdelQader Jun 22 '22 at 23:38
  • I had to change the fiddle to MySQL 8.0. – Barmar Jun 22 '22 at 23:39
  • Yes, my bad, I should've left a note for that – Rakan AbdelQader Jun 22 '22 at 23:40
  • I'm guessing this would be on an app where there's a user input, right? – FanoFN Jun 22 '22 at 23:41
  • Yes! The user will input the tiles – Rakan AbdelQader Jun 22 '22 at 23:42
  • The first `REGEXP` word input should be straight forward since it's taking directly what user type in but how do you plan to separate the word input into your second `REGEXP` format? I mean, the query has to be dynamically taking the word input, right? – FanoFN Jun 22 '22 at 23:45
  • It will be a for loop that split each tile and append to the query string:tiles[i](.*tiles[i]?){count of occurrence} – Rakan AbdelQader Jun 22 '22 at 23:49
  • I tried adding `(.)(.*?\\1){1}` as another alternative to the `NOT REGEXP`, but that didn't work. – Barmar Jun 22 '22 at 23:52
  • Why do you need `{2,}` in the first regexp? If you want to ensure that words are at least 2 characters, can't you just leave 1-letter words out of the table? – Barmar Jun 22 '22 at 23:56
  • I see.. about the wildcard, are you saying that a user can enter a word with a wildcard like `hel?o`, then from there you want to fill the wildcard with a set of or any of or all 26 alphabet from `a-z`? – FanoFN Jun 22 '22 at 23:58
  • @FanoFN it should be one letter of 26 alphabet letters, if the user input 2 wildcards like ?ell? Then we need 2 letters from the alphabet but the maximum is 2. – Rakan AbdelQader Jun 23 '22 at 00:01
  • My gut says that you will need two separate queries -- one to handle the wild-letter case. – Rick James Jun 23 '22 at 15:43

2 Answers2

2

I've got something that comes close. With a single blank tile, use:

SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*$'
AND  word not  REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'

with 2 blank tiles use:

SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*.[acre]*$'
AND word NOT REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'

The . in the first regexp allows a character that isn't one of the tiles with a letter on it.

The only problem with this is that the second regexp prevents duplicates of the lettered tiles, but a blank should be allowed to duplicate one of the letters. I'm not sure how to fix this. You could add 1 to the counts in {}, but then it would allow you to duplicate multiple letters even though you only have one blank tile.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • 1
    Thank you, that's really helpful, I've tried it and it's working, but as you said it prevent duplicates so a word like "rare" will not be selected. If we increase the counts then we will miss up the duplication rule. That's really challenging – Rakan AbdelQader Jun 23 '22 at 00:13
  • https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/ – Barmar Jun 23 '22 at 00:42
  • So I was thinking what if we removed the duplication rule when we have a wildcard, and it seems that it's working: [Demo] (https://www.db-fiddle.com/f/iV84ahE9HsNhjMYUCusjK4/3) what is your thoughts? we also can narrow the results by saying that the count of the output words can't be more than the count of tiles including the wildcard tiles – Rakan AbdelQader Jun 23 '22 at 09:56
  • What I think you can do is use SQL to narrow down the words like this, then use other code to check whether a word requires more tiles than you have. – Barmar Jun 23 '22 at 15:38
1

A possible starting point:

Sort the letters in the words; sort the letters in the tiles (eg, "ehllo", "acer", "aerr").

That will avoid some of the ORing, but still has other complexities.

If this is really Scrabble, what about the need to attach to an existing letter or letters? And do you primarily want to find a way to use all 7 letters?

Rick James
  • 135,179
  • 13
  • 127
  • 222