3

I am trying to figure out a way of doing an "anagram" function as a stored procedure on MySQL. Lets say I have a database containing all the words in the dictionary - I want to enter a parameter of some letters as a VARCHAR and get back a list of words which make up an anagram of those letters.

I guess what I'm sort of saying is, how do I run an SQL command to say "Select all words which are the same length as the parameter AND contain each of the letters in the parameter".

I have explored the string functions available (http://www.hscripts.com/tutorials/mysql/string-function.php). I'm sure these can be used in conjunction in some way but can't quite get the syntax right when it gets complicated.

I am new to SQL, and it just seems like the String functions available are very limited. Any help would be greatly appreciated :)

ecirp
  • 145
  • 2
  • 4
  • 11
  • i would personally use regular expressions (http://dev.mysql.com/doc/refman/5.0/en/regexp.html) there may be a better way though – exussum Jun 17 '13 at 21:58
  • Oh yeah, I probably should have thought about those – ecirp Jun 17 '13 at 22:02
  • http://stackoverflow.com/questions/1559751/regex-to-make-sure-that-the-string-contains-at-least-one-lower-case-char-upper gives more info on regex for a case similar to this one – exussum Jun 17 '13 at 22:06
  • I think this is what you seek. http://stackoverflow.com/questions/4078838/mysql-array-data-type-split-string – transilvlad Jun 17 '13 at 22:07

2 Answers2

2

You don't; it's not a sensible thing to ask a relational database to do.

However, if someone was forcing me at gunpoint to implement anagram finding using a relational database, I would denormalize it like this:

word | sorted
-----|-------
bar  | abr
bra  | abr
keel | eekl
leek | eekl

Where "sorted" consists of all of the letters in "word", sorted using any rule you like as long as it's a total order. You would use something other than SQL to compute that part.

Then you could find anagrams with something like this:

SELECT w2.word AS anagram
FROM words w1
JOIN words w2 ON w1.sorted=w2.sorted
WHERE w1.word = 'leek'
AND w2.word <> w1.word
hobbs
  • 223,387
  • 19
  • 210
  • 288
  • I think this with the combination of my answer could be a decent solution. Sort all the words in the dictionary, store them in another table (don't store the duplicates!) and reference that id in your words table. Then you can just sort the word you're trying to find an anagram for, get the ID for that and retrieve all the words that match. Like you said, gun to the head implementation – Mataniko Jun 17 '13 at 22:55
0

SQL is probably not the right place to do this, you should do it on the front end.

First of all consider the properties of an anagram, it will be the same length as the words in your dictionary. You can start by retrieving those words.

Instead of creating a variable per letter consider using an array Each letter maps to an index (a=0, b=3, etc...). Each time you run into that letter increase the value for that bucket so for the word "dad" you'll end up with a structure that looks like this:

arr[0]=1, arr[1]=0, arr[2]=0, arr[3]=2, arr[4]=0 and so on...

Now you can just see if your words match each item in the array.

While not impossible in SQL, you can represent that kind of logic in the database, for example another table that will have a reference to the dictionary word and each tuple would be the array, then you can just retrieve all the items with the same values.

Mataniko
  • 2,212
  • 16
  • 18
  • Of course you might want to reference the word from the mapping table instead of the other way around – Mataniko Jun 17 '13 at 22:29
  • I see, but how do I capture each letter? Using regex I guess? – ecirp Jun 17 '13 at 22:32
  • If you're doing this on the front end, iterate it as a char array (most programming languages would let you do that). In mysql you can do a loop that goes from 0 to char_length and gets one character at a time using the substring function. (Did I mention SQL is the worst place to do this) – Mataniko Jun 17 '13 at 22:52