1

I want to code a function in Python returning all possible anagrams of a word given. Only words of the English language are considered as such.

However, up to now, I only managed to generate all permutations of a word.

import itertools

def anagrams(word):
    letters = list(word)
    perms = itertools.permutations(letters)
    return [''.join(p) for p in perms]

How can I efficiently check and filter for valid English words now?

turalson
  • 21
  • 2

3 Answers3

3

First you need an English dictionary in Python. I usually use nltk even though there might be better packages. You can install the dictionary of the package by

import nltk
nltk.download('words')

and then a slight adjustment of your code yields what you want:

from nltk.corpus import words
import itertools

# words.words() is list, for faster runtime
word_set = set(words.words())

def anagrams1(word):
    letters = list(word.lower())
    perms = itertools.permutations(letters)
    word_lst = [''.join(p) for p in perms]
    ana_lst = set(w for w in word_lst if w in word_set)
    return ana_lst

For example,

anagrams1('sink')
>>> {'inks', 'sink', 'skin'}

Edit thanks to Kelly Bundy: A far better runtime can be achieved by a different algorithm, that is checking for every correct word if it is an anagram of the input.

def anagrams2(word):
    word_sorted = sorted(word)
    ana_lst = set(w for w in words.words() if sorted(w)==word_sorted)
    return ana_lst
stroxx
  • 81
  • 6
  • 2
    Much better with the set now. But long words like "permutations" still take a long time. – Kelly Bundy Apr 19 '23 at 14:02
  • @KellyBundy You are right. It even produces a memory error on my computer. Do you see any ways to improve the runtime/used memory space? – turalson Apr 19 '23 at 14:08
  • 2
    Different algorithm. Go through the nltk words and check whether they're anagrams of the input word. – Kelly Bundy Apr 19 '23 at 14:09
  • You *could* improve the first one, in particular by not building a *list* of all permutations. `word_set.intersection(map(''.join, itertools.permutations('permutations')))` runs without issue in ~2 minutes for me. But it's just an inferior approach anyway... – Kelly Bundy Apr 19 '23 at 15:14
0

Your method is pretty inefficient for long words. Better precompute all anagrams:

  • Get an English dictionary and process it as follows:

    • For every word, sort the letters and add to a Python dictionary using the sorted word as the key, and append to the list of words with the same key;
  • For a given query word, sort the letters and lookup the Python dictionary.

E.g. the allowed words are "one", "two", "three", "neo". Store as

"eno": ["one", "neo"]
"otw": ["two"]
"eehrt": ["three"]

Now the anagrams of "eon" -> "eno" are "one", "neo".


Note that the preprocessed dictionary is only about twice as large as the initial set of words. And the preprocessing time will remain reasonable as all words have to be input to the set/dictionary anyway (sorting the letters can be done fairly efficiently by histogram sort).

Yves Daoust
  • 672
  • 9
0

As I said in my comment, I'd just do one pass through the dictionary to see if the word had the same letters:

from collections import Counter
TARGET_WORD = "trace"
target_pattern = Counter(TARGET_WORD.lower())

for word in open("dictionary.txt","rt").read().splitlines():
    if Counter(word.lower()) == target_pattern:
        print(word)

If you were to process multiple words, you can preprocess your dictionary:

dictionary = [Counter(x.lower()) for x in open("dictionary.txt","rt").read().splitlines()]

for word in word_list:  # word_list is your list of target words.
    if Counter(word.lower()) in dictionary:
        print (word)

(Edit: My first preprocessing program tried to make the preprocessed dictionary a set, but tested the code and Counters aren't hashable as @KellyBundy caught right away. A set would probably be more efficient, so you could use tuple(sorted(Counter(x).items()) for the entries and the comparison, but I'll leave that as an exercise for the reader.)

RufusVS
  • 4,008
  • 3
  • 29
  • 40
  • @KellyBundy I just tested that code and caught that. I could make them into sorted tuples, or just not bother `set` ing them... Good Catch! – RufusVS Apr 19 '23 at 15:09
  • In the second piece of code, what is `word_list`? – slothrop Apr 19 '23 at 15:23
  • @slothrop A list of target words, if you wanted to look up multiple anagrams. – RufusVS Apr 19 '23 at 15:32
  • 1
    @RufusVS I think then this needs a step then that reconstructs the actual dictionary word. For example, if `word_list = ['hotnyp']`, then the word's counter should get a hit in the dictionary search, but the printed output will be `hotnyp` not `python`. – slothrop Apr 19 '23 at 15:40
  • But then you don't print the anagrams of the target words, you only print the target words themselves. Maybe even if they *don't* have anagrams, just appear *themselves* in the dictionary. – Kelly Bundy Apr 19 '23 at 15:40
  • LOL! Stupid me. That's what I get for doing a fly-by! Of course you need the anagram word in there. Or should I say the list of anagram words. Each dictionary hit in that case should hold the list of words that it applies to. I may edit that later. It's dinner time... – RufusVS Apr 20 '23 at 22:50