0

I have written an anagram solving algorithm, which does not work.

for word in wordlist: #Checking for equal length
    if sorted(word.replace("\n", "")) == sorted(anagram):
        possible.append(word)

I needed to use len(word) - 1 to take away the \n.

Kai
  • 38,985
  • 14
  • 88
  • 103
user1149589
  • 243
  • 3
  • 11
  • Side point, but why are you subtracting 1 from `len(word)`? – voithos Jan 14 '12 at 18:49
  • probably not a good idea to modify the sequence while iterating on it... – Savino Sguera Jan 14 '12 at 18:49
  • I don't like that "if letter in word pass else ...", can't you change it to something simpler? – BlackBear Jan 14 '12 at 18:50
  • It's pretty hard understanding what your variables stand for and what you're trying to achieve here. What is input and expected output ? First off, why are you subtracting 1 from the length ? – Thomas Orozco Jan 14 '12 at 18:51
  • 1
    One issue here is that you're just checking each letter for existence in the candidate word. This is a problem, because think about an anagram like (say) `foobar`. Well, the word `fobaar` is the same length, and has all of the letters as the anagram, but obviously it's wrong because it has too many 'a's and not enough 'o's. – voithos Jan 14 '12 at 18:55
  • I am subtracting 1 from len(word) because, before I did that I would get a list of words 1 shorter than the anagram, I do not know why. *hangs head in shame* – user1149589 Jan 14 '12 at 18:56
  • It is not legal to remove items from a list you are iterating, so `possible.remove(word)` inside of the `for word in possible` loop will not work properly. – Gabe Jan 14 '12 at 18:57
  • @user1149589: Try doing a `print repr(word)` and a `print repr(anagram)` when you're going through the list of words, and visually see if they are the same length, or if the wordlist contains newlines, etc. – voithos Jan 14 '12 at 19:02
  • All is saved! (see above for solutions to all your questions) – user1149589 Jan 14 '12 at 19:13
  • @user1149589: You might be interested in the ".strip()" method instead of replace. And it's a better idea to *append* your updates to the end of your original post rather than to replace it -- now a lot of things people have written are hard to make sense of. – DSM Jan 14 '12 at 19:33
  • @user1149589: Please don't change your question (re-edit to the orignal version), if you find a better answer **different from the ones already given by the other users** you can write your own and accept it. Also remove that `soloved` from the title, if you accept an answer we'll know that this problem is solved. – Rik Poggi Jan 15 '12 at 00:51
  • See also: [Creating an anagram detector (with Python)](https://stackoverflow.com/q/33571749/562769) – Martin Thoma Oct 08 '20 at 09:10

2 Answers2

7

(1) I don't understand the "-1" in "len(word)-1" in your first loop.

(2) Your second loop has several problems:

It doesn't check to see whether the letters are the same, it checks to see whether each letter in the anagram is in the word. You're not using the count information, so you can't distinguish between bok and book. You're also removing from a sequence you're iterating over, which leads to unexpected behaviours.

For my part I'd simply use

sorted_anagram = sorted(anagram)
possibles = [word for word in wordlist if sorted(word) == sorted_anagram]

rather than explicit for loops.

Note that sorting the words is a kind of canonicalization process -- it makes sure that any two words which are anagrams of each other will be in the same format. Another approach to determine whether two things are anagrams would be to make sure that the letter counts are the same:

>>> from collections import Counter
>>> Counter('book')
Counter({'o': 2, 'k': 1, 'b': 1})
>>> Counter('obko')
Counter({'o': 2, 'k': 1, 'b': 1})
>>> Counter('bok')
Counter({'k': 1, 'b': 1, 'o': 1})
>>> 
>>> Counter('book') == Counter('boko')
True
>>> Counter('book') == Counter('bok')
False
DSM
  • 342,061
  • 65
  • 592
  • 494
  • 1
    Smart, comparing the sorted words. Might also want to use `word.lower()` for both the anagram and the words, depending on where he's getting the word dictionary from. – voithos Jan 14 '12 at 18:59
1

As mentioned in the comments the two evils that jump out at me are:

  1. Why do: if len(word) - 1 == len(anagram) ?
  2. Shortening a list while iterating it is a big nono. That line possible.remove(word) should be changed.

What about something like this:

anagramLength = len(anagram) # Get the value once to save CPU
possible1 = [word for word in wordlist if len(word)-1 == anagramLength] # List iteration
possible2 = [] # Make a new list that will be more constricted
for word in possible: #Checking for same letters
    for letter in anagram:
        if letter not in word:
            break
    else:
        possible2.append(word) # Only called if you don't break the above for loop

References to tools used:

  1. listIteration
  2. for..else
matiu
  • 7,469
  • 4
  • 44
  • 48
  • 1
    `len()` is an [O(1) operation](http://stackoverflow.com/questions/699177/python-do-python-lists-keep-a-count-for-len-or-does-it-count-for-each-call). You don't need to save it to a local variable. – voithos Jan 14 '12 at 19:08
  • 2
    @voithos that does not follow at all. O(1) means it's constant time, unrelated to the length of the list. It doesn't mean it's not costly, or that getting from a local variable won't be cheaper. – Daniel Roseman Jan 14 '12 at 19:17
  • I ran a quick test and found having len saved in a var gives an average of 1.2 seconds over 1.7 seconds calculating len every time. (wordList wad 76 entries in my test, and I filtered it 100,000 times) – matiu Jan 14 '12 at 19:56