1

I'm creating an Anagram Solver in Python 2.7.

The solver takes a user inputted anagram, converts each letter to a list item and then checks the list items against lines of a '.txt' file, appending any words that match the anagram's letters to a possible_words list, ready for printing.

It works... almost!


# Anagram_Solver.py

anagram = list(raw_input("Enter an Anagram: ").lower())

possible_words = []

with file('wordsEn.txt', 'r') as f:

    for line in f:

        if all(x in line + '\n' for x in anagram) and len(line) == len(anagram) + 1:

            line = line.strip()
            possible_words.append(line)

print "\n".join(possible_words)

For anagrams with no duplicate letters it works fine, but for words such as 'hello', the output contains words such as 'helio, whole, holes', etc, as the solver doesn't seem to count the letter 'L' as being 2 separate entries?

What am I doing wrong? I feel like there is a simple solution that I'm missing?

Thanks!

Hexbob6
  • 167
  • 1
  • 2
  • 12
  • Your algorithm is too simplistic. Rather than checking that every character in `anagram` is in `line` (which is not sufficient to identify an anagram), you might check that the counts of each type of character are the same in both strings. – khelwood Aug 31 '16 at 14:16
  • `all(x in line + '\n' for x in anagram)` is the problem. if there is an "l" in the word and 100 of them in the anagram it will still return `True` – Ma0 Aug 31 '16 at 14:16
  • 1
    Might want to look at http://stackoverflow.com/questions/39028548/python-scrabble-challenge-find-valid-words/39028740#39028740 for some ideas... (it's effectively the same problem) – Jon Clements Aug 31 '16 at 14:18
  • @Ev.Kounis , thanks for the heads up. I'm still fairly new to Python and think that I've misunderstood the application of the all() function. I mainly brute forced that line until I got it to work, I guess without fully understanding how it works. If it's not too beyond the scope of this question, are you able to explain exactly what it does? – Hexbob6 Aug 31 '16 at 14:28
  • takes every letter that exists in the anagram (`for x in anagram`) and checks to see if it exists in the line (`x in line`). `all(x,y,z, ..)` is the same as saying `x and y and z and ...` so all letters in the anagram have to exist in the line too but the line cannot contain more of em (`len(line) == len(anagram)`). Also handle the newline character in a different way please. it doesn't have any place in there XD – Ma0 Aug 31 '16 at 14:33
  • @Ev.Kounis noted! ;) And thanks a lot! :) – Hexbob6 Aug 31 '16 at 14:39
  • Try sorting the two words then check of they are equal. – wwii Aug 31 '16 at 16:26

2 Answers2

4

This is probably easiest to solve using a collections.Counter

>>> from collections import Counter
>>> Counter('Hello') == Counter('loleH')
True
>>> Counter('Hello') == Counter('loleHl')
False

The Counter will check that the letters and the number of times that each letter is present are the same.

mgilson
  • 300,191
  • 65
  • 633
  • 696
  • 1
    Or just `sorted('Hello') == sorted('loleHl')` as this answer: http://stackoverflow.com/questions/14990725/checking-strings-against-each-other-anagrams – Chris_Rands Aug 31 '16 at 14:18
  • Thanks for the reply @mgilson , I hadn't heard of the Counter class before but it looks like a useful method! – Hexbob6 Aug 31 '16 at 15:03
  • I've chosen the other answer as correct as it goes into slightly more detail which I found helpful, but it was useful to see the different console outputs for the counter module. Also thanks for the response @Chris_Rands , it seems as though the sorted method runs a lot faster than using collections, so I've edited the other answer to include it. – Hexbob6 Aug 31 '16 at 15:06
1

Your code does as it's expected. You haven't actually made it check whether a letter appears twice (or 3+ times), it just checks if 'l' in word twice, which will always be True for all words with at least one l.

One method would be to count the letters of each word. If the letter counts are equal, then it is an anagram. This can be achieved easily with the collections.Counter class:

from collections import Counter
anagram = raw_input("Enter an Anagram: ").lower()

with file('wordsEn.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if Counter(anagram) == Counter(line):
            possible_words.append(line)

print "\n".join(possible_words)

Another method would be to use sorted() function, as suggested by Chris in the other answer's comments. This sorts the letters in both the anagram and line into alphabetical order, and then checks to see if they match. This process runs faster than the collections method.

anagram = raw_input("Enter an Anagram: ").lower()

with file('wordsEn.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if sorted(anagram) == sorted(line):
            possible_words.append(line)

print "\n".join(possible_words)
Hexbob6
  • 167
  • 1
  • 2
  • 12
TerryA
  • 58,805
  • 11
  • 114
  • 143
  • Thanks for the in depth explantion and re-write of my code. I've tested both the counter method and Chris_Rands sorted method from the comment above and found that the sorted method actually processes noticeably faster, so I'll append that in to this answer and mark as correct. Thanks again! – Hexbob6 Aug 31 '16 at 14:49