2

I have text files with a few thousand words in them (one word per line). I've written a function that takes two words (strings) and checks whether one word is an anagram of the other (that is, whether the two words contain the same letters, possibly in a different order).
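For reference, a check like the one described could look like the sketch below. The name `is_anagram` and the case-insensitive comparison are assumptions, since the original function is not shown.

```python
def is_anagram(word1, word2):
    """Return True if both words use exactly the same letters (hypothetical helper)."""
    # Sorting puts the letters of each word in a canonical order,
    # so two anagrams produce identical sorted lists.
    return sorted(word1.lower()) == sorted(word2.lower())


print(is_anagram("listen", "silent"))
print(is_anagram("listen", "google"))
```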

Now I want to go over my huge text file and search for anagrams. My output should be a list of tuples, each containing a pair of words that are anagrams of each other.

The problem is that I have no idea how to go over the words with a for/while loop. Everything I've tried has failed. (I know how I want to do it, but I just don't know Python well enough.)

Edit #1: Assuming I want to go over lines 1 to 100 of the text instead of the whole text, how do I do that?
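One possible way to limit the loop to the first 100 lines is `itertools.islice`, which stops reading once the limit is reached. This is only a sketch: the helper name `first_words` and its `limit` parameter are made up for illustration.

```python
from itertools import islice


def first_words(path, limit=100):
    """Return the first `limit` lines of the file at `path`, without newlines."""
    with open(path) as f:
        # islice stops after `limit` lines, so the rest of the file is never read.
        return [line.strip() for line in islice(f, limit)]
```

Calling `first_words("file.txt")` would then give a list of at most 100 words that can be looped over like any other list.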

Karl Knechtel
Orr

6 Answers

2
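filename = 'file.txt'  # avoid shadowing the built-in name `file`
with open(filename, 'r') as f:
    for line in f:
        word = line.strip()  # drop the trailing newline
        # process word here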
PyTony
0

The Python Tutorial has you covered:

An alternative approach to reading lines is to loop over the file object. This is memory efficient, fast, and leads to simpler code:

for line in f:
    print line,

You can use itertools.combinations to get all combinations of words:

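import itertools

with open("file.txt") as word_list:
    words = (line.strip() for line in word_list)  # drop trailing newlines
    for word1, word2 in itertools.combinations(words, 2):
        if anagram(word1, word2):  # anagram() is your own function
            pass  # do stuff with the pair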
Björn Pollex
  • I need to give my function two words (strings). That means I need to give it the word on the current line together with the word on the next line, then with the word on the third line, and so on. After that, I need to give my function the word on the second line with the word on the third line, then with the word on the fourth line, and so on. I haven't succeeded in doing it. Any ideas? – Orr Nov 25 '11 at 13:23
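The pairing order described in this comment (first word with every later word, then second word with every later word, and so on) can be sketched with two nested index loops. The function name `all_pairs` is made up for illustration.

```python
def all_pairs(words):
    """Yield every unordered pair of words exactly once."""
    for i in range(len(words)):
        # Start the inner loop one past i so a word is never paired with
        # itself and no pair is produced twice.
        for j in range(i + 1, len(words)):
            yield words[i], words[j]


print(list(all_pairs(["a", "b", "c"])))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Each yielded pair can be passed straight to the two-word anagram function.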
0

The readlines() method gives you a list of all the lines (words) in the file:

with open("myfile.txt") as text:
    wordlist = text.readlines()  # each item still ends with "\n"

Now you just have to do the for loop:

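for i, first in enumerate(wordlist):
    for second in wordlist[i + 1:]:  # compare with every later word
        anagramfunction(first.strip(), second.strip())  # your own function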
jonathan.hepp
0
  1. Load all words (lines) into a list. Since each word is on its own line, this can be done with readlines() (you will have to use strip() to remove the line endings):

    words = [s.strip() for s in f.readlines()]

  2. For each word, create its anagrams.

  3. Use the in operator on the word list to check whether each anagram exists.
  4. If it exists, print it.
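The steps above can be sketched as follows. This is only an illustration: the function name `find_anagram_pairs` is made up, a set is used instead of a list for faster `in` checks, and generating every permutation grows factorially with word length, so it is only practical for short words.

```python
from itertools import permutations


def find_anagram_pairs(words):
    """Return sorted (word, anagram) pairs found by permuting each word."""
    known = set(words)  # set membership is faster than list membership
    pairs = set()
    for word in words:
        for letters in permutations(word):
            candidate = "".join(letters)
            if candidate != word and candidate in known:
                # Sort the pair so (a, b) and (b, a) count as the same pair.
                pairs.add(tuple(sorted((word, candidate))))
    return sorted(pairs)


print(find_anagram_pairs(["listen", "silent", "enlist", "google"]))
```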
Michał Niklas
0

I assume your word list is small enough to fit in RAM. Here is a (non-optimized) algorithm that builds the lists of anagrams (using bits of previous answers):

def buildAnagramsList(word, wordList):
    anagramsList = []
    for word2 in wordList[:]:  # iterate over a copy so removing is safe
        if word2 != word and areAnagrams(word, word2):  # you already have a similar function
            wordList.remove(word2)  # spare some time by not looking twice for the same anagrams
            anagramsList.append(word2)
    return anagramsList

with open("myfile.txt") as f:
    words = [s.strip() for s in f]
anagramsLists = [buildAnagramsList(word, words) for word in words]
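If the pairwise comparison turns out to be too slow, one common alternative (a different technique from the one in this answer) is to group words by their sorted letters, so that all anagrams share the same dictionary key. The sketch below assumes the words are already stripped; the name `anagram_pairs` is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def anagram_pairs(words):
    """Return a list of (word1, word2) tuples that are anagrams of each other."""
    groups = defaultdict(list)
    for word in words:
        # Anagrams have identical letters, so their sorted letters match.
        groups["".join(sorted(word))].append(word)
    pairs = []
    for group in groups.values():
        # Every pair inside one group is an anagram pair.
        pairs.extend(combinations(group, 2))
    return pairs


print(anagram_pairs(["listen", "google", "silent", "enlist"]))
```

This reads each word once and only compares words inside the same group, rather than comparing every word against every other word.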
Sébastien
0

I would have gone for something like this:

wordList = []
anagrams = []

with open("file.txt") as f:  # each word is on its own line
    wordList.extend(line.strip() for line in f)

wordList should now be something like [Word1, Word2, Word3]

for i in range(len(wordList) - 1):  # stop one short so wordList[i+1] stays in bounds
    if areAnagrams(wordList[i], wordList[i+1]):  # your own anagram check goes here
        anagrams.append((wordList[i], wordList[i+1]))

I'm not entirely sure about this syntax; I'm just giving you an idea of what I would do. Note that this only compares each word with the one directly after it, and that the loop has to stop one item before the end to avoid an IndexError.

miken32
RonnyKnoxville