
I'm currently writing an anagram program in Java that uses a `HashMap` to store the words. The dictionary text file I have is in the following format:

    eelrss lesser
    eelrssst restless
    eelrsst tressel
    eelrsstvy sylvester
    eelrst lester
    eelrstt letters
    eelrstt settler
    eelrstt trestle

Each word is preceded by its letters sorted alphabetically (the key), and the whole file is sorted alphabetically by key. What I am trying to do is reformat it using Python so that words sharing the same key end up on the same line. For example, the lines from the sample above

    eelrstt letters
    eelrstt settler
    eelrstt trestle

would become

    eelrstt letters settler trestle
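
As an illustration of the key format, here is a minimal Python sketch, assuming the key is simply the word's letters passed through `sorted()` and that lines with equal keys are ordered by word (the sample suggests this, but it is an assumption). `anagram_key` and `dictionary_lines` are hypothetical helper names, not part of the original code.

```python
def anagram_key(word):
    """Return the word's letters in alphabetical order, e.g. 'letters' -> 'eelrstt'."""
    return ''.join(sorted(word))


def dictionary_lines(words):
    """Build 'key word' lines sorted by key, then by word (assumed tie order)."""
    pairs = sorted((anagram_key(w), w) for w in words)
    return ['%s %s' % (key, word) for key, word in pairs]


if __name__ == '__main__':
    for line in dictionary_lines(['trestle', 'letters', 'settler', 'lester']):
        print(line)
```

With that input it prints the `lester`, `letters`, `settler` and `trestle` lines in the same order as the sample, which is why all three `eelrstt` words sit next to each other in the file.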

The code I am using to try and convert this is:

    f = open('d13.txt')
    lines = f.readlines()
    duped = open('checked.txt', 'w')

    for i in range(len(lines)):
        line1 = lines[i].split(' ')
        line2 = lines[i+1].split(' ')
        if line1[0] != 'nm':
            if line1[0] == line2[0]:
                line3 = line1 + line2[1:]
                line2[0] = 'nm'
            else:
                line3 = line1
            line4 = ' '.join(line3)
            duped.write(line4)

However, this produces a mess: it only merges with the next line and leaves the duplicates in:

    eelrss lesser
    eelrssst restless
    eelrsst tressel
    eelrsstvy sylvester
    eelrst lester
    eelrstt letters
     settler
    eelrstt settler
     trestle
    eelrstt trestle
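
The output above follows from a few separate problems in the loop: `lines[i+1]` raises an `IndexError` on the last line; `line2[0] = 'nm'` only changes the throwaway list returned by `split`, not `lines[i+1]`, so the next iteration sees that line unmarked and writes it again; `split(' ')` leaves the `'\n'` attached to the last word, which is why ` settler` lands on its own line; and comparing with just the next line cannot merge a run of three such as `letters`/`settler`/`trestle`. Below is a minimal sketch of the same index-based idea with those fixed, assuming the input is sorted by key as `d13.txt` is. `merge_anagram_lines` and `merge_file` are hypothetical names.

```python
def merge_anagram_lines(lines):
    """Merge consecutive 'key word' lines that share a key into 'key word1 word2 ...'.

    Assumes the lines are already sorted by key, so equal keys are adjacent.
    """
    merged = []
    i = 0
    while i < len(lines):
        parts = lines[i].split()  # split() with no argument also drops the trailing '\n'
        if not parts:  # skip blank lines
            i += 1
            continue
        key, words = parts[0], parts[1:]
        j = i + 1
        # Absorb every following line with the same key, however long the run is;
        # the j < len(lines) check avoids the IndexError on the last line.
        while j < len(lines) and lines[j].split()[:1] == [key]:
            words.extend(lines[j].split()[1:])
            j += 1
        merged.append(' '.join([key] + words))
        i = j  # resume after the run, so no line is written twice
    return merged


def merge_file(src_path, dst_path):
    """Read src_path and write one merged line per key to dst_path."""
    with open(src_path) as src:
        merged = merge_anagram_lines(src.readlines())
    with open(dst_path, 'w') as dst:
        for line in merged:
            dst.write(line + '\n')  # restore the newline that split() removed
```

Calling `merge_file('d13.txt', 'checked.txt')` should then give one line per key in a single run, without having to repeat the script.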

Can anyone please help?

Ed C
  • You have forgotten to strip `\n`. Otherwise the output is correct for the given input. – santosh-patil Mar 13 '14 at 11:19
  • Adding a strip of `\n` gives really weird output, with multiple entries on the same line. Do you have any suggestions for code that would give the desired output, please? – Ed C Mar 13 '14 at 13:55
  • That weird output is because you have chosen to strip all of the `\n`. Change `line4 = ' '.join(line3)` to `line4 = ' '.join(line3)+'\n'` and the output will be better. – santosh-patil Mar 13 '14 at 14:11
  • I've tried doing what you suggested, but am now getting the following. `eelrrvy revelry eelrss lesser eelrssst restless eelrsst tressel eelrsstvy sylvester eelrst lester eelrstt letters settler eelrstt settler trestle eelrstt trestle eelrstw swelter wrestle` – Ed C Mar 13 '14 at 14:54
  • I have tried running the modified code on [ideone](https://ideone.com/55GdlU) and I get the correct output. Please take a look at the link. – santosh-patil Mar 13 '14 at 16:00
  • Okay, realised I need to run it multiple times to remove duplicate entries. Thanks so much for your help and putting up with me being thick. – Ed C Mar 13 '14 at 17:30
  • possible duplicate of [Python regular expression matching a multiline block of text](http://stackoverflow.com/questions/587345/python-regular-expression-matching-a-multiline-block-of-text) – Paul Sweatte Apr 10 '14 at 15:41
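
The thread ends with the script being run several times to clear out the remaining duplicates. An alternative sketch that avoids that, and also drops the dependence on sorted input, is to group with a dictionary keyed on the sorted letters, the same idea as the `HashMap` on the Java side. The helper names `group_by_key` and `format_groups` are hypothetical, and the sketch assumes each line is `key word [word ...]` separated by whitespace.

```python
def group_by_key(lines):
    """Group 'key word' lines into {key: [word, ...]} in a single pass.

    Lines sharing a key do not need to be adjacent; words keep their input order.
    """
    groups = {}
    for line in lines:
        parts = line.split()  # also strips the trailing '\n'
        if not parts:
            continue  # ignore blank lines
        key, words = parts[0], parts[1:]
        groups.setdefault(key, []).extend(words)
    return groups


def format_groups(groups):
    """Render each group as 'key word1 word2 ...', ordered by key."""
    return [' '.join([key] + words) for key, words in sorted(groups.items())]
```

Inside a `with open('d13.txt') as f:` block, `format_groups(group_by_key(f))` returns the merged lines, since iterating over a file yields its lines one at a time.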
